Quick explanation of Physical and Logical Structure of PDF Files
Viewing PDF metadata:
$ exiftool ./project_plan.pdf
ExifTool Version Number : 13.30
File Name : project_plan.pdf
Directory : ..
File Size : 11 kB
File Modification Date/Time : 2025:07:14 14:05:37+03:00
File Access Date/Time : 2025:07:14 14:09:20+03:00
File Inode Change Date/Time : 2025:07:14 14:09:18+03:00
File Permissions : -rw-r--r--
File Type : PDF
File Type Extension : pdf
MIME Type : application/pdf
PDF Version : 1.3
Linearized : No
Page Count : 2
Page Layout : OneColumn
Producer : PyFPDF 1.7.2 http://pyfpdf.googlecode.com/
Create Date : 2024:09:18 13:57:04
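The Create Date that exiftool reports comes from the /CreationDate string in the document's Info dictionary, which PDF stores in the form D:YYYYMMDDHHmmSS. As a minimal sketch (the helper name parse_pdf_date is my own, and any timezone suffix is simply ignored here), the conversion looks roughly like this:

```python
from datetime import datetime


def parse_pdf_date(value: str) -> datetime:
    """Parse the leading D:YYYYMMDDHHmmSS part of a PDF date string."""
    if value.startswith("D:"):
        value = value[2:]
    # Only the first 14 digits are used; an optional timezone suffix
    # (for example +03'00') is not handled in this sketch.
    return datetime.strptime(value[:14], "%Y%m%d%H%M%S")


if __name__ == "__main__":
    # Value taken from the /CreationDate entry of this sample.
    print(parse_pdf_date("D:20240918135704"))
```

A creation date that predates the file system timestamps by months, as it does here, is not suspicious by itself, but it is a useful data point when building a timeline.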
Next, parse the PDF with pdf-parser to view its internal objects. The help output lists the available options:
$ pdf-parser -h
Usage: pdf-parser [options] pdf-file|zip-file|url
pdf-parser, use it to parse a PDF document
Options:
--version show program's version number and exit
-h, --help show this help message and exit
-m, --man Print manual
-s SEARCH, --search=SEARCH
string to search in indirect objects (except streams)
-f, --filter pass stream object through filters (FlateDecode,
ASCIIHexDecode, ASCII85Decode, LZWDecode and
RunLengthDecode only)
-o OBJECT, --object=OBJECT
id(s) of indirect object(s) to select, use comma (,)
to separate ids (version independent)
-r REFERENCE, --reference=REFERENCE
id of indirect object being referenced (version
independent)
-e ELEMENTS, --elements=ELEMENTS
type of elements to select (cxtsi)
-w, --raw raw output for data and filters
-a, --stats display stats for pdf document
-t TYPE, --type=TYPE type of indirect object to select
-O, --objstm parse stream of /ObjStm objects
-v, --verbose display malformed PDF elements
-x EXTRACT, --extract=EXTRACT
filename to extract malformed content to
-H, --hash display hash of objects
-n, --nocanonicalizedoutput
do not canonicalize the output
-d DUMP, --dump=DUMP filename to dump stream content to
-D, --debug display debug info
-c, --content display the content for objects without streams or
with streams without filters
--searchstream=SEARCHSTREAM
string to search in streams
--unfiltered search in unfiltered streams
--casesensitive case sensitive search in streams
--regex use regex to search in streams
--overridingfilters=OVERRIDINGFILTERS
override filters with given filters (use raw for the
raw stream content)
-g, --generate generate a Python program that creates the parsed PDF
file
--generateembedded=GENERATEEMBEDDED
generate a Python program that embeds the selected
indirect object as a file
-y YARA, --yara=YARA YARA rule (or directory or @file) to check streams
(can be used with option --unfiltered)
--yarastrings Print YARA strings
--decoders=DECODERS decoders to load (separate decoders with a comma , ;
@file supported)
--decoderoptions=DECODEROPTIONS
options for the decoder
-k KEY, --key=KEY key to search in dictionaries
-j, --jsonoutput produce json output
pdf-parser, use it to parse a PDF document
Source code put in the public domain by Didier Stevens, no Copyright
Use at your own risk
https://DidierStevens.com
Run the program with --raw and -f to get the most detailed output:
$ pdf-parser --raw -f project_plan.pdf
PDF Comment %PDF-1.3
obj 3 0
Type: /Page
Referencing: 1 0 R, 2 0 R, 4 0 R
<</Type /Page /Parent 1 0 R /Resources 2 0 R /Contents 4 0 R>>
obj 4 0
Type:
Referencing:
Contains stream
<</Filter /FlateDecode /Length 870>>
[stream data of length 870]
obj 5 0
Type: /Page
Referencing: 1 0 R, 2 0 R, 6 0 R
<</Type /Page /Parent 1 0 R /Resources 2 0 R /Contents 6 0 R>>
obj 6 0
Type:
Referencing:
Contains stream
<</Filter /FlateDecode /Length 933>>
[stream data of length 933]
obj 1 0
Type: /Pages
Referencing: 3 0 R, 5 0 R
<</Type /Pages /Kids [3 0 R 5 0 R] /Count 2 /MediaBox [0 0 595.28 841.89]>>
obj 7 0
Type: /Font
Referencing:
<</Type /Font /BaseFont /Helvetica-Bold /Subtype /Type1 /Encoding /WinAnsiEncoding>>
obj 8 0
Type: /Font
Referencing:
<</Type /Font /BaseFont /Helvetica /Subtype /Type1 /Encoding /WinAnsiEncoding>>
obj 2 0
Type:
Referencing: 7 0 R, 8 0 R
<</ProcSet [/PDF /Text /ImageB /ImageC /ImageI] /Font <</F1 7 0 R /F2 8 0 R>> /XObject <<>> >>
obj 9 0
Type:
Referencing:
<</Producer (PyFPDF 1.7.2) /CreationDate (D:20240918135704)>>
obj 10 0
Type: /Catalog
Referencing: 1 0 R, 3 0 R
<</Type /Catalog /Pages 1 0 R /OpenAction [3 0 R /FitH null] /PageLayout /OneColumn>>
xref
trailer
<</Size 11 /Root 10 0 R /Info 9 0 R>>
startxref 2725
PDF Comment EOF
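The listing above shows the physical order of the objects in the file (3, 4, 5, 6, 1, ...), while the logical structure is a tree that a reader follows from the trailer: /Root points to the Catalog, the Catalog points to /Pages, and each entry in /Kids is a page with its own /Contents stream. The sketch below walks that chain over the dictionaries copied from the output. It uses simple regular expressions rather than a real PDF parser, so treat it as an illustration only; the helper names are my own.

```python
import re

# Object dictionaries copied from the pdf-parser output above.
OBJECTS = {
    1: "<</Type /Pages /Kids [3 0 R 5 0 R] /Count 2 /MediaBox [0 0 595.28 841.89]>>",
    3: "<</Type /Page /Parent 1 0 R /Resources 2 0 R /Contents 4 0 R>>",
    5: "<</Type /Page /Parent 1 0 R /Resources 2 0 R /Contents 6 0 R>>",
    10: "<</Type /Catalog /Pages 1 0 R /OpenAction [3 0 R /FitH null] /PageLayout /OneColumn>>",
}
TRAILER = "<</Size 11 /Root 10 0 R /Info 9 0 R>>"


def ref(dictionary, key):
    """Return the object number of the indirect reference stored under key."""
    match = re.search(r"/%s\s+(\d+)\s+\d+\s+R" % re.escape(key), dictionary)
    if match is None:
        raise KeyError(key)
    return int(match.group(1))


def kids(dictionary):
    """Return the object numbers listed in the /Kids array."""
    array = re.search(r"/Kids\s*\[(.*?)\]", dictionary).group(1)
    return [int(num) for num in re.findall(r"(\d+)\s+\d+\s+R", array)]


def page_contents():
    """Follow trailer -> Catalog -> Pages -> Kids and pair pages with contents."""
    catalog = ref(TRAILER, "Root")
    pages = ref(OBJECTS[catalog], "Pages")
    return [(page, ref(OBJECTS[page], "Contents")) for page in kids(OBJECTS[pages])]


if __name__ == "__main__":
    for page, contents in page_contents():
        print("page object", page, "-> content stream", contents)
```

Walking the tree this way makes it easier to spot objects that no page references, which is where embedded payloads tend to hide.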
I have shortened parts of the output to make it easier to read. The key part of this malicious PDF is object 14, where an EXE is hidden via /EmbeddedFiles and compressed with FlateDecode, a technique that is common in maldocs.
# Select object 14 and dump it
$ pdf-parser --raw -f project_plan.pdf -o 14 -d obj14
$ file obj14
obj14: PE32+ executable for MS Windows 6.00 (console), x86-64, 6 sections
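Conceptually, the -f option inflates the FlateDecode stream with zlib, and file then recognises the result by its magic bytes. The sketch below reproduces both steps on a hypothetical stand-in buffer, not on the real payload; the function names are my own, and the check only looks at the MZ header and the PE signature it points to.

```python
import zlib


def inflate(stream: bytes) -> bytes:
    """Undo the /FlateDecode filter (zlib/deflate) on raw stream bytes."""
    return zlib.decompress(stream)


def looks_like_pe(data: bytes) -> bool:
    """Check for the DOS 'MZ' header and the PE signature it points to."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # The 4-byte little-endian field at 0x3C holds the PE header offset.
    pe_offset = int.from_bytes(data[0x3C:0x40], "little")
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"


if __name__ == "__main__":
    # Hypothetical stand-in for the embedded file: a fake minimal header,
    # compressed the same way a /FlateDecode stream would be.
    fake = bytearray(0x80)
    fake[:2] = b"MZ"
    fake[0x3C:0x40] = (0x40).to_bytes(4, "little")
    fake[0x40:0x44] = b"PE\x00\x00"
    stream = zlib.compress(bytes(fake))

    print(looks_like_pe(inflate(stream)))
```

On a real sample, the bytes written by -d obj14 would take the place of the fake buffer.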
Post-exploitation artifacts:
# Hunt for registry persistence
reg query HKCU\Software\Microsoft\Windows\CurrentVersion\Run