A Quick Explanation of the Physical and Logical Structure of PDF Files

Viewing PDF metadata:

$ exiftool ./project_plan.pdf
ExifTool Version Number         : 13.30
File Name                       : project_plan.pdf
Directory                       : ..
File Size                       : 11 kB
File Modification Date/Time     : 2025:07:14 14:05:37+03:00
File Access Date/Time           : 2025:07:14 14:09:20+03:00
File Inode Change Date/Time     : 2025:07:14 14:09:18+03:00
File Permissions                : -rw-r--r--
File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
PDF Version                     : 1.3
Linearized                      : No
Page Count                      : 2
Page Layout                     : OneColumn
Producer                        : PyFPDF 1.7.2 http://pyfpdf.googlecode.com/
Create Date                     : 2024:09:18 13:57:04
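
exiftool prints the creation date in its own format. Inside the file, the Info dictionary stores it as a PDF date string, (D:20240918135704), which is visible in object 9 of the pdf-parser output further down. Here is a minimal sketch of converting that raw string yourself. The function name parse_pdf_date is my own, and it assumes the short D:YYYYMMDDHHmmSS form without a timezone suffix, as in this file.

```python
from datetime import datetime


def parse_pdf_date(value):
    """Parse a PDF date string such as 'D:20240918135704'.

    Assumption: the string carries all 14 digits and no timezone suffix,
    which is the case for the sample file in this write-up.
    """
    if value.startswith("D:"):
        value = value[2:]  # drop the 'D:' prefix
    # Only the first 14 characters (YYYYMMDDHHmmSS) are interpreted.
    return datetime.strptime(value[:14], "%Y%m%d%H%M%S")


if __name__ == "__main__":
    print(parse_pdf_date("D:20240918135704"))
```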

Parse the PDF file with pdf-parser to inspect its raw objects. The help output lists the available options:

$ pdf-parser -h
 
Usage: pdf-parser [options] pdf-file|zip-file|url
pdf-parser, use it to parse a PDF document
 
Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -m, --man             Print manual
  -s SEARCH, --search=SEARCH
                        string to search in indirect objects (except streams)
  -f, --filter          pass stream object through filters (FlateDecode,
                        ASCIIHexDecode, ASCII85Decode, LZWDecode and
                        RunLengthDecode only)
  -o OBJECT, --object=OBJECT
                        id(s) of indirect object(s) to select, use comma (,)
                        to separate ids (version independent)
  -r REFERENCE, --reference=REFERENCE
                        id of indirect object being referenced (version
                        independent)
  -e ELEMENTS, --elements=ELEMENTS
                        type of elements to select (cxtsi)
  -w, --raw             raw output for data and filters
  -a, --stats           display stats for pdf document
  -t TYPE, --type=TYPE  type of indirect object to select
  -O, --objstm          parse stream of /ObjStm objects
  -v, --verbose         display malformed PDF elements
  -x EXTRACT, --extract=EXTRACT
                        filename to extract malformed content to
  -H, --hash            display hash of objects
  -n, --nocanonicalizedoutput
                        do not canonicalize the output
  -d DUMP, --dump=DUMP  filename to dump stream content to
  -D, --debug           display debug info
  -c, --content         display the content for objects without streams or
                        with streams without filters
  --searchstream=SEARCHSTREAM
                        string to search in streams
  --unfiltered          search in unfiltered streams
  --casesensitive       case sensitive search in streams
  --regex               use regex to search in streams
  --overridingfilters=OVERRIDINGFILTERS
                        override filters with given filters (use raw for the
                        raw stream content)
  -g, --generate        generate a Python program that creates the parsed PDF
                        file
  --generateembedded=GENERATEEMBEDDED
                        generate a Python program that embeds the selected
                        indirect object as a file
  -y YARA, --yara=YARA  YARA rule (or directory or @file) to check streams
                        (can be used with option --unfiltered)
  --yarastrings         Print YARA strings
  --decoders=DECODERS   decoders to load (separate decoders with a comma , ;
                        @file supported)
  --decoderoptions=DECODEROPTIONS
                        options for the decoder
  -k KEY, --key=KEY     key to search in dictionaries
  -j, --jsonoutput      produce json output
 
  pdf-parser, use it to parse a PDF document
  Source code put in the public domain by Didier Stevens, no Copyright
  Use at your own risk
  https://DidierStevens.com

Run the program with --raw and -f to get the most detailed output: --raw prints data unprocessed, and -f passes stream objects through their filters.

$ pdf-parser --raw -f project_plan.pdf
 
PDF Comment %PDF-1.3
 
obj 3 0
 Type: /Page
 Referencing: 1 0 R, 2 0 R, 4 0 R
<</Type /Page /Parent 1 0 R /Resources 2 0 R /Contents 4 0 R>>
 
obj 4 0
 Type: 
 Referencing: 
 Contains stream
<</Filter /FlateDecode /Length 870>>
[stream data of length 870]
 
obj 5 0
 Type: /Page
 Referencing: 1 0 R, 2 0 R, 6 0 R
<</Type /Page /Parent 1 0 R /Resources 2 0 R /Contents 6 0 R>>
 
obj 6 0
 Type: 
 Referencing: 
 Contains stream
<</Filter /FlateDecode /Length 933>>
[stream data of length 933]
 
obj 1 0
 Type: /Pages
 Referencing: 3 0 R, 5 0 R
<</Type /Pages /Kids [3 0 R 5 0 R] /Count 2 /MediaBox [0 0 595.28 841.89]>>
 
obj 7 0
 Type: /Font
 Referencing: 
<</Type /Font /BaseFont /Helvetica-Bold /Subtype /Type1 /Encoding /WinAnsiEncoding>>
 
obj 8 0
 Type: /Font
 Referencing: 
<</Type /Font /BaseFont /Helvetica /Subtype /Type1 /Encoding /WinAnsiEncoding>>
 
obj 2 0
 Type: 
 Referencing: 7 0 R, 8 0 R
<</ProcSet [/PDF /Text /ImageB /ImageC /ImageI] /Font <</F1 7 0 R /F2 8 0 R>> /XObject <<>> >>
 
obj 9 0
 Type: 
 Referencing: 
<</Producer (PyFPDF 1.7.2) /CreationDate (D:20240918135704)>>
 
obj 10 0
 Type: /Catalog
 Referencing: 1 0 R, 3 0 R
<</Type /Catalog /Pages 1 0 R /OpenAction [3 0 R /FitH null] /PageLayout /OneColumn>>
 
xref
trailer
<</Size 11 /Root 10 0 R /Info 9 0 R>>
startxref 2725
PDF Comment EOF
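
Reading the listing from the bottom up shows the logical structure. The trailer's /Root points to the catalog (object 10), whose /Pages entry points to the page tree (object 1). Its /Kids array lists the pages (objects 3 and 5), and each page names its /Contents stream (objects 4 and 6). The sketch below retraces that walk over the dictionaries copied from the listing. It is only an illustration: the helper names are my own, and regular expressions are not a substitute for a real PDF parser.

```python
import re

# Dictionaries copied verbatim from the pdf-parser listing (object number -> dictionary).
OBJECTS = {
    1: "<</Type /Pages /Kids [3 0 R 5 0 R] /Count 2 /MediaBox [0 0 595.28 841.89]>>",
    3: "<</Type /Page /Parent 1 0 R /Resources 2 0 R /Contents 4 0 R>>",
    5: "<</Type /Page /Parent 1 0 R /Resources 2 0 R /Contents 6 0 R>>",
    10: "<</Type /Catalog /Pages 1 0 R /OpenAction [3 0 R /FitH null] /PageLayout /OneColumn>>",
}
TRAILER = "<</Size 11 /Root 10 0 R /Info 9 0 R>>"


def ref(dictionary, key):
    """Return the object number of a single reference like '/Root 10 0 R', or None."""
    match = re.search(r"/%s\s+(\d+)\s+\d+\s+R" % re.escape(key), dictionary)
    return int(match.group(1)) if match else None


def ref_array(dictionary, key):
    """Return the object numbers inside an array of references like '/Kids [3 0 R 5 0 R]'."""
    match = re.search(r"/%s\s*\[([^\]]*)\]" % re.escape(key), dictionary)
    if not match:
        return []
    return [int(num) for num in re.findall(r"(\d+)\s+\d+\s+R", match.group(1))]


def walk_pages():
    """Follow trailer -> catalog -> page tree -> pages; return (page, contents) pairs."""
    catalog = ref(TRAILER, "Root")
    page_tree = ref(OBJECTS[catalog], "Pages")
    kids = ref_array(OBJECTS[page_tree], "Kids")
    return [(kid, ref(OBJECTS[kid], "Contents")) for kid in kids]


if __name__ == "__main__":
    for page, contents in walk_pages():
        print("page object %d -> content stream %d" % (page, contents))
```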

I have shortened parts of the output for readability. The key part of this malicious PDF is object 14 (not shown in the listing above), where an EXE is hidden via /EmbeddedFiles and compressed with FlateDecode, a common technique in maldocs.

# Select object 14 and dump its decoded stream to the file obj14
$ pdf-parser --raw -f project_plan.pdf -o 14 -d obj14
 
$ file obj14
obj14: PE32+ executable for MS Windows 6.00 (console), x86-64, 6 sections
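
What pdf-parser -f and file do here can be approximated by hand: FlateDecode is zlib/deflate compression, and a Windows executable starts with the bytes "MZ" and has a "PE\0\0" signature at the offset stored at 0x3C. This is a rough sketch of both steps. The function names are my own, and the check is a heuristic, not a full PE validation.

```python
import zlib


def inflate_stream(raw):
    """Decompress the body of a /FlateDecode stream (zlib-wrapped deflate data)."""
    return zlib.decompress(raw)


def looks_like_pe(data):
    """Heuristic PE check: 'MZ' at offset 0 and 'PE\\0\\0' at the offset stored at 0x3C."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew: little-endian 32-bit offset of the PE signature.
    pe_offset = int.from_bytes(data[0x3C:0x40], "little")
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python check_stream.py raw_stream.bin
    # where the file holds the still-compressed stream bytes.
    with open(sys.argv[1], "rb") as handle:
        payload = inflate_stream(handle.read())
    print("PE executable" if looks_like_pe(payload) else "not a PE executable")
```

If the dump was already made with -f, as in the command above, the file is decompressed and only the looks_like_pe step applies.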

Post-exploitation artifacts:

# Hunt for registry persistence
reg query HKCU\Software\Microsoft\Windows\CurrentVersion\Run