PDF forensics

Quick explanation of Physical and Logical Structure of PDF Files

Viewing PDF metadata:

$ exiftool ./project_plan.pdf
ExifTool Version Number         : 13.30
File Name                       : project_plan.pdf
Directory                       : ..
File Size                       : 11 kB
File Modification Date/Time     : 2025:07:14 14:05:37+03:00
File Access Date/Time           : 2025:07:14 14:09:20+03:00
File Inode Change Date/Time     : 2025:07:14 14:09:18+03:00
File Permissions                : -rw-r--r--
File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
PDF Version                     : 1.3
Linearized                      : No
Page Count                      : 2
Page Layout                     : OneColumn
Producer                        : PyFPDF 1.7.2 http://pyfpdf.googlecode.com/
Create Date                     : 2024:09:18 13:57:04
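The Create Date shown by exiftool is read from the PDF Info dictionary, where it is stored as a PDF date string (in this file, `D:20240918135704`). The sketch below shows one way to turn such a string into a Python `datetime` so timestamps can be compared during timeline analysis. The function name `parse_pdf_date` is an illustrative choice, not part of any tool used here, and the regular expression covers only the common `D:YYYYMMDDHHmmSS` layout with an optional timezone suffix.

```python
import re
from datetime import datetime, timedelta, timezone

# PDF date strings look like D:YYYYMMDDHHmmSS, optionally followed by a
# timezone suffix such as +03'00', -05'00' or Z. Everything after the
# year is optional.
_PDF_DATE = re.compile(
    r"^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Zz+\-])(?:(\d{2})'?(?:(\d{2})'?)?)?)?$"
)


def parse_pdf_date(value: str) -> datetime:
    """Parse a PDF date string; returns a naive datetime if no timezone is given."""
    m = _PDF_DATE.match(value.strip())
    if m is None:
        raise ValueError(f"not a PDF date string: {value!r}")
    year = int(m.group(1))
    month = int(m.group(2) or 1)
    day = int(m.group(3) or 1)
    hour, minute, second = (int(m.group(i) or 0) for i in (4, 5, 6))
    tz = None
    sign = m.group(7)
    if sign:
        # "Z" carries no digits, so the offset stays at zero (UTC).
        offset = timedelta(hours=int(m.group(8) or 0), minutes=int(m.group(9) or 0))
        if sign == "-":
            offset = -offset
        tz = timezone(offset)
    return datetime(year, month, day, hour, minute, second, tzinfo=tz)


if __name__ == "__main__":
    print(parse_pdf_date("D:20240918135704"))
```

The date in this sample has no timezone suffix, so the result is a naive `datetime`; it cannot be converted to UTC without knowing where the file was produced.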

Parse the PDF file with pdf-parser to inspect its objects.

$ pdf-parser -h
 
Usage: pdf-parser [options] pdf-file|zip-file|url
pdf-parser, use it to parse a PDF document
 
Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -m, --man             Print manual
  -s SEARCH, --search=SEARCH
                        string to search in indirect objects (except streams)
  -f, --filter          pass stream object through filters (FlateDecode,
                        ASCIIHexDecode, ASCII85Decode, LZWDecode and
                        RunLengthDecode only)
  -o OBJECT, --object=OBJECT
                        id(s) of indirect object(s) to select, use comma (,)
                        to separate ids (version independent)
  -r REFERENCE, --reference=REFERENCE
                        id of indirect object being referenced (version
                        independent)
  -e ELEMENTS, --elements=ELEMENTS
                        type of elements to select (cxtsi)
  -w, --raw             raw output for data and filters
  -a, --stats           display stats for pdf document
  -t TYPE, --type=TYPE  type of indirect object to select
  -O, --objstm          parse stream of /ObjStm objects
  -v, --verbose         display malformed PDF elements
  -x EXTRACT, --extract=EXTRACT
                        filename to extract malformed content to
  -H, --hash            display hash of objects
  -n, --nocanonicalizedoutput
                        do not canonicalize the output
  -d DUMP, --dump=DUMP  filename to dump stream content to
  -D, --debug           display debug info
  -c, --content         display the content for objects without streams or
                        with streams without filters
  --searchstream=SEARCHSTREAM
                        string to search in streams
  --unfiltered          search in unfiltered streams
  --casesensitive       case sensitive search in streams
  --regex               use regex to search in streams
  --overridingfilters=OVERRIDINGFILTERS
                        override filters with given filters (use raw for the
                        raw stream content)
  -g, --generate        generate a Python program that creates the parsed PDF
                        file
  --generateembedded=GENERATEEMBEDDED
                        generate a Python program that embeds the selected
                        indirect object as a file
  -y YARA, --yara=YARA  YARA rule (or directory or @file) to check streams
                        (can be used with option --unfiltered)
  --yarastrings         Print YARA strings
  --decoders=DECODERS   decoders to load (separate decoders with a comma , ;
                        @file supported)
  --decoderoptions=DECODEROPTIONS
                        options for the decoder
  -k KEY, --key=KEY     key to search in dictionaries
  -j, --jsonoutput      produce json output
 
  pdf-parser, use it to parse a PDF document
  Source code put in the public domain by Didier Stevens, no Copyright
  Use at your own risk
  https://DidierStevens.com

Run the program with --raw and -f to get the most detailed output: -f passes stream objects through their filters, and --raw prints the data and filter results unformatted.

$ pdf-parser --raw -f project_plan.pdf
 
PDF Comment %PDF-1.3
 
obj 3 0
 Type: /Page
 Referencing: 1 0 R, 2 0 R, 4 0 R
<</Type /Page /Parent 1 0 R /Resources 2 0 R /Contents 4 0 R>>
 
obj 4 0
 Type: 
 Referencing: 
 Contains stream
<</Filter /FlateDecode /Length 870>>
[stream data of length 870]
 
obj 5 0
 Type: /Page
 Referencing: 1 0 R, 2 0 R, 6 0 R
<</Type /Page /Parent 1 0 R /Resources 2 0 R /Contents 6 0 R>>
 
obj 6 0
 Type: 
 Referencing: 
 Contains stream
<</Filter /FlateDecode /Length 933>>
[stream data of length 933]
 
obj 1 0
 Type: /Pages
 Referencing: 3 0 R, 5 0 R
<</Type /Pages /Kids [3 0 R 5 0 R] /Count 2 /MediaBox [0 0 595.28 841.89]>>
 
obj 7 0
 Type: /Font
 Referencing: 
<</Type /Font /BaseFont /Helvetica-Bold /Subtype /Type1 /Encoding /WinAnsiEncoding>>
 
obj 8 0
 Type: /Font
 Referencing: 
<</Type /Font /BaseFont /Helvetica /Subtype /Type1 /Encoding /WinAnsiEncoding>>
 
obj 2 0
 Type: 
 Referencing: 7 0 R, 8 0 R
<</ProcSet [/PDF /Text /ImageB /ImageC /ImageI] /Font <</F1 7 0 R /F2 8 0 R>> /XObject <<>> >>
 
obj 9 0
 Type: 
 Referencing: 
<</Producer (PyFPDF 1.7.2) /CreationDate (D:20240918135704)>>
 
obj 10 0
 Type: /Catalog
 Referencing: 1 0 R, 3 0 R
<</Type /Catalog /Pages 1 0 R /OpenAction [3 0 R /FitH null] /PageLayout /OneColumn>>
 
xref
trailer
<</Size 11 /Root 10 0 R /Info 9 0 R>>
startxref 2725
PDF Comment EOF
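Objects 4 and 6 in the output above are page content streams compressed with /FlateDecode, which is zlib/deflate compression. As a rough illustration of what the -f option does for such streams, the sketch below finds `stream ... endstream` blocks in the raw file bytes and inflates them with Python's `zlib`. The function name `inflate_streams` is hypothetical, and the approach is deliberately naive: it ignores /Length, other filters, filter chains, and object streams, which a real parser such as pdf-parser handles.

```python
import re
import zlib

# Match the bytes between the "stream" and "endstream" keywords.
_STREAM = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def inflate_streams(pdf_bytes: bytes) -> list[bytes]:
    """Return every stream body, inflated when it is valid zlib data."""
    results = []
    for match in _STREAM.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            results.append(zlib.decompress(raw))
        except zlib.error:
            # Not FlateDecode (or damaged): keep the raw bytes for inspection.
            results.append(raw)
    return results


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        for index, body in enumerate(inflate_streams(handle.read())):
            print(f"--- stream {index}: {len(body)} bytes ---")
            print(body[:200])
```

Run as `python inflate_streams.py project_plan.pdf` (the script name is an assumption) to preview the first bytes of each decoded stream.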

I have shortened parts of the output to make it easier to read. The key part of this malicious PDF is object 14, where an EXE is hidden via /EmbeddedFiles and compressed with FlateDecode, a common technique in maldocs.
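Before reading every object by hand, it can help to count how often names such as /EmbeddedFiles or /OpenAction occur in the raw file. The following is a minimal triage sketch under stated assumptions: the list `SUSPICIOUS_NAMES` and the function `count_names` are my own illustrative choices, and the matching is a plain byte search that does not handle hex-escaped names (for example `/Open#41ction`) or names hidden inside compressed object streams.

```python
import re

# Names that often deserve a closer look in a suspicious PDF.
# The selection is illustrative, not exhaustive.
SUSPICIOUS_NAMES = (
    "/OpenAction",
    "/AA",
    "/JS",
    "/JavaScript",
    "/Launch",
    "/EmbeddedFile",
    "/EmbeddedFiles",
)


def count_names(pdf_bytes: bytes) -> dict[str, int]:
    """Count exact occurrences of each name in the raw PDF bytes."""
    counts = {}
    for name in SUSPICIOUS_NAMES:
        # The lookahead stops /JS from matching inside a longer name.
        pattern = re.compile(re.escape(name.encode("ascii")) + rb"(?![A-Za-z0-9])")
        counts[name] = len(pattern.findall(pdf_bytes))
    return counts


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        for name, count in count_names(handle.read()).items():
            print(f"{name:16} {count}")
```

A non-zero count is only a pointer to where to look next. The /OpenAction entry in object 10 of this sample, for instance, merely sets the initial view of the first page.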

# Select object 14 and dump it
$ pdf-parser --raw -f project_plan.pdf -o 14 -d obj14
 
$ file obj14
obj14: PE32+ executable for MS Windows 6.00 (console), x86-64, 6 sections
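The `file` result can be cross-checked by reading the headers directly. A Windows executable starts with the bytes `MZ`, stores the offset of its PE header at 0x3C, and marks that header with `PE\0\0`; the optional-header magic that follows the 20-byte COFF header is 0x10B for PE32 and 0x20B for PE32+. The sketch below checks exactly these fields. The function name `pe_kind` is hypothetical, and the check only classifies the header; it says nothing about whether the file is malicious.

```python
import struct
from typing import Optional


def pe_kind(data: bytes) -> Optional[str]:
    """Return 'PE32', 'PE32+' or None if the data is not a PE image."""
    # DOS header: 'MZ' signature, e_lfanew (offset of the PE header) at 0x3C.
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        return None
    # 4-byte signature + 20-byte COFF header, then the optional header magic.
    magic_offset = pe_offset + 24
    if len(data) < magic_offset + 2:
        return None
    (magic,) = struct.unpack_from("<H", data, magic_offset)
    return {0x10B: "PE32", 0x20B: "PE32+"}.get(magic)


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        print(pe_kind(handle.read()))
```

Running it against the dumped `obj14` file should agree with `file` if the dump is intact.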

Post-exploitation artifacts:

# Hunt for registry persistence
reg query HKCU\Software\Microsoft\Windows\CurrentVersion\Run
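When the `reg query` output is saved from several hosts, a small parser makes the Run entries easier to compare. The sketch below assumes the usual text layout of `reg query`, where each value line is indented by four spaces and the name, type, and data are separated by four spaces. The function name `parse_reg_query` is hypothetical, and the sample text in the usage section is made up for illustration, not taken from a real system.

```python
import re

# One value line: four spaces, name, four spaces, REG_* type, four spaces, data.
_VALUE_LINE = re.compile(
    r"^\s{4}(?P<name>.+?)\s{4}(?P<type>REG_\w+)\s{4}(?P<data>.*)$"
)


def parse_reg_query(output: str) -> list[tuple[str, str, str]]:
    """Extract (name, type, data) tuples from saved `reg query` output."""
    entries = []
    for line in output.splitlines():
        match = _VALUE_LINE.match(line)
        if match:
            entries.append((match["name"], match["type"], match["data"]))
    return entries


if __name__ == "__main__":
    # Hypothetical sample output, for illustration only.
    sample = (
        "\n"
        "HKEY_CURRENT_USER\\Software\\Microsoft\\Windows\\CurrentVersion\\Run\n"
        "    Updater    REG_SZ    C:\\Users\\Public\\updater.exe\n"
    )
    for name, kind, data in parse_reg_query(sample):
        print(f"{name} ({kind}): {data}")
```

Key-path lines and blank lines are skipped because they do not match the value-line pattern.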
