
Batch Unlock Multiple PDFs — Automation Guide

Unlocking a single PDF is straightforward when you know the password. When you need to process 50, 500, or 5,000 encrypted PDFs — perhaps from an old corporate document management system, a pre-2010 archive, or a deceased employee's file server — automation becomes essential. This guide covers batch approaches for three scenarios: (a) all files share the same password, (b) each file has its own password but you know them, and (c) the passwords are all unknown (batch cracking). We cover QPDF, hashcat, Python scripting, and shell pipelines for each scenario.

Scenario 1 — Same password for all files

Many organisations use a single document-level password for all internal PDFs. If you know that password, batch unlocking is trivial. Use QPDF's --decrypt flag in a shell loop: for f in *.pdf; do qpdf --decrypt --password=THEPASSWORD "$f" "unlocked_$f"; done

On Windows: Get-ChildItem *.pdf | ForEach-Object { qpdf --decrypt --password=THEPASSWORD $_.Name "unlocked_$($_.Name)" }. QPDF is available on all platforms via conda, apt, or the QPDF website.

Performance: as a rough guide, QPDF processes approximately 50-200 encrypted PDFs per second per CPU core, depending on file size and encryption tier. At that rate a 5,000-file batch completes in 25-100 seconds on a single core, and faster when spread across cores. For large files the bottleneck shifts to disk I/O rather than decryption throughput.

Verify each output: QPDF exits with code 0 on success, 3 when it succeeded with warnings, and 2 on error. Check exit codes in the shell loop and log failures for manual review. Files that fail with the shared password may have a different password or be corrupted.
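The loop and exit-code check described above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the function names, the output directory, and the injectable run parameter (there so the logic can be exercised without qpdf installed) are illustrative assumptions, and it treats exit code 3 (warnings) as a success.

```python
import subprocess
from pathlib import Path

# qpdf exit codes: 0 = success, 3 = succeeded with warnings, 2 = error.
QPDF_OK = {0, 3}

def build_command(src, dst, password):
    """Return the qpdf argument list for decrypting one file."""
    return ["qpdf", "--decrypt", f"--password={password}", str(src), str(dst)]

def unlock_batch(sources, out_dir, password, run=subprocess.run):
    """Decrypt each source into out_dir; return (succeeded, failed) name lists."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    succeeded, failed = [], []
    for src in map(Path, sources):
        dst = out_dir / f"unlocked_{src.name}"
        result = run(build_command(src, dst, password), capture_output=True, text=True)
        # Sort the file into the right list based on qpdf's exit code.
        (succeeded if result.returncode in QPDF_OK else failed).append(src.name)
    return succeeded, failed
```

A typical call would be unlock_batch(sorted(Path(".").glob("*.pdf")), "unlocked", "THEPASSWORD"); the failed list is what goes to manual review.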

Parallel batch processing

Use GNU parallel or xargs -P to process multiple files concurrently. For small and medium files qpdf is CPU-bound, so match the job count to the number of CPU cores: parallel -j"$(nproc)" qpdf --decrypt --password=THEPASSWORD {} unlocked_{} ::: *.pdf
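Where GNU parallel is not available, the same idea can be approximated with Python's standard library. The sketch below is one possible arrangement: decrypt_one and decrypt_parallel are hypothetical helper names, and a thread pool is sufficient here because each thread only waits on an external qpdf process.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

def decrypt_one(name, password, run=subprocess.run):
    """Run qpdf on one file and return (name, exit_code)."""
    cmd = ["qpdf", "--decrypt", f"--password={password}", name, f"unlocked_{name}"]
    return name, run(cmd, capture_output=True).returncode

def decrypt_parallel(names, password, workers=None, run=subprocess.run):
    """Decrypt files concurrently, one qpdf process per worker thread."""
    workers = workers or os.cpu_count() or 1  # default: one job per CPU core
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(lambda n: decrypt_one(n, password, run), names))
```

The returned dictionary maps each filename to its qpdf exit code, so failures can be filtered out afterwards.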

Scenario 2 — Known passwords per file (password map)

If you have a password map (spreadsheet or CSV mapping filename to password), automate with a Python script. Read the CSV, iterate over files, and apply qpdf with the per-file password. Python's subprocess module runs qpdf and captures exit codes.

Sample approach: read a 'passwords.csv' with columns 'filename,password'. For each row, call qpdf --decrypt --password=ROW_PASSWORD. Log success/fail per file to an output CSV. Files that fail may indicate the password is wrong in the map — flag them for manual review.
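A minimal sketch of that approach, assuming a passwords.csv with the header row filename,password and files located in the working directory. The function name, the log column names, and the injectable run parameter are illustrative choices, not part of any existing tool.

```python
import csv
import subprocess

def decrypt_from_map(map_path, log_path, run=subprocess.run):
    """Read filename,password rows, run qpdf per row, and write a result log."""
    results = []
    with open(map_path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            name, password = row["filename"], row["password"]
            cmd = ["qpdf", "--decrypt", f"--password={password}", name, f"unlocked_{name}"]
            code = run(cmd, capture_output=True).returncode
            # qpdf: 0 = success, 3 = succeeded with warnings, anything else = failure.
            status = "ok" if code in (0, 3) else "fail"
            results.append({"filename": name, "status": status, "exit_code": code})
    with open(log_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["filename", "status", "exit_code"])
        writer.writeheader()
        writer.writerows(results)
    return results
```

Rows logged with status fail are the ones to flag for manual review.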

This scenario is common in enterprise content migration projects where the document management system stores per-document passwords in a database. Export the password table as CSV, match by filename, and run the batch decrypt. A 10,000-file batch processes in 5-20 minutes on a modern server.

Scenario 3 — All passwords unknown (batch cracking)

When none of the passwords are known, you need batch password cracking. This is the most technically demanding scenario. The approach: extract hashes from all PDFs, classify by encryption mode, and run hashcat on the batch.

Step 1: Use pdf2john.pl (John the Ripper's PDF hash extractor) or pdf2hashcat to extract hash lines from all PDFs. Batch: for f in *.pdf; do pdf2john.pl "$f"; done > pdf_hashes.txt. The output is one line per file: the filename, a colon, then a $pdf$ hash whose leading version and revision fields identify the encryption tier.

Step 2: Sort hashes by mode. Mode 10400 (40-bit RC4) should be separated from mode 10700 (AES-256) because different cracking strategies apply. Mode 10400 is guaranteed recoverable because the whole 2^40 key space can be searched exhaustively. Mode 10700 requires dictionary+rule attacks.

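Step 3: Run hashcat in dictionary mode against the mode 10700 group: hashcat -m 10700 -a 0 pdf_hashes.txt rockyou.txt -r best64.rule. The hash file must contain only hashes of the selected mode; if the lines still carry the filename prefix from pdf2john.pl, add --username so hashcat skips it. Hashcat tests every candidate against all loaded hashes, and each cracked hash means that file's password was found. Files not cracked after the dictionary phase proceed to mask attack.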

Hashcat batch optimizations

When cracking multiple PDF hashes in one run, hashcat tests each candidate password against every loaded hash. This is more efficient than cracking one file at a time because fixed costs (loading the dictionary, applying rules, GPU start-up) are paid once for the whole batch. PDF hashes are salted per file, however, so the hashing work itself still grows with the number of uncracked hashes: expect a batch of 1,000 mode 10700 hashes to take far longer than a single hash, though less than 1,000 separate runs.

Use hashcat's --outfile-autohex-disable flag for clean plaintext output, and --show after the run to list all cracked passwords. The hashcat potfile automatically tracks per-hash status — re-running the same batch skips already-cracked hashes.

For mode 10400 (40-bit RC4) batch cracking, brute-force the 40-bit key space rather than the password (hashcat exposes this key search as the collider mode 10410). The cost does not depend on password strength: every hash of this type has the same work factor. A batch of 1,000 mode 10400 hashes against the 2^40 keyspace completes in hours on a single RTX 5090.
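The size of that search is easy to estimate. The helper below is a small sketch of the arithmetic only; the rate passed in is a placeholder assumption, not a measured benchmark for any GPU, so substitute the speed your own hashcat benchmark reports.

```python
def keyspace_hours(bits, keys_per_second):
    """Hours needed to try every key of the given bit length at a fixed rate."""
    return 2 ** bits / keys_per_second / 3600

# 2**40 is 1,099,511,627,776 keys. At an assumed 100 million keys per second,
# a full sweep takes about 3.05 hours.
hours = keyspace_hours(40, 100_000_000)
```

Because the sweep covers every possible key, this figure is an upper bound; on average a key is found after half the sweep.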

Automation infrastructure recommendations

Local GPU cluster: 4-8 high-end GPUs (RTX 5090 or equivalent). Expected batch throughput: testing 10,000-50,000 mode 10700 hashes against a 10M-word dictionary with 50 rules takes roughly 2-7 days for the dictionary phase, depending on GPU count and KDF round count.

Cloud GPU instances: AWS EC2 G6e instances (L40S GPU), Azure NC A100 v4, or Vast.ai rentals. Cloud GPUs are cost-effective for one-time batch cracking jobs: you stop the instance when done and pay only for compute time.

Job management: use hashcat's --potfile-disable together with --outfile for fully isolated per-run results, or --potfile-path to give each batch its own potfile. For large batches, split the hash list into sub-batches (500-1,000 hashes each) and run them on separate GPU instances to scale horizontally.
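Splitting the hash list can be done with a few lines of Python. This is a sketch under simple assumptions: split_batches is a hypothetical helper, the input is the list of lines from one mode group, and the default size of 500 follows the range suggested above.

```python
def split_batches(lines, size=500):
    """Split non-empty hash lines into consecutive sub-batches of at most size."""
    hashes = [line.strip() for line in lines if line.strip()]
    return [hashes[i:i + size] for i in range(0, len(hashes), size)]
```

Each sub-batch can then be written to its own file and paired with its own potfile via --potfile-path, so results from different instances do not collide.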

Handling mixed encryption tiers in a batch

Not all PDFs in a batch use the same encryption. A corporate archive spanning 2000-2026 likely contains: mode 10400 (PDF 1.1-1.3, 40-bit RC4, pre-2001), mode 10500 (PDF 1.4-1.6, 128-bit RC4 or AES-128), mode 10600 (PDF 1.7 Extension Level 3, AES-256 as written by Acrobat 9), and mode 10700 (PDF 1.7 Extension Level 8 and PDF 2.0, AES-256).

Extract all hashes with pdf2john.pl, which prefixes each line with the filename. The version and revision fields at the start of the hash identify the mode, so filter on that prefix (e.g., grep -F '$pdf$1*2*40*' for mode 10400, grep -F '$pdf$5*6*' for mode 10700) and crack each mode group separately with appropriate strategies.

Mode 10400: brute-force key search (guaranteed). Mode 10500-10600: dictionary+rule attack (human-chosen passwords are likely). Mode 10700: dictionary+rule first, then mask attack with character-class constraints if no success.
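Grouping the extracted lines by mode can also be scripted. The sketch below reads the version, revision, and key-length fields that follow $pdf$ and maps them to hashcat modes; the mapping reflects the layout of hashcat's published example hashes as an assumption, the function names are illustrative, and anything it does not recognize lands under None for manual inspection.

```python
from collections import defaultdict

# (V, R) fields of a $pdf$ hash -> hashcat mode (assumed from hashcat example hashes).
MODE_BY_VERSION = {("1", "2"): 10400, ("5", "5"): 10600, ("5", "6"): 10700}

def hashcat_mode(line):
    """Return the hashcat mode for one pdf2john line, or None if unrecognized."""
    marker = line.find("$pdf$")
    if marker < 0:
        return None
    fields = line[marker + len("$pdf$"):].split("*")
    if len(fields) < 3:
        return None
    version, revision, bits = fields[0], fields[1], fields[2]
    if (version, revision) in MODE_BY_VERSION:
        return MODE_BY_VERSION[(version, revision)]
    if bits == "128":  # 128-bit RC4 and AES-128 both fall under mode 10500
        return 10500
    return None

def group_by_mode(lines):
    """Group non-empty hash lines by hashcat mode (None = needs manual review)."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if line:
            groups[hashcat_mode(line)].append(line)
    return dict(groups)
```

Writing each group to its own file gives hashcat the single-mode input it expects for each run.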

Batch output verification and quality control

After batch cracking: run qpdf --decrypt with the found password to verify the output opens correctly. Use a script to check exit codes and file sizes. Files that could not be decrypted (wrong password, no password found) should be quarantined for separate handling.
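Before qpdf can verify anything, each recovered password has to be matched back to its file. The sketch below assumes pdf2john-style input lines (filename:$pdf$...) and hashcat --show output of the form hash:password, and it assumes hashcat prints each hash exactly as it was supplied; the function names are hypothetical.

```python
def load_hash_owners(john_lines):
    """Map each $pdf$ hash string to the filename printed in front of it."""
    owners = {}
    for line in john_lines:
        name, sep, rest = line.strip().partition(":$pdf$")
        if sep:
            # PDF hashes contain no colons, so drop any trailing colon fields.
            owners["$pdf$" + rest.split(":", 1)[0]] = name
    return owners

def match_passwords(john_lines, show_lines):
    """Return {filename: password} for every hash listed in the --show output."""
    owners = load_hash_owners(john_lines)
    found = {}
    for line in show_lines:
        # Split at the first colon only: the password itself may contain colons.
        hash_part, sep, password = line.rstrip("\n").partition(":")
        if sep and hash_part in owners:
            found[owners[hash_part]] = password
    return found
```

Files missing from the returned dictionary had no password recovered and belong in the quarantine group; the rest can be passed to qpdf --decrypt one by one.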

For files where hashcat found a password but qpdf rejects it: verify the hash extraction was correct (re-run pdf2john.pl and hashcat --show). Some PDFs have multiple encryption layers or corruption that prevents proper decryption even with the correct password.

Document the batch process thoroughly — especially which mode groups failed and why. For a corporate migration: the success rate is typically 60-90% for mode 10700 batch cracking (human-chosen passwords), 85-95% for mode 10600, 99.9+% for mode 10400 (guaranteed). Files with strong random passwords or password mismanagement account for the remainder.

Batch PDF unlock flow

  1. Inventory the batch

    Count files, check encryption tiers with qpdf --show-encryption, estimate total processing time.

  2. Extract all hashes

    pdf2john.pl *.pdf > pdf_hashes.txt. One hash line per file, tagged by mode.

  3. Sort by encryption mode

    Separate mode 10400 (guaranteed via key search) from 10500-10700 (password search).

  4. Crack each mode group

    Mode 10400: brute-force 40-bit key. Modes 10500-10700: dictionary+rule attack on multi-GPU cluster.

  5. Verify and decrypt

    Use hashcat --show to extract found passwords, then qpdf --decrypt each file with its password. Log failures for manual review.

Frequently Asked Questions

What is the fastest tool for batch PDF decryption?
QPDF with GNU parallel for known-password batches. Hashcat for unknown-password batches. Both are free and open-source.
Can I batch-crack PDFs on a CPU instead of GPU?
Yes, with John the Ripper (JtR). CPU cracking is ~100x slower than GPU for modes 10500-10700. Mode 10400 is fast enough on CPU for small batches.
What about PDFs with different passwords per file?
Load all hashes into one hashcat run. Hashcat tests each candidate password against every hash, so per-file passwords are handled efficiently, and the potfile (relocatable with --potfile-path) tracks which hashes are already cracked.
How do I handle PDF 2.0 files in a batch?
PDF 2.0 encryption uses the same mode 10700 as PDF 1.7 Extension Level 8. pdf2john.pl and hashcat 6.3+ handle them. Check for an increased KDF round count, which may slow cracking.
What if some PDFs in the batch are corrupted?
QPDF reports errors for corrupted files. Separate corrupted files from the batch and repair them first (see PDF repair guides). Recovering a corrupted file's password is of little use if its encryption dictionary or content is unreadable.
Are there commercial batch PDF unlock tools?
Yes: Passware Kit Forensic, Elcomsoft PDF Batch, and Recovery Toolbox for PDF offer batch processing with GUI. They cost $200-$2,000 per license but automate the full pipeline.
