Batch Unlock Multiple PDFs — Automation Guide
Unlocking a single PDF is straightforward when you know the password. When you need to process 50, 500, or 5,000 encrypted PDFs — perhaps from an old corporate document management system, a pre-2010 archive, or a deceased employee's file server — automation becomes essential. This guide covers batch approaches for three scenarios: (a) all files share the same password, (b) each file has its own password but you know them, and (c) the passwords are all unknown (batch cracking). We cover QPDF, hashcat, Python scripting, and shell pipelines for each scenario.
Scenario 1 — Same password for all files
Many organisations use a single document-level password for all internal PDFs. If you know that password, batch unlocking is trivial. Use QPDF's --decrypt flag in a shell loop: for f in *.pdf; do qpdf --decrypt --password=THEPASSWORD "$f" "unlocked_$f"; done
On Windows: Get-ChildItem *.pdf | ForEach-Object { qpdf --decrypt --password=THEPASSWORD $_.Name "unlocked_$($_.Name)" }. QPDF is available on all platforms via conda, apt, or the QPDF website.
Performance: QPDF processes approximately 50-200 encrypted PDFs per second per CPU core (depending on file size and encryption tier). A 5,000-file batch completes in 25-100 seconds on a modern multi-core system. The bottleneck is disk I/O for large files, not decryption throughput.
Verify each output: qpdf exits with code 0 on success, 3 when it wrote the output but raised warnings, and 2 on errors such as a wrong password. Check exit codes in the shell loop and log failures for manual review. Files that fail with the shared password may have a different password or be corrupted.
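The exit-code check can also be scripted. The following is a minimal Python sketch, assuming qpdf is on the PATH and that exit codes 0 and 3 both mean an output file was written. The function names (classify_exit, decrypt_command, unlock_all) and the separate output directory are illustrative choices, not part of qpdf.

```python
import subprocess
from pathlib import Path


def classify_exit(code: int) -> str:
    """Map a qpdf exit code to a log label (0 = clean, 3 = warnings, else failure)."""
    if code == 0:
        return "ok"
    if code == 3:
        return "ok_with_warnings"
    return "failed"


def decrypt_command(src: Path, dst: Path, password: str) -> list:
    # An argument list (no shell) keeps unusual filenames and passwords intact.
    return ["qpdf", "--decrypt", f"--password={password}", str(src), str(dst)]


def unlock_all(src_dir: Path, out_dir: Path, password: str, run=subprocess.run) -> dict:
    """Decrypt every *.pdf in src_dir into out_dir and return {filename: label}."""
    out_dir.mkdir(parents=True, exist_ok=True)
    results = {}
    for src in sorted(src_dir.glob("*.pdf")):
        proc = run(decrypt_command(src, out_dir / src.name, password),
                   capture_output=True, text=True)
        results[src.name] = classify_exit(proc.returncode)
    return results
```

Calling unlock_all(Path("."), Path("unlocked"), "THEPASSWORD") returns a dictionary that can be filtered for "failed" entries. Writing to a separate directory keeps a re-run from picking up files that are already unlocked.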
Parallel batch processing
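Use GNU parallel or xargs -P to process multiple files concurrently. Start with one job per CPU core and lower the count if disk I/O saturates first on large files: parallel -j"$(nproc)" qpdf --decrypt --password=THEPASSWORD {} unlocked_{} ::: *.pdf. Passing the file list after ::: avoids piping ls output, which mishandles filenames that contain newlines.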
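Where GNU parallel is not available, the same fan-out can be approximated in Python. This is a sketch under the assumption that qpdf is on the PATH. The names decrypt_one and decrypt_parallel are hypothetical, and a thread pool is used because each worker only waits on an external qpdf process.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def decrypt_one(src: Path, out_dir: Path, password: str, run=subprocess.run):
    """Run qpdf on one file and return (filename, exit code)."""
    cmd = ["qpdf", "--decrypt", f"--password={password}",
           str(src), str(out_dir / src.name)]
    return src.name, run(cmd, capture_output=True).returncode


def decrypt_parallel(files, out_dir: Path, password: str, jobs=None, run=subprocess.run):
    """Decrypt files concurrently; returns {filename: qpdf exit code}."""
    out_dir.mkdir(parents=True, exist_ok=True)
    jobs = jobs or os.cpu_count() or 1  # default: one worker per CPU core
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        results = pool.map(lambda f: decrypt_one(f, out_dir, password, run), files)
        return dict(results)
```

A call such as decrypt_parallel(sorted(Path(".").glob("*.pdf")), Path("unlocked"), "THEPASSWORD") returns the exit code per file, so failures can be logged the same way as in the sequential loop.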
Scenario 2 — Known passwords per file (password map)
If you have a password map (spreadsheet or CSV mapping filename to password), automate with a Python script. Read the CSV, iterate over files, and apply qpdf with the per-file password. Python's subprocess module runs qpdf and captures exit codes.
Sample approach: read a 'passwords.csv' with columns 'filename,password'. For each row, call qpdf --decrypt --password=ROW_PASSWORD. Log success/fail per file to an output CSV. Files that fail may indicate the password is wrong in the map — flag them for manual review.
This scenario is common in enterprise content migration projects where the document management system stores per-document passwords in a database. Export the password table as CSV, match by filename, and run the batch decrypt. A 10,000-file batch processes in 5-20 minutes on a modern server.
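A password-map run along these lines might look like the sketch below. It assumes a passwords.csv with a header row of filename,password, as described above, and qpdf on the PATH. The function name decrypt_from_map and the status labels are illustrative.

```python
import csv
import subprocess
from pathlib import Path


def decrypt_from_map(map_csv: Path, src_dir: Path, out_dir: Path,
                     log_csv: Path, run=subprocess.run) -> int:
    """Decrypt each file listed in map_csv; write a status log; return failure count."""
    out_dir.mkdir(parents=True, exist_ok=True)
    failures = 0
    with open(map_csv, newline="", encoding="utf-8") as fin, \
         open(log_csv, "w", newline="", encoding="utf-8") as fout:
        log = csv.writer(fout)
        log.writerow(["filename", "status"])
        for row in csv.DictReader(fin):
            src = src_dir / row["filename"]
            if not src.is_file():
                status = "missing"  # listed in the map but not on disk
            else:
                cmd = ["qpdf", "--decrypt", f"--password={row['password']}",
                       str(src), str(out_dir / src.name)]
                code = run(cmd, capture_output=True).returncode
                # 0 = clean, 3 = written with warnings; anything else is a failure
                status = "ok" if code in (0, 3) else "failed"
            if status != "ok":
                failures += 1
            log.writerow([row["filename"], status])
    return failures
```

Rows marked "failed" in the log are the ones to flag for manual review, since the most likely cause is a wrong password in the map.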
Scenario 3 — All passwords unknown (batch cracking)
When none of the passwords are known, you need batch password cracking. This is the most technically demanding scenario. The approach: extract hashes from all PDFs, classify by encryption mode, and run hashcat on the batch.
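Step 1: Use pdf2john.pl (John the Ripper's PDF hash extractor) or pdf2hashcat to extract hash lines from all PDFs. Batch: for f in *.pdf; do pdf2john.pl "$f"; done > pdf_hashes.txt. pdf2john.pl writes one line per file: the filename, a colon, and a $pdf$ hash whose leading fields record the encryption version, revision, and key length. hashcat expects the bare hash, so strip the filename prefix before loading the list.
Step 2: Sort hashes by mode. Mode 10400 (40-bit RC4) should be separated from mode 10700 (AES-256) because different cracking strategies apply. Mode 10400 is guaranteed recoverable: the 40-bit key space (2^40, about 1.1 trillion keys) can be searched exhaustively regardless of how strong the password is. Mode 10700 requires dictionary+rule attacks.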
Step 3: Run hashcat in dictionary mode against the batch: hashcat -m 10700 pdf_hashes.txt rockyou.txt -r best64.rule. Hashcat processes all hashes simultaneously — one cracked hash per file means that file's password was found. Files not cracked after the dictionary phase proceed to mask attack.
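Between extraction and cracking, it helps to separate each filename from its hash and to remember which file each hash came from. The sketch below assumes extractor output of the form name.pdf:$pdf$..., optionally followed by further colon-separated fields. The function names are hypothetical and the sample line in the usage note is synthetic.

```python
def split_hash_line(line: str):
    """Return (filename, bare_hash) for one extractor line, or None if no hash is present."""
    start = line.find("$pdf$")
    if start < 0:
        return None
    filename = line[:start].rstrip(":")
    # The hash uses '*' as its field separator, so the first ':' after it ends the hash.
    pdf_hash = line[start:].split(":", 1)[0].strip()
    return filename, pdf_hash


def load_hashes(lines) -> dict:
    """Build {bare_hash: [filenames]} so cracked hashes can be traced back to files."""
    files_by_hash = {}
    for line in lines:
        parsed = split_hash_line(line)
        if parsed:
            filename, pdf_hash = parsed
            files_by_hash.setdefault(pdf_hash, []).append(filename)
    return files_by_hash
```

For example, split_hash_line("report.pdf:$pdf$5*6*256*aa") yields the pair ("report.pdf", "$pdf$5*6*256*aa"). Writing only the dictionary keys to the hash list gives hashcat bare hashes, while the mapping is kept for the later decrypt step.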
Hashcat batch optimizations
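When cracking multiple PDF hashes simultaneously, hashcat tests each candidate password against every remaining hash in the list. Running one job is more efficient than cracking one file at a time because dictionary loading, rule expansion, and GPU start-up are paid once for the whole batch. The hashing itself is not shared, though: each PDF hash carries its own salt, so a batch of 1,000 mode 10700 hashes against a 10-million-word dictionary needs roughly 1,000 times the hash computations of a single hash, with the workload shrinking as hashes are cracked and dropped from the list.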
Use hashcat's --outfile-autohex-disable flag for clean plaintext output, and --show after the run to list all cracked passwords. The hashcat potfile automatically tracks per-hash status — re-running the same batch skips already-cracked hashes.
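The --show listing can then be joined back to filenames. This sketch assumes each output line has the form hash:password and that a hash-to-filenames mapping was kept when the hashes were extracted (the files_by_hash argument is hypothetical). Passwords that hashcat prints in $HEX[...] form are passed through undecoded.

```python
def parse_show_output(lines) -> dict:
    """Parse 'hash:password' lines into {hash: password}."""
    found = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.startswith("$pdf$") or ":" not in line:
            continue  # skip blank lines and anything that is not a cracked PDF hash
        # Split on the first ':' only, so passwords containing ':' stay intact.
        pdf_hash, password = line.split(":", 1)
        found[pdf_hash] = password
    return found


def passwords_by_file(found: dict, files_by_hash: dict) -> dict:
    """Return {filename: password} for every file whose hash was cracked."""
    result = {}
    for pdf_hash, filenames in files_by_hash.items():
        if pdf_hash in found:
            for name in filenames:
                result[name] = found[pdf_hash]
    return result
```

Files that appear in files_by_hash but not in the returned dictionary are the ones still uncracked after the run.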
For mode 10400 (40-bit RC4) batch cracking, search the 40-bit RC4 key space directly rather than guessing passwords; hashcat provides the collider mode 10410 for this, run as a mask attack over five ?b bytes. The cost per file is fixed by the key space, so password strength makes no difference to the work factor. A batch of 1,000 mode 10400 hashes against the 2^40 keyspace completes in hours on a single RTX 5090.
Automation infrastructure recommendations
Local GPU cluster: 4-8 high-end GPUs (RTX 5090 or equivalent). Expected batch throughput: 10,000-50,000 mode 10700 hashes tested against a 10M-word dictionary with 50 rules = 2-7 days for the dictionary phase, depending on GPU count and KDF round count.
Cloud GPU instances: AWS EC2 G6 instance (L40S GPU), Azure NC A100 v4, or Vast.ai rentals. Cloud GPUs are cost-effective for one-time batch cracking jobs — you stop the instance when done and pay only for compute time.
Job management: use hashcat's --potfile-path to give each batch its own potfile, or --potfile-disable to skip the potfile entirely (in that case write results with --outfile, since nothing is tracked between runs). For large batches, split the hash list into sub-batches (500-1,000 hashes each) and run on separate GPU instances to scale horizontally.
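The arithmetic behind such estimates, and the sub-batch split, can be sketched as follows. It assumes every hash has its own salt, so each candidate is tested once per hash, and that the sustained hash rate has been measured on the actual hardware. The function names and the rate in the usage note are placeholders, not benchmarks.

```python
def dictionary_phase_seconds(hashes: int, words: int, rules: int,
                             hashes_per_second: float) -> float:
    """Worst-case time for a dictionary+rule pass over a batch of salted hashes."""
    # candidates per hash = words * rules; salted hashes do not share work
    return hashes * words * rules / hashes_per_second


def split_batches(items: list, size: int) -> list:
    """Split a hash list into consecutive sub-batches of at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]
```

With a placeholder rate of 1,000,000 hashes per second, dictionary_phase_seconds(1000, 10_000_000, 50, 1_000_000) gives 500,000 seconds, a little under six days. The real figure falls as hashes are cracked and removed from the list.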
Handling mixed encryption tiers in a batch
Not all PDFs in a batch use the same encryption. A corporate archive spanning 2000-2026 likely contains: mode 10400 (PDF 1.1-1.3, 40-bit RC4, pre-2001), mode 10500 (PDF 1.4-1.6, 128-bit RC4 or AES-128), mode 10600 (PDF 1.7 Level 3, the Acrobat 9 form of AES-256), and mode 10700 (PDF 1.7 Level 8 and PDF 2.0, AES-256 with a much slower key derivation).
Extract all hashes with pdf2john.pl, which prefixes each line with the filename; the hash itself begins with $pdf$ followed by the version, revision, and key-length fields. Filter on those fields (e.g., grep -F '$pdf$1*2*40*' for mode 10400, grep -F '$pdf$5*6*' for mode 10700) and crack each mode group separately with appropriate strategies. Mode 10400: brute-force key search (guaranteed). Modes 10500-10600: dictionary+rule attack (human-chosen passwords are likely). Mode 10700: dictionary+rule first, then mask attack with character-class constraints if no success.
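Grouping by mode can also be done in a script instead of with grep. The sketch below assumes hashes of the form $pdf$V*R*bits*..., where the second field is the security handler revision, and maps revision 2 to mode 10400, revisions 3 and 4 to 10500, revision 5 to 10600, and revision 6 to 10700. That mapping is an assumption worth checking against hashcat's published example hashes. The sample hashes in the usage note are synthetic.

```python
# Assumed mapping from the revision field (R) to hashcat mode numbers.
MODE_BY_REVISION = {2: 10400, 3: 10500, 4: 10500, 5: 10600, 6: 10700}


def hashcat_mode(pdf_hash: str):
    """Return the hashcat mode for a bare $pdf$ hash, or None if it cannot be classified."""
    if not pdf_hash.startswith("$pdf$"):
        return None
    fields = pdf_hash[len("$pdf$"):].split("*")
    try:
        revision = int(fields[1])  # fields are V, R, key length, ...
    except (IndexError, ValueError):
        return None
    return MODE_BY_REVISION.get(revision)


def group_by_mode(hashes) -> dict:
    """Return {mode: [hashes]}; unclassified hashes are collected under None."""
    groups = {}
    for pdf_hash in hashes:
        groups.setdefault(hashcat_mode(pdf_hash), []).append(pdf_hash)
    return groups
```

Each group can be written to its own file (for example one hash list per mode) and handed to hashcat with the matching -m value. Anything under the None key deserves a manual look before cracking.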
Batch output verification and quality control
After batch cracking: run qpdf --decrypt with the found password to verify the output opens correctly. Use a script to check exit codes and file sizes. Unsuccessfully decrypted files (wrong password, no password found) should be quarantined for separate handling.
For files where hashcat found a password but qpdf rejects it: verify the hash extraction was correct (re-run pdf2john.pl and hashcat --show). Some PDFs have multiple encryption layers or corruption that prevents proper decryption even with the correct password.
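A verification pass with quarantine might be sketched like this. It assumes qpdf is on the PATH and that a filename-to-password dictionary has already been assembled from the cracking results. The function name, the directory layout, and the choice to copy failures rather than move them are illustrative.

```python
import shutil
import subprocess
from pathlib import Path


def verify_and_quarantine(passwords: dict, src_dir: Path, out_dir: Path,
                          quarantine_dir: Path, run=subprocess.run) -> list:
    """Decrypt each PDF with its recovered password; copy failures to quarantine_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    quarantine_dir.mkdir(parents=True, exist_ok=True)
    quarantined = []
    for src in sorted(src_dir.glob("*.pdf")):
        password = passwords.get(src.name)
        ok = False
        if password is not None:
            dst = out_dir / src.name
            cmd = ["qpdf", "--decrypt", f"--password={password}", str(src), str(dst)]
            code = run(cmd, capture_output=True).returncode
            # Accept clean (0) or warning (3) exits, and require a non-empty output file.
            ok = code in (0, 3) and dst.is_file() and dst.stat().st_size > 0
        if not ok:
            shutil.copy2(src, quarantine_dir / src.name)
            quarantined.append(src.name)
    return quarantined
```

The returned list covers both files with no recovered password and files where qpdf rejected the password, which are the two cases that need separate handling.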
Document the batch process thoroughly — especially which mode groups failed and why. For a corporate migration: the success rate is typically 60-90% for mode 10700 batch cracking (human-chosen passwords), 85-95% for mode 10600, 99.9+% for mode 10400 (guaranteed). Files with strong random passwords or password mismanagement account for the remainder.
Batch PDF unlock flow
Step 1: Inventory the batch. Count files, check encryption tiers with qpdf --show-encryption, estimate total processing time.
Step 2: Extract all hashes. pdf2john.pl *.pdf > pdf_hashes.txt. One hash line per file, tagged by mode.
Step 3: Sort by encryption mode. Separate mode 10400 (guaranteed via key search) from 10500-10700 (password search).
Step 4: Crack each mode group. Mode 10400: brute-force 40-bit key. Modes 10500-10700: dictionary+rule attack on multi-GPU cluster.
Step 5: Verify and decrypt. Use hashcat --show to extract found passwords, then qpdf --decrypt each file with its password. Log failures for manual review.
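The inventory step at the start of this flow can be sketched with qpdf's --requires-password check. The sketch assumes that qpdf is on the PATH and that the option exits with 0 when a password is needed, 2 when the file is not encrypted, and 3 when the file is encrypted but opens without a password. The labels and function names are illustrative.

```python
import subprocess
from collections import Counter
from pathlib import Path

# Assumed exit codes of `qpdf --requires-password FILE`.
STATUS_BY_CODE = {
    0: "password_required",
    2: "not_encrypted",
    3: "encrypted_opens_without_password",
}


def encryption_status(pdf: Path, run=subprocess.run) -> str:
    """Classify one PDF by the exit code of qpdf --requires-password."""
    code = run(["qpdf", "--requires-password", str(pdf)], capture_output=True).returncode
    return STATUS_BY_CODE.get(code, "unknown")


def inventory(src_dir: Path, run=subprocess.run) -> dict:
    """Count the PDFs in src_dir by encryption status."""
    counts = Counter(encryption_status(p, run) for p in sorted(src_dir.glob("*.pdf")))
    return dict(counts)
```

Only the files counted under "password_required" need hash extraction and cracking. Files that open without a password can go straight to qpdf --decrypt.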
Frequently Asked Questions
What is the fastest tool for batch PDF decryption?
Can I batch-crack PDFs on a CPU instead of GPU?
What about PDFs with different passwords per file?
How do I handle PDF 2.0 files in a batch?
What if some PDFs in the batch are corrupted?
Are there commercial batch PDF unlock tools?