Does anyone have any recommendations or procedures for repairing a corrupt PDF? When I open the file I get "There was an error opening this document. The file is damaged and cannot be repaired." There seem to be a myriad of tools out there, but none that I could describe as reputable. Are there any open-source, Linux-based solutions for this?
asked May 3, 2011 at 14:35 by user15968
Open-source PDF tools tend to be pretty crappy, I'm afraid. What are you using? Commented May 3, 2011 at 14:38
I didn't like the look of any of the tools, as they looked like the myriad of useless "Registry Cleaners" out there. I have been trying Adobe Pro and have just started looking into whether Ghostscript or PDFForge have any repair switches. Commented May 3, 2011 at 14:48
Ghostscript is okay, but it's certainly not better than Acrobat. It's completely bare bones. Commented May 3, 2011 at 18:41
@Satanicpuppy I disagree: I use Ghostscript to rebuild damaged or low-quality PDFs quite often and it performs very well. Commented Feb 5, 2013 at 20:16

Ghostscript will repair your corrupted PDF automatically if it can open it in the first place (that is, if it is not damaged beyond repair). Afterwards, you'll still need to double-check the result.
On Linux, try this command:
gs \
  -o repaired.pdf \
  -sDEVICE=pdfwrite \
  -dPDFSETTINGS=/prepress \
  corrupted.pdf
On Windows, try this one:
gswin32c.exe ^
  -o repaired.pdf ^
  -sDEVICE=pdfwrite ^
  -dPDFSETTINGS=/prepress ^
  corrupted.pdf
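For the "double-check the result" step, here is a rough sketch of how I'd script it on Linux. It rebuilds the file with the same flags as above, then reads the rebuilt file a second time through Ghostscript's nullpage device and looks for diagnostic lines. The function name repair_with_gs is made up for illustration, and the assumption that problems show up as lines containing '****' depends on your Ghostscript version, so treat it as a starting point rather than a reliable validator.

```shell
#!/bin/sh
# Sketch only: repair_with_gs is a hypothetical helper, not part of Ghostscript.

repair_with_gs() {
    src=$1
    dst=$2

    # Pass 1: rewrite the file through the pdfwrite device (same flags as above).
    gs -o "$dst" -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress "$src" >/dev/null 2>&1 || {
        echo "ghostscript could not rebuild $src" >&2
        return 1
    }

    # Pass 2: read the rebuilt file again, discard the rendering, and keep
    # only lines marked with '****'.
    # ASSUMPTION: this gs version flags errors and warnings that way.
    problems=$(gs -o /dev/null -sDEVICE=nullpage "$dst" 2>&1 | grep -F '****' || true)

    if [ -n "$problems" ]; then
        echo "rebuilt file still triggers messages:" >&2
        echo "$problems" >&2
        return 2
    fi
    echo "ok: $dst"
}
```

Usage would be something like `repair_with_gs corrupted.pdf repaired.pdf`. A clean second pass only means Ghostscript can read its own output; you still want to look at the pages yourself.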
answered May 11, 2011 at 12:47 by Kurt Pfeifle
Ghostscript does a fantastic job of rendering PDFs. I regularly use gs to rebuild PDFs to improve font quality. Commented Feb 5, 2013 at 20:14
The /prepress setting makes the quality really good compared to /screen. Thanks. Commented Sep 13, 2015 at 22:17
I get "An error occurred while reading an XREF table." What does that mean? Commented Jun 18, 2019 at 15:26
It means the internal table of contents (which every PDF has to contain as its XREF table) had an error: it pointed to a wrong byte offset for a PDF object. Ghostscript very likely repaired that error and inserted a correct XREF table into the output. You can check this by running the output through Ghostscript one more time and seeing whether the message still appears. Commented Jun 18, 2019 at 18:13
Note: According to the Ghostscript v9.54 documentation, it is better to use -dPDFSETTINGS=/default instead of -dPDFSETTINGS=/prepress. Commented Mar 18, 2022 at 20:48

I had a corrupted PDF file, print.pdf, that Ghostscript couldn't open but that the usual graphical Linux PDF viewers (Okular, Evince) opened fine. (In my case, a hex editor showed that the file had garbage at the start instead of a PDF header.)
These PDF viewers use Poppler as a back-end PDF renderer. So you can repair the PDF using Poppler's command-line tools. In Ubuntu these are in the poppler-utils package. I used:
pdftocairo -pdf print.pdf print_repaired.pdf
which generated a PDF file with correct headers, which tools like Ghostscript now accepted.
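If you want to see whether your file has the same "garbage before the header" problem without opening a hex editor, something like the following might help. It is only a sketch: the function names pdf_header_offset and check_pdf_header are invented here, and it assumes GNU grep, where -b combined with -o prints the byte offset of the match itself.

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical helpers.

# Print the byte offset of the first "%PDF-" marker, or nothing if absent.
pdf_header_offset() {
    # -a: treat binary data as text, -b: print byte offset, -o: match only.
    # ASSUMPTION: GNU grep, which reports the offset of the match itself.
    LC_ALL=C grep -a -b -o '%PDF-' "$1" | head -n 1 | cut -d: -f1
}

# Turn the offset into a human-readable verdict.
check_pdf_header() {
    off=$(pdf_header_offset "$1" || true)
    case $off in
        0)  echo "header at start of file" ;;
        '') echo "no %PDF- header found" ;;
        *)  echo "header found after $off bytes of leading data" ;;
    esac
}
```

A non-zero offset suggests leading junk like in my case; no header at all matches a file whose header was overwritten. Either way this only inspects the header and says nothing about the rest of the file.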
answered Jun 18, 2013 at 2:01 by Mechanical snail
+1 This read my Quartz-generated PDF without complaints and immediately started generating output. Ghostscript, Adobe Acrobat Pro and others insisted on rebuilding my 120GB PDF first. Commented Dec 14, 2013 at 14:17
This didn't work for at least one weird PDF I came across, but it seems like a good start. Commented Nov 11, 2014 at 20:00
Works perfectly on a PDF from which Ghostscript wanted to remove some arbitrary elements on pages. Commented Nov 22, 2014 at 16:14
Ghostscript failed to read the document, but this worked like a charm. BTW, I did this on Windows using the new Linux subsystem, so cool! Commented Jun 5, 2016 at 17:44

mutool (project page, manpage) will repair broken PDFs without printing them.
mutool clean [options] input.pdf [output.pdf] [pages]
The clean command pretty-prints and rewrites the syntax of a PDF file. It can be used to repair broken files, expand compressed streams, filter out a range of pages, etc. If no output file is specified, it will write the cleaned PDF to "out.pdf" in the current directory.
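Since different files in this thread responded to different tools, one option is to try them in turn and keep the first result that comes out non-empty. The sketch below does that with the three commands mentioned here. The names run_tool and try_repair are invented, the order gs, pdftocairo, mutool is an arbitrary choice, and "exit status 0 plus a non-empty output file" is a weak success test, so you still need to inspect the output.

```shell
#!/bin/sh
# Sketch only: run_tool and try_repair are hypothetical helpers.

# Run one repair tool with the invocation quoted in this thread.
run_tool() {
    case $1 in
        gs)         gs -o "$3" -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress "$2" ;;
        pdftocairo) pdftocairo -pdf "$2" "$3" ;;
        mutool)     mutool clean "$2" "$3" ;;
    esac
}

# Try each installed tool; print the name of the first one that succeeds.
try_repair() {
    for tool in gs pdftocairo mutool; do
        # Skip tools that are not installed.
        command -v "$tool" >/dev/null 2>&1 || continue
        # ASSUMPTION: a zero exit status and a non-empty file mean success.
        if run_tool "$tool" "$1" "$2" >/dev/null 2>&1 && [ -s "$2" ]; then
            echo "$tool"
            return 0
        fi
        rm -f "$2"
    done
    return 1
}
```

Calling `try_repair corrupted.pdf repaired.pdf` prints which tool produced the output, or returns 1 if none of the installed tools managed it.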