Working with PDFs From the Command Line

July 22, 2018, updated November 11, 2022

Install Some Tools

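sudo snap install pdftk
sudo apt install poppler-utils imagemagick

pdftotext and pdfimages come from poppler-utils, and convert comes from ImageMagick.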

Create a PDF from a Single Page or a Range of Pages of Another PDF

pdftk input.pdf cat 2-2 output page2.pdf
pdftk input.pdf cat 1-2 output pages1-2.pdf
pdftk A=in1.pdf B=in2.pdf cat A B output out1.pdf
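pdftk A=in1.pdf cat A1-12 A14-end output out1.pdf

The last two examples assign handles (A, B) to the input files. The first concatenates all of in1.pdf and in2.pdf; the second copies every page of in1.pdf except page 13.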

Split a PDF into Individual Page PDF Files

pdftk input.pdf burst

Notice that this generates a doc_data.txt file with the document's metadata, plus one PDF file per page, named pg_0001.pdf, pg_0002.pdf, and so on.

Join Multiple PDF Files Into One

pdftk 1.pdf 2.pdf 3.pdf 4.pdf cat output merged.pdf

or

convert 1.pdf 2.pdf 3.pdf 4.pdf merged.pdf

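Note that, in my experience, convert produces low-quality output with its default options. That is because ImageMagick rasterizes each PDF page into an image (at 72 DPI by default) instead of copying it. pdftk copies the pages as they are and gives a better result.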

Extract Text from a PDF

pdftotext input.pdf output.txt

Extract Images from a PDF

pdfimages input.pdf prefix

Notice that, in my case, all of the output PPM images came out with inverted colors.

Create a New PDF from Extracted Images

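convert -negate *.ppm output.pdf

The -negate option inverts the colors back. Because pdfimages numbers its files with zero-padded indices (prefix-000.ppm, prefix-001.ppm, ...), the shell glob passes the images to convert in the order they were extracted.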
