Auto-cropping PDF files
You may know the kind of PDF file (e.g. a scientific publication) that is set in a North American paper format such as Letter or Legal and has wide margins. When such a file is scaled to fit on A4 paper, the text ends up as a palm-sized block centered on the page.
A common way to use your paper more efficiently (and to save some trees) is to use pdfjam's pdfnup. With this tool, you can trim, clip, scale, rotate, and combine PDF documents into new PDF files. I use this tool quite often to create 3-on-1, 6-on-1, or 8-on-1 pages of presentation slides. Cropping works perfectly fine with this tool, but you have to specify the margins manually. Finding the right values (e.g. 3cm on the left, 1cm on the bottom etc.) requires a good eye or a trial-and-error approach.
This is where my new script comes to the rescue. Using some magic (command line tools such as GhostScript and ImageMagick), it scans your PDF document for the optimal clipping margins and creates a new PDF document in which the input document is scaled to fit efficiently on an A4 page with a 1cm margin. The result can be printed or processed further with pdfnup.
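The margin scan can be illustrated in isolation. The following is a minimal sketch, not the code used in the script: it assumes that each page's trimmed content box is already available as a geometry string of the form WIDTHxHEIGHT+XOFFSET+YOFFSET (in thumbnail pixels), and the function name union_bbox is made up for this illustration. It computes the smallest box that contains the content of all pages.

```shell
# Hypothetical helper: read one "WxH+X+Y" geometry per line from stdin
# and print "minx miny maxx maxy" for the union of all boxes.
union_bbox() {
    awk -F '[x+]' '
        NF == 4 {
            right  = $3 + $1      # x offset + content width
            bottom = $4 + $2      # y offset + content height
            if (!seen || $3 < minx)     minx = $3
            if (!seen || $4 < miny)     miny = $4
            if (!seen || right > maxx)  maxx = right
            if (!seen || bottom > maxy) maxy = bottom
            seen = 1
        }
        END { if (seen) print minx, miny, maxx, maxy }
    '
}

# Example with two made-up pages:
printf '%s\n' '300x400+120+80' '350x500+100+90' | union_bbox
# prints: 100 80 450 590
```

Taking the union over all pages means that a single crop box is applied to the whole document, so no page loses content even if its text sits slightly differently from the others.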
To use this script, save it somewhere in your path (e.g. /usr/local/bin, which requires sudo or su permissions), make it executable, and run it as follows:
autocroppdf.sh input.pdf
If everything works, a new file called input-cropped.pdf is created.
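The installation steps can be sketched as follows. This is only one possible way to do it: the helper name install_autocroppdf is hypothetical, and it assumes that the install utility (part of GNU coreutils and the BSD userland) is available.

```shell
# Hypothetical helper: copy the script into a directory on the PATH
# and mark it executable in one step.
install_autocroppdf() {
    src="$1"                          # path to the saved script
    destdir="${2:-/usr/local/bin}"    # target directory, defaults to /usr/local/bin
    install -m 755 "$src" "$destdir/autocroppdf.sh"
}

# Typical use (writing to /usr/local/bin usually requires root):
#   sudo install -m 755 autocroppdf.sh /usr/local/bin/autocroppdf.sh
#   autocroppdf.sh input.pdf
```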
And here comes the script (highlighting done with Kate's HTML export):
#!/bin/bash

## This script relies on the following external tools:
##  - pdfinfo (part of poppler)
##  - GhostScript
##  - ImageMagick (convert and identify)
##  - awk
##  - bc
##  - pdfLaTeX
## It uses bash features ([[ ]], ${var%...}, $RANDOM), hence /bin/bash instead of /bin/sh.

## for each command line parameter
for inputfile in "$@" ; do
    ## check if file exists and is not empty
    if [[ ! -s "${inputfile}" ]] ; then
        echo "Parameter \"${inputfile}\" is not an existing file." >&2
        continue
    fi

    ## check if file actually has a .pdf ending
    if [[ "${inputfile}" != *.pdf ]] ; then
        echo "File \"${inputfile}\" doesn't seem to be a valid input file." >&2
        continue
    fi
    outputfile="${inputfile%.pdf}-cropped.pdf"

    ## determine number of pages using the external tool "pdfinfo"
    numpages=$(pdfinfo "${inputfile}" | awk '/^Pages: / {print $2}')
    ## if the number of pages is missing or below 1 (i.e. zero), the input file cannot be a valid PDF file
    if [[ -z "${numpages}" || "${numpages}" -lt 1 ]] ; then
        echo "Could not determine number of pages for file \"${inputfile}\"." >&2
        continue
    fi

    ## create temporary directory with random name
    TMPDIR="/tmp/autocroppdf_$$_$RANDOM"
    mkdir -p "${TMPDIR}"

    ## use GhostScript to generate thumbnail images of each page in the PDF file
    echo "Generating image previews"
    gs -dNOPAUSE -sDEVICE=pngmono -q -r64 -dBATCH -sOutputFile="${TMPDIR}/testA%06d.png" "$inputfile" >"${TMPDIR}/gs-stdout.txt" 2>"${TMPDIR}/gs-stderr.txt"
    ## check GhostScript's exit code to see if something went wrong
    if [[ $? -ne 0 ]] ; then
        echo "Ghostscript failed to interpret \"${inputfile}\"." >&2
        rm -rf "${TMPDIR}"
        continue
    fi
    ## check GhostScript's output (both stdout and stderr) for warning or error messages
    if [[ $(cat "${TMPDIR}/gs-stdout.txt" "${TMPDIR}/gs-stderr.txt" | wc -l) -ge 1 ]] ; then
        echo " There were warnings or errors from Ghostscript, continuing anyway."
    fi

    ## initialize some variables for the following steps
    biginit=100000
    minx=${biginit}
    miny=${biginit}
    maxx=0
    maxy=0
    outerwidth=0
    outerheight=0
    originalwidth=0
    originalheight=0

    echo "Determining crop margin..."
    ## go through each thumbnail created by GhostScript ...
    for pngfile in "${TMPDIR}"/testA*.png ; do
        ## trim (i.e. remove uni-color margins) from picture and save result
        convert "${pngfile}" -trim "${pngfile/testA/testB}" >"${TMPDIR}/convert-stdout.txt" 2>"${TMPDIR}/convert-stderr.txt"
        ## catch potential problems, e.g. if trim failed on an empty page
        if [[ $? -ne 0 || $(cat "${TMPDIR}/convert-stdout.txt" "${TMPDIR}/convert-stderr.txt" | wc -l) -ge 1 ]] ; then
            continue
        fi

        ## the trim/clip/crop operation's information (margins) is still available in the cropped file -> retrieve it
        eval $(identify "${pngfile/testA/testB}" | awk -F '[- x+]+' '{print "innerwidth="$3" innerheight="$4" outerwidth="$5" outerheight="$6" xoffset="$7" yoffset="$8}')

        ## do some math to determine boundaries
        right=$(($xoffset + $innerwidth))
        bottom=$(($yoffset + $innerheight))
        if [[ $minx -gt $xoffset ]] ; then minx=$xoffset ; fi
        if [[ $miny -gt $yoffset ]] ; then miny=$yoffset ; fi
        if [[ $right -gt $maxx ]] ; then maxx=$right ; fi
        if [[ $bottom -gt $maxy ]] ; then maxy=$bottom ; fi
    done

    ## check determined boundaries for soundness
    if [[ ${outerheight} -le 0 || ${outerwidth} -le 0 || ${maxx} -le 0 || ${maxy} -le 0 || ${minx} -ge ${biginit} || ${miny} -ge ${biginit} ]] ; then
        echo "Could not identify crop margins for file \"${inputfile}\"." >&2
        rm -rf "${TMPDIR}"
        continue
    fi

    ## use pdfinfo again to determine original file's page size (converted from pt to mm, 1mm is about 2.83pt)
    eval $(pdfinfo "$inputfile" | awk '/^Page size: / {print "originalwidth="($3 / 2.83)" originalheight="($5 / 2.83)}')
    if [[ "${originalwidth}" = "0" || "${originalheight}" = "0" ]] ; then
        echo "Could not identify page size for file \"${inputfile}\"." >&2
        rm -rf "${TMPDIR}"
        continue
    fi

    ## some more math to determine trim values (in mm) for all four sides
    trimleft=$(echo "scale=4 ; $minx / $outerwidth * $originalwidth" | bc)
    trimbottom=$(echo "scale=4 ; ( $outerheight - $maxy ) / $outerheight * $originalheight" | bc)
    trimright=$(echo "scale=4 ; ( $outerwidth - $maxx ) / $outerwidth * $originalwidth" | bc)
    trimtop=$(echo "scale=4 ; $miny / $outerheight * $originalheight" | bc)

    ## write header of LaTeX source file
    cat <<EOF >"${TMPDIR}/output.tex"
\documentclass{article}
\usepackage[pdftex]{graphicx}
\usepackage[margin=1cm,a4paper]{geometry}
\pagestyle{empty}
\setlength{\parindent}{0pt}
\setlength{\parskip}{0pt}
\begin{document}
\centering
EOF

    ## insert each input page individually into output file and apply crop/trim/clip operation
    for n in $(seq 1 ${numpages}) ; do
        echo '\includegraphics[width=185mm,height=272mm,keepaspectratio=true,page='${n}',clip,trim='${trimleft}'mm '${trimbottom}'mm '${trimright}'mm '${trimtop}'mm]{'"${inputfile}"'}\par\clearpage'
    done >>"${TMPDIR}/output.tex"

    ## write footer of LaTeX source file
    cat <<EOF >>"${TMPDIR}/output.tex"
\end{document}
EOF

    ## use pdfLaTeX to compile LaTeX source file into a PDF file
    pdflatex -halt-on-error -output-directory="${TMPDIR}" "${TMPDIR}/output.tex" >"${TMPDIR}/pdflatex-stdout.txt" 2>"${TMPDIR}/pdflatex-stderr.txt"
    if [[ $? -ne 0 ]] ; then
        echo "pdfLaTeX failed to compile cropped PDF document." >&2
        rm -rf "${TMPDIR}"
        continue
    fi

    ## copy resulting PDF file next to original input file with modified filename
    cp -p "${TMPDIR}/output.pdf" "${outputfile}" && echo "Generated cropped file ${outputfile}."

    ## clean-up mess
    rm -rf "${TMPDIR}"
done
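To make the trim arithmetic concrete, here is a small worked sketch. It is an illustration under assumptions, not part of the script: the function name trim_mm is hypothetical, it uses awk instead of bc, and the sample numbers are made up. They correspond to a Letter page (215.9mm x 279.4mm) rendered at 64dpi, which gives a 544x704 pixel thumbnail, with content found between pixels (100, 80) and (450, 590).

```shell
# Hypothetical helper: convert a pixel bounding box into trim values in mm.
# Arguments: minx miny maxx maxy thumbwidth thumbheight pagewidth_mm pageheight_mm
# Output:    left bottom right top (the order expected by \includegraphics' trim option)
trim_mm() {
    awk -v minx="$1" -v miny="$2" -v maxx="$3" -v maxy="$4" \
        -v ow="$5" -v oh="$6" -v pw="$7" -v ph="$8" '
        BEGIN {
            left   = minx / ow * pw            # empty strip on the left
            bottom = (oh - maxy) / oh * ph     # empty strip below the content
            right  = (ow - maxx) / ow * pw     # empty strip on the right
            top    = miny / oh * ph            # empty strip above the content
            printf "%.2f %.2f %.2f %.2f\n", left, bottom, right, top
        }'
}

trim_mm 100 80 450 590 544 704 215.9 279.4
# prints: 39.69 45.24 37.31 31.75
```

Each margin is first expressed as a fraction of the thumbnail size and then scaled to the physical page size, so the result does not depend on the resolution used for the thumbnails.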