Saturday, 23 January 2010

Counting pages


We have an application that generates PDF reports from rst-formatted text files. One of its requirements is page numbering in the format "Page X of Y". Unfortunately our application is built on rst2pdf 0.11, which doesn't have such a feature (at least in the version we have). Although it wouldn't be too complex to implement it by modifying the code of python-docutils, this goes against my principles: packages should be maintained by their developers in the first instance, and by the distro packagers in the second (in this case Ubuntu 8.04 LTS). As an administrator I don't like deviating from this policy, because then we have to watch out for future security updates that may break things when our changes are overwritten.

I'm a bash guy, so I always tend to prototype in the shell. Here is a quick way to count the total pages of a PDF document w/o installing any specific tool.

Total PDF pages obtained by grep:

hmontoliu@blogspot:/tmp$ grep -c --binary-files=text '/Page\b' foo.pdf

Of course the package xpdf-utils includes the command "pdfinfo", which reports the number of pages in a document; but again, if you can avoid installing extra packages on a server, so much the better.

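The above command also applies to PostScript documents with minor modifications. Each page starts with a "%%Page:" comment, so anchor the pattern and keep the colon to avoid also counting header lines such as "%%Pages:" or "%%PageOrder:":

hmontoliu@blogspot:/tmp$ grep -c '^%%Page:' bar.ps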

More information at "man grep" :-)

Implementing this simple grep in Python, either as a call to the shell or as something grep-like using re or string.count('text'), is trivial; for example:

In [1]: with open('/tmp/foo.pdf','rb') as f:
   ...:     f.read().count(b'/Page>')

Out[1]: 818
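For something a bit more reusable, here is a rough sketch of both greps as Python functions. The names count_pdf_pages and count_ps_pages are mine, made up for illustration. The PDF version uses the same '/Page\b' expression as the grep above, but it counts every occurrence rather than every matching line, so the two can differ when several page objects share a line. It also shares the same limitation: it only sees page objects stored uncompressed, so a PDF that keeps them inside compressed object streams will be undercounted. The PostScript version counts only lines starting with "%%Page:", so header comments such as "%%Pages:" are not included.

```python
import re

# Same expression as the grep example: "/Page" followed by a word boundary,
# so "/Pages", "/PageMode" and similar keys are not counted.
PAGE_MARKER = re.compile(rb"/Page\b")


def count_pdf_pages(path):
    """Count uncompressed page objects in a PDF (hypothetical helper)."""
    with open(path, "rb") as f:
        data = f.read()
    return len(PAGE_MARKER.findall(data))


def count_ps_pages(path):
    """Count "%%Page:" comment lines in a PostScript file (hypothetical helper)."""
    with open(path, "rb") as f:
        return sum(1 for line in f if line.startswith(b"%%Page:"))
```

Calling count_pdf_pages('/tmp/foo.pdf') or count_ps_pages('/tmp/bar.ps') returns the count as an integer, which could then be fed into the "Page X of Y" footer.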

