Python command-line tools for BibTeX

This is a set of command-line tools (written in Python) for manipulating BibTeX format files.

All programs have command lines of the form:

The file arguments are expected to be in BibTeX format. The parser is a proper parser and makes no assumptions about the file layout.

Each command accepts zero or more file arguments. If there are none, input is read from stdin, so the commands can be used in complex shell pipelines. Otherwise the files are read in, processed, and the results written out. Output always goes to stdout, and all informational messages go to stderr.

When invoked with no arguments at all, each program displays detailed usage instructions.

General file operations

The --verbose option reports the number of records read and written. The --resolve option removes crossref fields, copying into each entry any fields it lacks from the cross-referenced entry. The --ignore option suppresses complaints about entries with duplicate cite keys; the duplicates are silently ignored.
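The --resolve behaviour can be pictured with a minimal Python 3 sketch. It assumes entries are plain dicts (cite key mapped to a dict of lower-case field names); `resolve_crossrefs` is a hypothetical helper for illustration, not the package's own API.

```python
# Hypothetical sketch of crossref resolution on plain dicts (not the package API).
# `entries` maps cite key -> {field name: value}, with lower-case field names.

def resolve_crossrefs(entries):
    """Return a new dict of entries with crossref fields merged in and removed."""
    resolved = {}
    for key, fields in entries.items():
        # Drop the crossref field itself, as --resolve is described as doing.
        merged = {name: value for name, value in fields.items() if name != "crossref"}
        parent = entries.get(fields.get("crossref"))
        if parent is not None:
            for name, value in parent.items():
                # Copy only fields the child entry does not already define.
                if name != "crossref":
                    merged.setdefault(name, value)
        resolved[key] = merged
    return resolved
```

Note that real BibTeX matches crossref keys case-insensitively, which this sketch does not attempt.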

A multi-step matching process is used, based on title, authors, year, page numbers and volume. Title matching is fuzzy, and --dthres sets the maximum Levenshtein distance that will be tolerated.

--showdup shows the duplicate entries and the files they came from.
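Fuzzy title matching with a distance threshold like --dthres can be sketched as follows (Python 3). The whitespace and case normalisation, and the default threshold of 2, are assumptions for illustration, not the tool's documented behaviour.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]

def titles_match(t1, t2, dthres=2):
    """True if two titles are within `dthres` edits (hypothetical default)."""
    # Assumed normalisation: lower-case and collapse runs of whitespace.
    def norm(s):
        return " ".join(s.lower().split())
    return levenshtein(norm(t1), norm(t2)) <= dthres
```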

Listing and displaying

The --brief option gives a single-line version of the reference.

The output is in a form that can easily be sorted to see who your favourite co-author is.
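As an illustration of what sorting that output reveals, here is a small Python 3 sketch that counts co-authors directly. The dict-per-entry representation and `coauthor_counts` are hypothetical; names must be written identically across entries for the counts to combine.

```python
import re
from collections import Counter

def coauthor_counts(entries, me):
    """Count how often each other author appears on entries that include `me`."""
    counts = Counter()
    for fields in entries:
        # BibTeX separates authors with the word "and"; this simple split
        # ignores braces that might protect an "and" inside a name.
        names = re.split(r"\s+and\s+", fields.get("author", ""), flags=re.IGNORECASE)
        authors = [a.strip() for a in names if a.strip()]
        if me in authors:
            counts.update(a for a in authors if a != me)
    return counts
```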

Searching and sorting

The keys are specified by one or more --key options or extracted from a LaTeX aux file specified with the --aux option.

The output is normally in BibTeX format, but the --brief option gives one-line summaries.
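Extracting keys from a LaTeX aux file, as the --aux option does, amounts to collecting the arguments of `\citation{...}` lines, which LaTeX writes one per citation command. A minimal Python 3 sketch (the function name is hypothetical):

```python
import re

# LaTeX writes lines such as \citation{key1,key2} into the .aux file.
CITATION_RE = re.compile(r"\\citation\{([^}]*)\}")

def keys_from_aux(text):
    """Return cite keys in first-seen order, without duplicates."""
    keys = []
    for match in CITATION_RE.finditer(text):
        for key in match.group(1).split(","):
            key = key.strip()
            if key and key not in keys:
                keys.append(key)
    return keys
```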

Extract matching records.  Multiple filter criteria can be applied at once. 

--since includes all papers from the specified date onward (inclusive), while --before includes all papers before the date (exclusive). A date is given as YYYY or MM/YYYY, e.g. 2006 or 7/2006. Papers with no month are assumed to have been written on 1 January of that year.
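The date rules above can be sketched in Python 3 by comparing (year, month) tuples. Treating a bare-year argument as January of that year, and accepting both numeric and three-letter month names, are assumptions; the helper names are hypothetical.

```python
MONTHS = {name: number for number, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

def parse_date(text):
    """Parse 'YYYY' or 'MM/YYYY' into a comparable (year, month) tuple."""
    if "/" in text:
        month, year = text.split("/")
        return (int(year), int(month))
    return (int(text), 1)  # assumed: a bare year means January of that year

def entry_date(fields):
    """(year, month) of an entry; a missing or unknown month means January."""
    month = fields.get("month", "").strip().lower()
    number = int(month) if month.isdigit() else MONTHS.get(month[:3], 1)
    return (int(fields["year"]), number)

def in_date_range(fields, since=None, before=None):
    date = entry_date(fields)
    if since is not None and date < parse_date(since):     # --since: inclusive
        return False
    if before is not None and date >= parse_date(before):  # --before: exclusive
        return False
    return True
```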

--hasfield is true if the record has a non-empty value for the specified field.

--field is true if the specified field contains the specified string anywhere within it; the field name can be all, to search every field. Searches are case-insensitive unless the --case option is given.
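A sketch of the --field test in Python 3, assuming an entry is a dict of field values; `field_matches` and its `case` flag are hypothetical stand-ins for the option's behaviour.

```python
def field_matches(fields, field, needle, case=False):
    """True if `needle` occurs anywhere in `field`; 'all' searches every field."""
    values = list(fields.values()) if field == "all" else [fields.get(field, "")]
    if not case:
        # Default: case-insensitive comparison, as without --case.
        needle = needle.lower()
        values = [v.lower() for v in values]
    return any(needle in v for v in values)
```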

The output is normally in BibTeX format, but the --brief option gives one-line summaries, and --count just reports how many entries were found.

By default entries are sorted into descending order (newest first); --reverse puts the oldest first.
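Date-based sorting can be sketched in Python 3 with a (year, month) sort key. A missing month counts as January, matching the filtering rule above; `oldest_first` is a hypothetical parameter mirroring --reverse.

```python
MONTHS = {name: number for number, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

def sort_key(fields):
    """(year, month) tuple; a missing or unknown month counts as January."""
    month = fields.get("month", "").strip().lower()
    number = int(month) if month.isdigit() else MONTHS.get(month[:3], 1)
    return (int(fields.get("year", 0)), number)

def sort_entries(entries, oldest_first=False):
    """Newest first by default; oldest_first=True plays the role of --reverse."""
    return sorted(entries, key=sort_key, reverse=not oldest_first)
```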

Find references on the web

Each reference is searched for on Google Scholar using a search string constructed from the title and the authors. If a web link is found, it is added to the bibliographic entry as a URL field. Entries that already have a URL field are not searched for.
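The exact query format the tool sends is not described here; the Python 3 sketch below only shows one plausible way to build a Google Scholar search URL from a title and author surnames. The surname heuristic and all function names are assumptions, and no network request is made.

```python
import re
from urllib.parse import urlencode

def surname(name):
    """'Corke, P.' -> 'Corke'; 'Peter Corke' -> 'Corke' (simplified heuristic)."""
    name = name.strip()
    return name.split(",")[0].strip() if "," in name else name.split()[-1]

def needs_search(fields):
    # Entries that already carry a URL field are skipped.
    return "url" not in fields

def scholar_query_url(fields):
    """Build a Google Scholar search URL from the title and author surnames."""
    title = re.sub(r"[{}]", "", fields.get("title", ""))  # strip BibTeX braces
    authors = re.split(r"\s+and\s+", fields.get("author", ""), flags=re.IGNORECASE)
    surnames = [surname(a) for a in authors if a.strip()]
    query = " ".join([title] + surnames)
    return "https://scholar.google.com/scholar?" + urlencode({"q": query})
```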

Converting to other display formats

This tool makes a nicely printable version of your bibliography and can optionally invoke the xdvi viewer. The resulting dvi file takes the name of the first file argument with its extension changed to .dvi, or stdin.dvi if input was read from stdin.
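The output naming rule is simple enough to state in a few lines of Python 3. Keeping any directory part of the first file argument is an assumption, and `dvi_name` is a hypothetical helper.

```python
import os

def dvi_name(files):
    """Name of the dvi output: first file with .dvi extension, else stdin.dvi."""
    if not files:
        return "stdin.dvi"
    base, _ext = os.path.splitext(files[0])  # assumed: directory part is kept
    return base + ".dvi"
```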

This is useful for putting your bibliography online. The --highlight option causes every instance of the specified word to be highlighted in red, which is useful if the HTML is generated dynamically in response to some query.

Currently the format/layout is rather simple and dull.  There are other packages out there that perform the same function, just not integrated with this class library.
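Highlighting a word in generated HTML, as --highlight does, can be sketched in Python 3. This version matches whole words case-insensitively (both assumptions) and escapes the text around each match separately, so a search word can never match inside an HTML entity such as `&amp;`.

```python
import html
import re

def highlight(text, word):
    """HTML-escape `text`, wrapping each whole-word match of `word` in red."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    parts, last = [], 0
    for m in pattern.finditer(text):
        parts.append(html.escape(text[last:m.start()]))
        parts.append('<span style="color:red">' + html.escape(m.group(0)) + "</span>")
        last = m.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```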


Download the compressed tar file

It requires Python 2.5 or better.

About the package

This was based on my experience with TkBibTex, which I wrote a long time ago. It did the job, but its biggest problems came from Tcl: the language is hard to read and maintain, and (the vanilla version, anyway) has no decent data structures. When I started out with Python I decided to reimplement it, and this is the result. The package is built from three parts: a general class for a bibliographic item (BibEntry), which contains no BibTeX-specific code; a general container class for bibliographic items (Bibliography), which thus represents a bibliography; and a module that defines BibTeX-specific subclasses of each of BibEntry and Bibliography.

These classes are the basis for all of the command line utilities above.

In principle other subclasses could be written, for EndNote or any other common format.
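The layering of format-neutral base classes with format-specific subclasses can be sketched in Python 3 as below. Only the names BibEntry and Bibliography come from this page; the subclass names, attributes and methods are hypothetical and do not reproduce the package's real interface.

```python
class BibEntry:
    """Format-neutral bibliographic item: a type, a cite key and its fields."""
    def __init__(self, entry_type, key, fields=None):
        self.entry_type = entry_type.lower()
        self.key = key
        self.fields = {k.lower(): v for k, v in (fields or {}).items()}

    def get_field(self, name, default=None):
        return self.fields.get(name.lower(), default)

class Bibliography:
    """Format-neutral container of BibEntry objects, indexed by cite key."""
    def __init__(self):
        self.entries = {}

    def insert(self, entry):
        if entry.key in self.entries:
            raise KeyError("duplicate cite key: " + entry.key)
        self.entries[entry.key] = entry

    def __len__(self):
        return len(self.entries)

class BibTeXEntry(BibEntry):
    """Hypothetical BibTeX-specific subclass that can print itself."""
    def format(self):
        body = ",\n".join("  %s = {%s}" % (k, v) for k, v in self.fields.items())
        return "@%s{%s,\n%s\n}" % (self.entry_type, self.key, body)

class BibTeXBibliography(Bibliography):
    """Hypothetical BibTeX-specific container: concatenates formatted entries."""
    def format(self):
        return "\n\n".join(e.format() for e in self.entries.values())
```

An EndNote back end would, under this design, add another pair of subclasses and leave the base classes untouched.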


  1. A graphical interface like TkBibTex, probably using wxPython

  2. More consistency in switches

  3. Better handling of @string definitions; they should probably be passed along automatically between the programs

  4. Improved HTML generator

  5. An instantiation for a different file format, maybe EndNote