Formatting bibliographies

The main purpose of Pybtex is turning machine-readable bibliography data into human-readable bibliographies formatted in a specific style. Pybtex reads bibliography data that looks like this:

@book{graham1989concrete,
    title = "Concrete mathematics: a foundation for computer science",
    author = "Graham, Ronald Lewis and Knuth, Donald Ervin and Patashnik, Oren",
    year = "1989",
    publisher = "Addison-Wesley"
}

and formats it like this:

R. L. Graham, D. E. Knuth, and O. Patashnik. Concrete mathematics: a foundation for computer science. Addison-Wesley, 1989.

Pybtex contains two different formatting engines:

BibTeX engine

The BibTeX engine is fully compatible with BibTeX style files and is used by default.

How it works

When you type pybtex mydocument, the following things happen:

  1. Pybtex reads the file mydocument.aux in the current directory. This file is normally created by LaTeX and contains all sorts of auxiliary information collected during processing of the LaTeX document.

    Pybtex is interested in these three pieces of information:

    Bibliography style:

    First, Pybtex searches the .aux file for a \bibstyle command that specifies which formatting style will be used.

    For example, \bibstyle{unsrt} instructs Pybtex to use the formatting style defined in the file unsrt.bst.

    Bibliography data:

    Next, Pybtex expects to find at least one \bibdata command in the .aux file that tells where to look for the bibliography data.

    For example, \bibdata{mydocument} means “use the bibliography data from mydocument.bib”.

    Citations:

    Finally, Pybtex needs to know which entries to put into the resulting bibliography. Pybtex gets the list of citation keys from \citation commands in the .aux file.

    For example, \citation{graham1989concrete} means “include the entry with the key graham1989concrete in the resulting bibliography”.

    A wildcard citation \citation{*} tells Pybtex to format the bibliography for all entries from all data files specified by all \bibdata commands.

  2. Pybtex executes the style program in the .bst file specified by the \bibstyle command in the .aux file. As a result, a .bbl file containing the resulting formatted bibliography is created.

    A .bst style file is a program in a domain-specific stack-based language. A typical piece of the .bst code looks like this:

    FUNCTION {format.bvolume}
    { volume empty$
        { "" }
        { "volume" volume tie.or.space.connect
        series empty$
            'skip$
            { " of " * series emphasize * }
        if$
        "volume and number" number either.or.check
        }
    if$
    }
    

    The code in a .bst file contains the complete step-by-step instructions on how to create the formatted bibliography from the given bibliography data and citation keys. For example, a READ command tells Pybtex to read the bibliography data from all files specified by \bibdata commands in the .aux file, an ITERATE command tells Pybtex to execute a piece of code for each citation key specified by \citation commands, and so on. The built-in write$ function tells Pybtex to write the given string into the resulting .bbl file. Pybtex implements all these commands and built-in functions and simply executes the .bst program step by step.

    A complete reference of the .bst language can be found in the BibTeX hacking guide by Oren Patashnik. It is available by running texdoc btxhak in most TeX distributions.
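
The three pieces of information collected from the .aux file in step 1 can be illustrated with a small standalone sketch. It is not Pybtex's own .aux parser: the scan_aux function and the sample text are made up for illustration, and the sketch ignores details such as nested .aux files.

```python
import re

# Matches \bibstyle{...}, \bibdata{...} and \citation{...} commands.
AUX_COMMAND = re.compile(r'\\(bibstyle|bibdata|citation)\{([^}]*)\}')


def scan_aux(aux_text):
    """Collect the style name, the data file names and the citation keys."""
    style = None
    data_files = []
    citations = []
    for command, argument in AUX_COMMAND.findall(aux_text):
        if command == 'bibstyle':
            style = argument
        elif command == 'bibdata':
            # \bibdata may list several comma-separated file names.
            data_files.extend(name.strip() for name in argument.split(','))
        else:
            # Keep citation keys in order of first appearance, without duplicates.
            for key in argument.split(','):
                key = key.strip()
                if key not in citations:
                    citations.append(key)
    return style, data_files, citations


SAMPLE_AUX = r"""
\relax
\citation{graham1989concrete}
\bibstyle{unsrt}
\bibdata{mydocument}
"""

style, data_files, citations = scan_aux(SAMPLE_AUX)
print(style, data_files, citations)
```

For the sample above, the style is unsrt, the only data file is mydocument (that is, mydocument.bib), and the only citation key is graham1989concrete.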

Python engine

The Python engine is enabled by running pybtex with the -l python option.

Differences from the BibTeX engine

  • Formatting styles are written in Python instead of the .bst language.

  • Formatting styles are not tied to LaTeX and do not use hardcoded LaTeX markup. Instead, they produce format-agnostic pybtex.richtext.Text objects that can be converted to any markup format (LaTeX, Markdown, HTML, etc.).

  • Name formatting, label formatting, and sorting styles are defined separately from the main style.
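
As a rough illustration of the format-agnostic text objects mentioned above, the following sketch builds one pybtex.richtext.Text object and renders it with two backends. It assumes a Pybtex version that provides Text.render_as() and the built-in html and latex backends; the render_demo helper and the sample text are only for illustration.

```python
try:
    from pybtex.richtext import Tag, Text
except ImportError:  # Pybtex is a third-party package and may not be installed
    Tag = Text = None


def render_demo():
    """Render one rich text object with two different output backends."""
    text = Text(
        'R. L. Graham. ',
        Tag('em', 'Concrete mathematics'),  # emphasis, independent of the markup
        '. Addison-Wesley, 1989.',
    )
    return {name: text.render_as(name) for name in ('html', 'latex')}


if Text is not None:
    for backend_name, markup in render_demo().items():
        print(backend_name, '->', markup)
```

The same Text object yields HTML emphasis in one case and a LaTeX emphasis command in the other, without the style knowing which backend will be used.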

How it works

When you type pybtex -l python mydocument, the following things happen:

  1. Pybtex reads the file mydocument.aux in the current directory and extracts the name of the bibliography style, the list of bibliography data files and the list of citation keys. This step is exactly the same as with the BibTeX engine.

  2. Pybtex reads the bibliography data from all data files specified in the .aux file into a single BibliographyData object.

  3. Then the formatting style is loaded. The formatting style is a Python class with a format_bibliography() method. Pybtex passes the bibliography data (a BibliographyData object) and the list of citation keys to format_bibliography().

  4. The formatting style formats each of the requested bibliography entries in a style-specific way.

    When it comes to formatting names, a name formatting style is loaded and used. A name formatting style is also a Python class with a specific interface. Similarly, a label formatting style is used to format entry labels, and a sorting style is used to sort the resulting bibliography. Each formatting style has a default name style, a default label style and a default sorting style. The defaults can be overridden with options passed to the main style class.

    Each formatted entry is put into a FormattedEntry object which is just a container for the formatted label, the formatted entry text (a pybtex.richtext.Text object) and the entry key. The reason that the label, the key and the main text are stored separately is to give the output backend more flexibility when converting the FormattedEntry object to the actual markup. For example, the HTML backend may want to format the bibliography as a definition list, the LaTeX backend would use \bibitem[label]{key} text constructs, etc.

    Formatted entries are put into a FormattedBibliography object, which simply contains a list of FormattedEntry objects and some additional metadata.

  5. The resulting FormattedBibliography is passed to the output backend. The default backend is LaTeX; it can be changed with the --output-backend option of the pybtex command. The output backend converts the formatted bibliography to the specific markup format and writes it to the output file.
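
The reason for keeping the label, the key and the text apart (step 4) can be shown with a toy model. The SketchEntry class and the two functions below are hypothetical stand-ins, not Pybtex's FormattedEntry or its backends, and the text is a plain string rather than a pybtex.richtext.Text object.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class SketchEntry:
    """Hypothetical stand-in for a formatted entry."""
    key: str
    label: str
    text: str


def to_html(entries):
    """Format entries as a definition list; the key is not needed here."""
    items = [
        f'<dt>{escape(entry.label)}</dt>\n<dd>{escape(entry.text)}</dd>'
        for entry in entries
    ]
    return '<dl>\n' + '\n'.join(items) + '\n</dl>'


def to_latex(entries):
    """Format entries as \\bibitem[label]{key} text constructs."""
    return '\n\n'.join(
        f'\\bibitem[{entry.label}]{{{entry.key}}}\n{entry.text}'
        for entry in entries
    )


entries = [
    SketchEntry(
        key='graham1989concrete',
        label='1',
        text='R. L. Graham, D. E. Knuth, and O. Patashnik. Concrete mathematics.',
    ),
]
print(to_html(entries))
print(to_latex(entries))
```

Each function picks the pieces it needs and arranges them differently, which would be harder if the label and key were already baked into the entry text.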

Python API

The base interface

Both the Python engine and the BibTeX engine use the same interface defined in pybtex.Engine.

pybtex.Engine has a handful of methods, but most of them are just convenience wrappers around Engine.format_from_files(), which does the actual work.

class pybtex.Engine
make_bibliography(aux_filename, style=None, output_encoding=None, bib_format=None, **kwargs)

Read the given .aux file and produce a formatted bibliography using format_from_files().

Parameters

style – If not None, use this style instead of the one specified in the .aux file.

format_from_string(bib_string, *args, **kwargs)

Parse the bibliography data from the given string and produce a formatted bibliography using format_from_files().

This is a convenience method that calls format_from_strings() with a single string.

format_from_strings(bib_strings, *args, **kwargs)

Parse the bibliography data from the given strings and produce a formatted bibliography.

This is a convenience method that wraps each string into a StringIO, then calls format_from_files().

format_from_file(filename, *args, **kwargs)

Read the bibliography data from the given file and produce a formatted bibliography.

This is a convenience method that calls format_from_files() with a single file. All extra arguments are passed to format_from_files().

format_from_files(**kwargs)

Read the bibliography data from the given files and produce a formatted bibliography.

This is an abstract method overridden by both pybtex.PybtexEngine and pybtex.bibtex.BibTeXEngine.
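
The delegation described above, where every convenience method ends up in format_from_files(), can be sketched as follows. This is an illustration of the pattern, not the library's source code: SketchEngine and CountingEngine are hypothetical names, and the toy subclass only measures its inputs instead of formatting anything.

```python
from io import StringIO


class SketchEngine:
    """Hypothetical base class mirroring the delegation described above."""

    def format_from_string(self, bib_string, *args, **kwargs):
        # A single string is handled as a list with one string.
        return self.format_from_strings([bib_string], *args, **kwargs)

    def format_from_strings(self, bib_strings, *args, **kwargs):
        # Each string is wrapped into a file-like object.
        streams = [StringIO(bib_string) for bib_string in bib_strings]
        return self.format_from_files(streams, *args, **kwargs)

    def format_from_file(self, filename, *args, **kwargs):
        return self.format_from_files([filename], *args, **kwargs)

    def format_from_files(self, bib_files_or_filenames, *args, **kwargs):
        raise NotImplementedError  # concrete engines override this method


class CountingEngine(SketchEngine):
    """Toy engine: returns the number of characters in each input."""

    def format_from_files(self, bib_files_or_filenames, *args, **kwargs):
        sizes = []
        for source in bib_files_or_filenames:
            if isinstance(source, str):  # a file name
                with open(source, encoding='utf-8') as handle:
                    sizes.append(len(handle.read()))
            else:  # a file object
                sizes.append(len(source.read()))
        return sizes


print(CountingEngine().format_from_string('@book{key}'))
```

Only format_from_files() has to be overridden; the other entry points come for free from the base class.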

The BibTeXEngine class

The BibTeX engine lives in the pybtex.bibtex module. The public interface consists of the BibTeXEngine class and a couple of convenience functions.

class pybtex.bibtex.BibTeXEngine

The BibTeX formatting engine.

See pybtex.Engine for inherited methods.

format_from_files(bib_files_or_filenames, style, citations=['*'], bib_format=None, bib_encoding=None, output_encoding=None, bst_encoding=None, min_crossrefs=2, output_filename=None, add_output_suffix=False, **kwargs)

Read the bibliography data from the given files and produce a formatted bibliography.

Parameters
  • bib_files_or_filenames – A list of file names or file objects.

  • style – The name of the formatting style.

  • citations – A list of citation keys.

  • bib_format – The name of the bibliography format. The default format is bibtex.

  • bib_encoding – Encoding of bibliography files.

  • output_encoding – Encoding that will be used by the output backend.

  • bst_encoding – Encoding of the .bst file.

  • min_crossrefs – Include cross-referenced entries after this many crossrefs. See the BibTeX manual for details.

  • output_filename – If None, the result will be returned as a string. Else, the result will be written to the specified file.

  • add_output_suffix – Append a .bbl suffix to the output file name.

pybtex.bibtex.make_bibliography(*args, **kwargs)

A convenience function that calls BibTeXEngine.make_bibliography().

pybtex.bibtex.format_from_string(*args, **kwargs)

A convenience function that calls BibTeXEngine.format_from_string().

pybtex.bibtex.format_from_strings(*args, **kwargs)

A convenience function that calls BibTeXEngine.format_from_strings().

pybtex.bibtex.format_from_file(*args, **kwargs)

A convenience function that calls BibTeXEngine.format_from_file().

pybtex.bibtex.format_from_files(*args, **kwargs)

A convenience function that calls BibTeXEngine.format_from_files().

The PybtexEngine class

The Python engine resides in the pybtex module and uses an interface similar to the BibTeX engine. There is the PybtexEngine class and some convenience functions.

class pybtex.PybtexEngine

The Python formatting engine.

See pybtex.Engine for inherited methods.

format_from_files(bib_files_or_filenames, style, citations=['*'], bib_format=None, bib_encoding=None, output_backend=None, output_encoding=None, min_crossrefs=2, output_filename=None, add_output_suffix=False, **kwargs)

Read the bibliography data from the given files and produce a formatted bibliography.

Parameters
  • bib_files_or_filenames – A list of file names or file objects.

  • style – The name of the formatting style.

  • citations – A list of citation keys.

  • bib_format – The name of the bibliography format. The default format is bibtex.

  • bib_encoding – Encoding of bibliography files.

  • output_backend – Which output backend to use. The default is latex.

  • output_encoding – Encoding that will be used by the output backend.

  • bst_encoding – Encoding of the .bst file. It is not listed in the signature above, so it is accepted only through **kwargs.

  • min_crossrefs – Include cross-referenced entries after this many crossrefs. See the BibTeX manual for details.

  • output_filename – If None, the result will be returned as a string. Else, the result will be written to the specified file.

  • add_output_suffix – Append default suffix to the output file name (.bbl for LaTeX, .html for HTML, etc.).

pybtex.make_bibliography(*args, **kwargs)

A convenience function that calls PybtexEngine.make_bibliography().

pybtex.format_from_string(*args, **kwargs)

A convenience function that calls PybtexEngine.format_from_string().

pybtex.format_from_strings(*args, **kwargs)

A convenience function that calls PybtexEngine.format_from_strings().

pybtex.format_from_file(*args, **kwargs)

A convenience function that calls PybtexEngine.format_from_file().

pybtex.format_from_files(*args, **kwargs)

A convenience function that calls PybtexEngine.format_from_files().
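
A minimal usage sketch of the Python engine's convenience functions might look like this. It assumes that Pybtex is installed and that the unsrt style and the html backend are available in the installed version; the format_as_html helper is only for illustration.

```python
try:
    import pybtex
except ImportError:  # Pybtex is a third-party package and may not be installed
    pybtex = None

BIB = """
@book{graham1989concrete,
    title = "Concrete mathematics: a foundation for computer science",
    author = "Graham, Ronald Lewis and Knuth, Donald Ervin and Patashnik, Oren",
    year = "1989",
    publisher = "Addison-Wesley"
}
"""


def format_as_html(bib_string):
    """Format all entries of bib_string with the Python engine."""
    # citations keeps its default, so every entry is included;
    # output_filename stays None, so the markup is returned as a string.
    return pybtex.format_from_string(
        bib_string,
        style='unsrt',
        output_backend='html',
    )


if pybtex is not None:
    print(format_as_html(BIB))
```

Passing output_filename instead would write the result to a file, as described in the parameter list above.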