Reading and writing bibliography data

Reading bibliography data

One of the most common tasks with the Pybtex API is parsing BibTeX files. The pybtex.database module provides several high-level functions for reading bibliography databases.

pybtex.database.parse_string(value, bib_format, **kwargs)

Parse a Unicode string containing bibliography data and return a BibliographyData object.

Parameters
  • value – Unicode string.

  • bib_format – Data format (“bibtex”, “yaml”, etc.).

New in version 0.19.

pybtex.database.parse_bytes(value, bib_format, **kwargs)

Parse a byte string containing bibliography data and return a BibliographyData object.

Parameters
  • value – Byte string.

  • bib_format – Data format (for example, “bibtexml”).

New in version 0.19.

pybtex.database.parse_file(file, bib_format=None, **kwargs)

Read bibliography data from file and return a BibliographyData object.

Parameters
  • file – A file name or a file-like object.

  • bib_format – Data format (“bibtex”, “yaml”, etc.). If not specified, Pybtex will try to guess by the file name.

New in version 0.19.

Each of these functions does essentially the same thing: it reads bibliography data from a string or a file and returns a BibliographyData object containing that data.

Here is a quick example:

>>> from pybtex.database import parse_file
>>> bib_data = parse_file('../examples/tugboat/tugboat.bib')
>>> print(bib_data.entries['Knuth:TB8-1-14'].fields['title'])
Mixing right-to-left texts with left-to-right texts
>>> for author in bib_data.entries['Knuth:TB8-1-14'].persons['author']:
...     print(str(author))
Knuth, Donald
MacKay, Pierre

Writing bibliography data

The BibliographyData class has several methods that mirror the functions described above (to_string(), to_bytes(), and to_file()):

>>> from pybtex.database import BibliographyData, Entry
>>> bib_data = BibliographyData({
...     'article-minimal': Entry('article', [
...         ('author', 'L[eslie] B. Lamport'),
...         ('title', 'The Gnats and Gnus Document Preparation System'),
...         ('journal', "G-Animal's Journal"),
...         ('year', '1986'),
...     ]),
... })
>>> print(bib_data.to_string('bibtex'))
@article{article-minimal,
    author = "L[eslie] B. Lamport",
    title = "The Gnats and Gnus Document Preparation System",
    journal = "G-Animal's Journal",
    year = "1986"
}

Bibliography data classes

Pybtex uses several classes to represent bibliography databases:

  • BibliographyData is a collection of individual bibliography entries and some additional metadata.

  • Entry is a single bibliography entry (a book, an article, etc.).

    An entry has a key (like "knuth74"), a type ("book", "article", etc.), and a number of key-value fields ("author", "title", etc.).

  • Person is a person related to a bibliography entry (usually as an author or an editor).

class pybtex.database.BibliographyData(entries=None, preamble=None, wanted_entries=None, min_crossrefs=2)

entries

A dictionary of bibliography entries referenced by their keys.

The dictionary is case-insensitive:

>>> bib_data = parse_string("""
...     @ARTICLE{gnats,
...         author = {L[eslie] A. Aamport},
...         title = {The Gnats and Gnus Document Preparation System},
...     }
... """, 'bibtex')
>>> bib_data.entries['gnats'] == bib_data.entries['GNATS']
True

property preamble_list

LaTeX preamble as a list of strings.

>>> bib_data = parse_string(r"""
...     @PREAMBLE{"\newcommand{\noopsort}[1]{}"}
...     @PREAMBLE{"\newcommand{\nooptilde}[1]{}"}
... """, 'bibtex')
>>> print(bib_data.preamble_list)
['\\newcommand{\\noopsort}[1]{}', '\\newcommand{\\nooptilde}[1]{}']

New in version 0.19: Earlier versions used get_preamble(), which is now deprecated.

property preamble

LaTeX preamble.

>>> bib_data = parse_string(r"""
...     @PREAMBLE{"\newcommand{\noopsort}[1]{}"}
... """, 'bibtex')
>>> print(bib_data.preamble)
\newcommand{\noopsort}[1]{}

New in version 0.19: Earlier versions used get_preamble(), which is now deprecated.

get_preamble()

Deprecated since version 0.19: Use preamble instead.

to_string(bib_format, **kwargs)

Return the data as a Unicode string in the given format.

Parameters

bib_format – Data format (“bibtex”, “yaml”, etc.).

New in version 0.19.

classmethod from_string(value, bib_format, **kwargs)

Parse a Unicode string in the given format and return the resulting BibliographyData object.

Parameters
  • value – Unicode string.

  • bib_format – Data format (“bibtex”, “yaml”, etc.).

New in version 0.22.2.

to_bytes(bib_format, **kwargs)

Return the data as a byte string in the given format.

Parameters

bib_format – Data format (“bibtex”, “yaml”, etc.).

New in version 0.19.

to_file(file, bib_format=None, **kwargs)

Save the data to a file.

Parameters
  • file – A file name or a file-like object.

  • bib_format – Data format (“bibtex”, “yaml”, etc.). If not specified, Pybtex will try to guess by the file name.

New in version 0.19.

lower()

Return another BibliographyData with all identifiers converted to lowercase.

>>> data = parse_string("""
...     @BOOK{Obrazy,
...         title = "Obrazy z Rus",
...         author = "Karel Havlíček Borovský",
...     }
...     @BOOK{Elegie,
...         title = "Tirolské elegie",
...         author = "Karel Havlíček Borovský",
...     }
... """, 'bibtex')
>>> data_lower = data.lower()
>>> list(data_lower.entries.keys())
['obrazy', 'elegie']
>>> for entry in data_lower.entries.values():
...     entry.key
...     list(entry.persons.keys())
...     list(entry.fields.keys())
'obrazy'
['author']
['title']
'elegie'
['author']
['title']

class pybtex.database.Entry(type_, fields=None, persons=None)

A bibliography entry.

key = None

Entry key (for example, 'fukushima1980neocognitron').

type = None

Entry type ('book', 'article', etc.).

fields = None

A dictionary of entry fields. The dictionary is ordered and case-insensitive.

persons = None

A dictionary of entry persons, by their roles.

The most often used roles are 'author' and 'editor'.

to_string(bib_format, **kwargs)

Return the data as a Unicode string in the given format.

Parameters

bib_format – Data format (“bibtex”, “yaml”, etc.).

classmethod from_string(value, bib_format, entry_number=0, **kwargs)

Parse a Unicode string in the given format and return a single Entry.

Parameters
  • bib_format – Data format (“bibtex”, “yaml”, etc.).

  • entry_number – Zero-based index of the entry to return when the string contains more than one entry (defaults to 0).

New in version 0.22.2.

class pybtex.database.Person(string='', first='', middle='', prelast='', last='', lineage='')

A person or some other person-like entity.

>>> knuth = Person('Donald E. Knuth')
>>> knuth.first_names
['Donald']
>>> knuth.middle_names
['E.']
>>> knuth.last_names
['Knuth']

first_names = None

A list of first names.

New in version 0.19: Earlier versions used first(), which is now deprecated.

middle_names = None

A list of middle names.

New in version 0.19: Earlier versions used middle(), which is now deprecated.

prelast_names = None

A list of pre-last (aka von) name parts.

New in version 0.19: Earlier versions used prelast(), which is now deprecated.

last_names = None

A list of last names.

New in version 0.19: Earlier versions used last(), which is now deprecated.

lineage_names = None

A list of lineage (aka Jr) name parts.

New in version 0.19: Earlier versions used lineage(), which is now deprecated.

property bibtex_first_names

A list of first and middle names together. (BibTeX treats all middle names as first.)

New in version 0.19: Earlier versions used Person.bibtex_first(), which is now deprecated.

>>> knuth = Person('Donald E. Knuth')
>>> knuth.bibtex_first_names
['Donald', 'E.']

get_part(type, abbr=False)

Get a list of name parts by type.

>>> knuth = Person('Donald E. Knuth')
>>> knuth.get_part('first')
['Donald']
>>> knuth.get_part('last')
['Knuth']
property rich_first_names

A list of first names converted to rich text.

New in version 0.20.

property rich_middle_names

A list of middle names converted to rich text.

New in version 0.20.

property rich_prelast_names

A list of pre-last (aka von) name parts converted to rich text.

New in version 0.20.

property rich_last_names

A list of last names converted to rich text.

New in version 0.20.

property rich_lineage_names

A list of lineage (aka Jr) name parts converted to rich text.

New in version 0.20.

first(abbr=False)

Deprecated since version 0.19: Use first_names instead.

middle(abbr=False)

Deprecated since version 0.19: Use middle_names instead.

prelast(abbr=False)

Deprecated since version 0.19: Use prelast_names instead.

last(abbr=False)

Deprecated since version 0.19: Use last_names instead.

lineage(abbr=False)

Deprecated since version 0.19: Use lineage_names instead.

bibtex_first()

Deprecated since version 0.19: Use bibtex_first_names instead.