Designing styles

Rich text

Pybtex has a set of classes for working with formatted text and producing formatted output. A piece of formatted text in Pybtex is represented by a Text object. A Text is basically a container that holds a list of

  • plain text parts, represented by String objects,
  • formatted parts, represented by Tag and HRef objects.

The basic workflow is:

  1. Construct a Text object.
  2. Render it as LaTeX, HTML or other markup.

>>> from pybtex.richtext import Text, Tag
>>> text = Text('How to be ', Tag('em', 'a cat'), '.')
>>> print(text.render_as('html'))
How to be <em>a cat</em>.
>>> print(text.render_as('latex'))
How to be \emph{a cat}.

Rich text classes

There are several rich text classes in Pybtex:

Text is the top level container that may contain String, Tag, and HRef objects. When a Text object is rendered into markup, it renders all of its child objects, then concatenates the result.

String is just a wrapper for a single Python string.

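Tag and HRef are also containers that may contain other String, Tag, and HRef objects. This makes nested formatting possible. For example, the over-formatted sentence “Comprehensive TeX Archive Network is comprehensive.” (a link whose first word is emphasized, followed by a second emphasized word) is represented by this object tree: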

>>> from pybtex.richtext import Text, Tag, HRef
>>> text = Text(
...     HRef('http://ctan.org/', Tag('em', 'Comprehensive'), ' TeX Archive Network'),
...     ' is ',
...     Tag('em', 'comprehensive'),
...     '.',
... )
>>> print(text.render_as('html'))
<a href="http://ctan.org/"><em>Comprehensive</em> TeX Archive Network</a> is <em>comprehensive</em>.
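
The render-then-concatenate behaviour described above can be pictured with a small stand-alone model. This is a simplified sketch, not Pybtex's implementation: the ToyString, ToyText, ToyTag, and ToyHRef classes and their render_html() method are hypothetical names used only for illustration.

```python
from html import escape


class ToyString:
    """Leaf node: a plain string (hypothetical stand-in for String)."""

    def __init__(self, value):
        self.value = value

    def render_html(self):
        # Escape markup-sensitive characters such as & and <.
        return escape(self.value, quote=False)


class ToyText:
    """Container: renders each child, then concatenates the results."""

    def __init__(self, *parts):
        # Wrap bare Python strings so every child has render_html().
        self.parts = [ToyString(p) if isinstance(p, str) else p for p in parts]

    def render_html(self):
        return ''.join(part.render_html() for part in self.parts)


class ToyTag(ToyText):
    """Container that wraps its rendered children in an HTML tag."""

    def __init__(self, name, *parts):
        super().__init__(*parts)
        self.name = name

    def render_html(self):
        inner = super().render_html()
        return '<{0}>{1}</{0}>'.format(self.name, inner)


class ToyHRef(ToyText):
    """Container that wraps its rendered children in a hyperlink."""

    def __init__(self, url, *parts):
        super().__init__(*parts)
        self.url = url

    def render_html(self):
        inner = super().render_html()
        return '<a href="{}">{}</a>'.format(escape(self.url), inner)


tree = ToyText(
    ToyHRef('http://ctan.org/', ToyTag('em', 'Comprehensive'), ' TeX Archive Network'),
    ' is ',
    ToyTag('em', 'comprehensive'),
    '.',
)
html = tree.render_html()
print(html)
```

Because containers only delegate to their children, nesting to any depth needs no extra code.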

Protected represents a “protected” piece of text, something like {braced text} in BibTeX. It is not affected by case-changing operations, like Text.upper() or Text.lower(), and is not splittable by Text.split().

All rich text classes share the same API, which closely mirrors that of plain Python strings.

Like Python strings, rich text objects are meant to be immutable. Methods like Text.append() or Text.upper() return a new Text object instead of modifying the data in place. Modifying the contents of an existing Text object is not supported and may lead to unpredictable results.

Here we document the methods of the Text class. The other classes have the same methods.

class pybtex.richtext.Text(*parts)

The Text class is the top level container that may contain String, Tag or HRef objects.

__init__(*parts)

Create a text object consisting of one or more parts.

Empty parts are ignored:

>>> Text() == Text('') == Text('', '', '')
True
>>> Text('Word', '') == Text('Word')
True

Nested Text objects are unpacked and their children are included directly:

>>> Text(Text('Multi', ' '), Tag('em', 'part'), Text(' ', Text('text!')))
Text('Multi ', Tag('em', 'part'), ' text!')
>>> Tag('strong', Text('Multi', ' '), Tag('em', 'part'), Text(' ', 'text!'))
Tag('strong', 'Multi ', Tag('em', 'part'), ' text!')

Similar objects are merged together:

>>> Text('Multi', Tag('em', 'part'), Text(Tag('em', ' ', 'text!')))
Text('Multi', Tag('em', 'part text!'))
>>> Text('Please ', HRef('/', 'click'), HRef('/', ' here'), '.')
Text('Please ', HRef('/', 'click here'), '.')
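
The three normalization rules above (dropping empty parts, unpacking nested containers, merging similar neighbours) can be sketched on a simplified model in which a text is a list of (tag, string) pairs. This is an illustration only and not how Pybtex stores its parts; the normalize() helper and the pair representation are hypothetical.

```python
def normalize(parts):
    """Flatten nested lists, drop empty strings, merge same-tag neighbours.

    Each leaf is a (tag, string) pair; tag is None for plain text.
    Nested lists play the role of nested Text objects.
    """
    flat = []

    def walk(items):
        for item in items:
            if isinstance(item, list):
                walk(item)          # unpack a nested container
            elif item[1]:           # skip parts with an empty string
                flat.append(item)

    walk(parts)

    merged = []
    for tag, string in flat:
        if merged and merged[-1][0] == tag:
            # Same formatting as the previous part: merge the strings.
            merged[-1] = (tag, merged[-1][1] + string)
        else:
            merged.append((tag, string))
    return merged


result = normalize([
    (None, 'Multi'),
    ('em', 'part'),
    [('em', ' '), ('em', 'text!')],
    (None, ''),
])
print(result)
```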
__eq__(other)

Rich text objects support equality comparison:

>>> Text('Cat') == Text('cat')
False
>>> Text('Cat') == Text('Cat')
True
__len__()

len(text) returns the number of characters in the text, ignoring the markup:

>>> len(Text('Long cat'))
8
>>> len(Text(Tag('em', 'Long'), ' cat'))
8
>>> len(Text(HRef('http://example.com/', 'Long'), ' cat'))
8
__contains__(item)

value in text returns True if any part of the text contains the substring value:

>>> 'Long cat' in Text('Long cat!')
True

Substrings split across multiple text parts are not matched:

>>> 'Long cat' in Text(Tag('em', 'Long'), ' cat!')
False
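
One way to picture this behaviour is a membership test that looks at each part on its own. The sketch below assumes exactly that and uses plain strings for the parts; contains() is a hypothetical helper, not a Pybtex function.

```python
def contains(parts, needle):
    """Return True if any single part contains needle."""
    return any(needle in part for part in parts)


parts = ['Long', ' cat!']

# No single part holds the whole substring, so the check fails ...
per_part = contains(parts, 'Long cat')
# ... even though the concatenated text does contain it.
joined = 'Long cat' in ''.join(parts)

print(per_part, joined)
```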
__getitem__(key)

Slicing and indexing work as they do for regular strings, and formatting is preserved.

>>> Text('Longcat is ', Tag('em', 'looooooong!'))[:15]
Text('Longcat is ', Tag('em', 'looo'))
>>> Text('Longcat is ', Tag('em', 'looooooong!'))[-1]
Text(Tag('em', '!'))
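
Slicing by character position while keeping the markup can be sketched on a list of (tag, string) pairs. This is a toy model under stated assumptions: slice_parts() is a hypothetical helper, it handles only non-negative positions, and it is not the Pybtex implementation.

```python
def slice_parts(parts, start, stop):
    """Slice a list of (tag, string) parts by character position.

    Positions count text characters only; each surviving piece keeps
    the tag of the part it came from.
    """
    result = []
    offset = 0  # position of the current part within the whole text
    for tag, string in parts:
        # Translate the global range into this part's local range.
        local_start = max(start - offset, 0)
        local_stop = min(stop - offset, len(string))
        if local_start < local_stop:
            result.append((tag, string[local_start:local_stop]))
        offset += len(string)
    return result


parts = [(None, 'Longcat is '), ('em', 'looooooong!')]
total = sum(len(string) for _, string in parts)

head = slice_parts(parts, 0, 15)
last = slice_parts(parts, total - 1, total)
print(head)
print(last)
```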
__add__(other)

Concatenate this Text with another Text or string.

>>> Text('Longcat is ') + Tag('em', 'long')
Text('Longcat is ', Tag('em', 'long'))
add_period(period='.')

Add a period to the end of text, if the last character is not “.”, “!” or “?”.

>>> text = Text("That's all, folks")
>>> print(six.text_type(text.add_period()))
That's all, folks.
>>> text = Text("That's all, folks!")
>>> print(six.text_type(text.add_period()))
That's all, folks!
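
The rule is easy to state on plain strings. The helper below is a hypothetical stand-in that mirrors the documented behaviour; it is not the Pybtex implementation, and leaving an empty string unchanged is an assumption of this sketch.

```python
def add_period(text, period='.'):
    """Append period unless text already ends with '.', '!' or '?'."""
    # Assumption: empty text is returned unchanged.
    if text and not text.endswith(('.', '!', '?')):
        return text + period
    return text


plain = add_period("That's all, folks")
excited = add_period("That's all, folks!")
print(plain)
print(excited)
```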
append(text)

Append text to the end of this text.

For Tags, HRefs, etc. the appended text is placed inside the tag.

>>> text = Tag('strong', 'Chuck Norris')
>>> print((text +  ' wins!').render_as('html'))
<strong>Chuck Norris</strong> wins!
>>> print(text.append(' wins!').render_as('html'))
<strong>Chuck Norris wins!</strong>
capfirst()

Capitalize the first letter of the text.

>>> Text(Tag('em', 'long Cat')).capfirst()
Text(Tag('em', 'Long Cat'))
capitalize()

Capitalize the first letter of the text and lowercase the rest.

>>> Text(Tag('em', 'LONG CAT')).capitalize()
Text(Tag('em', 'Long cat'))
endswith(suffix)

Return True if the text ends with the given suffix.

>>> Text('Longcat!').endswith('cat!')
True

Suffixes split across multiple parts are not matched:

>>> Text('Long', Tag('em', 'cat'), '!').endswith('cat!')
False
isalpha()

Return True if all characters in the string are alphabetic and there is at least one character, False otherwise.

join(parts)

Join a list of parts using this text as the separator (like str.join()).

>>> letters = ['a', 'b', 'c']
>>> print(six.text_type(String('-').join(letters)))
a-b-c
>>> print(six.text_type(String('-').join(iter(letters))))
a-b-c
lower()

Convert rich text to lowercase.

>>> Text(Tag('em', 'Long cat')).lower()
Text(Tag('em', 'long cat'))
render(backend)

Render this Text into markup.

Parameters: backend – The formatting backend (an instance of pybtex.backends.BaseBackend).
render_as(backend_name)

Render this Text into markup. This is a wrapper method that loads a formatting backend plugin and calls Text.render().

>>> text = Text('Longcat is ', Tag('em', 'looooooong'), '!')
>>> print(text.render_as('html'))
Longcat is <em>looooooong</em>!
>>> print(text.render_as('latex'))
Longcat is \emph{looooooong}!
>>> print(text.render_as('text'))
Longcat is looooooong!
Parameters: backend_name – The name of the output backend (like "latex" or "html").
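split(sep=None, keep_empty_parts=None)

Split the text into a list of parts, like str.split(). With no separator given, the text is split on whitespace.

>>> Text('a + b').split()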
[Text('a'), Text('+'), Text('b')]
>>> Text('a, b').split(', ')
[Text('a'), Text('b')]
startswith(prefix)

Return True if the text starts with the given prefix.

>>> Text('Longcat!').startswith('Longcat')
True

Prefixes split across multiple parts are not matched:

>>> Text(Tag('em', 'Long'), 'cat!').startswith('Longcat')
False
upper()

Convert rich text to uppercase.

>>> Text(Tag('em', 'Long cat')).upper()
Text(Tag('em', 'LONG CAT'))
class pybtex.richtext.String(*parts)

A String is a wrapper for a plain Python string.

>>> from pybtex.richtext import String
>>> print(String('Crime & Punishment').render_as('text'))
Crime & Punishment
>>> print(String('Crime & Punishment').render_as('html'))
Crime &amp; Punishment

String supports the same methods as Text.

class pybtex.richtext.Tag(name, *args)

A Tag represents something like an HTML tag or a LaTeX formatting command:

>>> from pybtex.richtext import Tag
>>> tag = Tag('em', 'The TeXbook')
>>> print(tag.render_as('html'))
<em>The TeXbook</em>
>>> print(tag.render_as('latex'))
\emph{The TeXbook}

Tag supports the same methods as Text.

class pybtex.richtext.HRef(url, *args)

An HRef represents a hyperlink:

>>> from pybtex.richtext import HRef
>>> href = HRef('http://ctan.org/', 'CTAN')
>>> print(href.render_as('html'))
<a href="http://ctan.org/">CTAN</a>
>>> print(href.render_as('latex'))
\href{http://ctan.org/}{CTAN}
>>> href = HRef(String('http://ctan.org/'), String('http://ctan.org/'))
>>> print(href.render_as('latex'))
\url{http://ctan.org/}

HRef supports the same methods as Text.

class pybtex.richtext.Protected(*args)

A Protected represents a “protected” piece of text.

  • Protected.lower(), Protected.upper(), Protected.capitalize(), and Protected.capfirst() are no-ops and just return the Protected object itself.
  • Protected.split() never splits the text. It always returns a one-element list containing the Protected object itself.
  • In LaTeX output, Protected is {surrounded by braces}. The HTML backend wraps the text in a <span class="bibtex-protected"> element, and the plain text backend outputs it as-is.

>>> from pybtex.richtext import Protected
>>> text = Protected('The CTAN archive')
>>> text.lower()
Protected('The CTAN archive')
>>> text.split()
[Protected('The CTAN archive')]
>>> print(text.render_as('latex'))
{The CTAN archive}
>>> print(text.render_as('html'))
<span class="bibtex-protected">The CTAN archive</span>

New in version 0.20.
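
The effect of protection on case-changing and on LaTeX output can be sketched with a list of (string, protected) pairs. This is a toy model for illustration; lower_parts() and render_latex() are hypothetical helpers, not Pybtex functions.

```python
def lower_parts(parts):
    """Lowercase every part except those flagged as protected."""
    return [
        (string if protected else string.lower(), protected)
        for string, protected in parts
    ]


def render_latex(parts):
    """Wrap protected parts in braces and leave the rest untouched."""
    return ''.join(
        '{%s}' % string if protected else string
        for string, protected in parts
    )


title = [('The ', False), ('CTAN', True), (' Archive', False)]
lowered = lower_parts(title)
latex = render_latex(lowered)
print(latex)
```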

class pybtex.richtext.Symbol(name)

A special symbol. This class is rarely used and may be removed in future versions.

Examples of special symbols are non-breaking spaces and dashes.

Symbol supports the same methods as Text.

Style API

A formatting style in Pybtex is a class inherited from pybtex.style.formatting.BaseStyle.

class pybtex.style.formatting.BaseStyle(label_style=None, name_style=None, sorting_style=None, abbreviate_names=False, min_crossrefs=2, **kwargs)

The base class for pythonic formatting styles.

format_bibliography(bib_data, citations=None)

Format bibliography entries with the given keys and return a FormattedBibliography object.

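Parameters:

  • bib_data – the bibliography data to format.
  • citations – the citation keys of the entries to format (optional).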

Pybtex loads the style class as a plugin, instantiates it with the proper parameters, and calls its format_bibliography() method, which does the actual formatting. The default implementation of format_bibliography() calls a format_<type>() method for each bibliography entry, where <type> is the entry type in lowercase. For example, to format an entry of type book, the format_book() method is called. The method must return a Text object. Style classes are expected to implement format_<type>() methods for all entry types they support. If no formatting method is found for an entry, Pybtex reports an unsupported entry type.

An example minimalistic style:

from pybtex.style.formatting import BaseStyle
from pybtex.richtext import Text, Tag

class MyStyle(BaseStyle):
    def format_article(self, entry):
        return Text('Article ', Tag('em', entry.fields['title']))
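
The dispatch on entry type described above can be pictured with getattr(). The following is a sketch under assumptions: ToyEntry, ToyStyle, format_entry(), and the error message are hypothetical and do not reproduce Pybtex's actual classes or wording.

```python
class ToyEntry:
    """Minimal stand-in for a bibliography entry."""

    def __init__(self, type_, fields):
        self.type = type_
        self.fields = fields


class ToyStyle:
    """Minimal stand-in for a style with format_<type>() methods."""

    def format_entry(self, entry):
        # Look up format_<type>() by name, with <type> in lowercase.
        method = getattr(self, 'format_' + entry.type.lower(), None)
        if method is None:
            raise ValueError('unsupported entry type: ' + entry.type)
        return method(entry)

    def format_article(self, entry):
        return 'Article ' + entry.fields['title']


style = ToyStyle()
formatted = style.format_entry(ToyEntry('Article', {'title': 'How to be a cat'}))
print(formatted)

# An entry type without a matching method is reported as unsupported.
try:
    style.format_entry(ToyEntry('book', {'title': 'Cats'}))
    message = None
except ValueError as error:
    message = str(error)
print(message)
```

Supporting a new entry type then only requires adding one more format_<type>() method to the style class.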

Template language

Manually creating Text objects may be tedious. Pybtex has a small template language to simplify common formatting tasks, like joining words with spaces, adding commas and periods, or handling missing fields.

The template language is not well documented yet, so you should look at the code in the pybtex.style.template module and at the existing styles.

An example formatting style using template language:

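from pybtex.style.formatting import BaseStyle, toplevel
from pybtex.style.template import (
    field, join, optional, optional_field, sentence, tag, words,
)

# Reusable template fragments. Their exact definitions are an assumption
# here; the bundled unsrt style defines similar ones.
pages = field('pages')
date = words [optional_field('month'), field('year')]

class MyStyle(BaseStyle):
    def format_article(self, entry):
        # Use the "volume:pages" form only when a volume is present.
        if entry.fields.get('volume'):
            volume_and_pages = join [field('volume'), optional [':', pages]]
        else:
            volume_and_pages = words ['pages', optional [pages]]
        template = toplevel [
            # format_names() is assumed to be provided by the style class;
            # the bundled unsrt style defines such a helper.
            self.format_names('author'),
            sentence [field('title')],
            sentence [
                tag('em') [field('journal')], volume_and_pages, date],
        ]
        return template.format_data(entry)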
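
The square-bracket syntax in the example above is ordinary Python indexing. The sketch below shows one way such a notation can be built with __getitem__; it is a toy illustration with hypothetical names (Node and the three sample nodes), not the code in pybtex.style.template.

```python
class Node:
    """A template node: a formatting function plus optional children."""

    def __init__(self, func, children=()):
        self.func = func
        self.children = children

    def __getitem__(self, children):
        # node[a, b] passes a tuple; node[a] passes a single object.
        if not isinstance(children, tuple):
            children = (children,)
        # Return a new node so the original stays reusable.
        return Node(self.func, children)

    def format(self):
        # Format nested nodes first, then combine the resulting strings.
        rendered = [
            child.format() if isinstance(child, Node) else child
            for child in self.children
        ]
        return self.func(rendered)


join = Node(lambda parts: ''.join(parts))
words = Node(lambda parts: ' '.join(parts))
sentence = Node(lambda parts: ''.join(parts) + '.')

template = sentence[words['Volume', join['12', ':', '34--56']]]
result = template.format()
print(result)
```

Returning a fresh node from __getitem__ keeps the shared building blocks unchanged, so the same join or words object can appear in many templates.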