Highlighting Code Using Pygments and Beautiful Soup

November 12, 2006

Syntax highlighting in blog posts is something that has always bugged me. I don’t like JavaScript-based solutions, so I wrote a quick-and-dirty function that highlights Python code in my blog posts on the server side. The following examples are written for Django, but they should work in any Python software.

The Problem

I want to use Markdown and still have automatic syntax highlighting for Python code that’s inline in my blog posts. Markdown alone tends to break HTML-formatted source code (because of indentation, etc.), so a fully working solution needs a bit of tweaking.

The Solution

We’ll need:

  • Pygments
  • Beautiful Soup
  • Markdown (the Python implementation)

With these tools we can build a helper function that looks for source code in a given text, highlights its syntax, and applies Markdown filtering to the rest without messing up the highlighted code.

The Code

My (simplified) BlogEntry model looks like this:

import datetime

from django.db import models


class BlogEntry(models.Model):
    title = models.CharField(maxlength=500)
    body = models.TextField(
        help_text='Use <a href="http://daringfireball.net/projects/markdown/syntax">Markdown-syntax</a>')
    body_html = models.TextField(blank=True, null=True)
    pub_date = models.DateTimeField(default=datetime.datetime.now)
    use_markdown = models.BooleanField(default=True)

    class Admin:
        # body_html is left out of the admin on purpose: it is generated on save.
        fields = (
            (None, {
                'fields': ('title', 'body', 'pub_date', 'use_markdown')
            }),
        )

The redundant body_html field is there for performance: instead of computing the Markdown and syntax highlighting for the body on every request, we compute them only on save. (Yes, this could also be done on the body field itself, but I prefer that the content I’m editing does not change every time I save it.)

Next, the highlighting function:

def _highlight_python_code(self):
        from pygments import highlight
        from pygments.lexers import PythonLexer
        from pygments.formatters import HtmlFormatter
        from unessanet.misc.BeautifulSoup import BeautifulSoup

        soup = BeautifulSoup(self.body)
        python_code = soup.findAll("code", "python")

        if self.use_markdown:
            import markdown

            index = 0
            for code in python_code:
                code.replaceWith('<p class="python_mark">mark %i</p>' % index)
                index = index+1

            markdowned = markdown.markdown(str(soup))
            soup = BeautifulSoup(markdowned)
            markdowned_code = soup.findAll("p", "python_mark")

            index = 0
            for code in markdowned_code:
                code.replaceWith(highlight(python_code[index].renderContents(), PythonLexer(), HtmlFormatter()))
                index = index+1
        else:
            # No Markdown: highlight the code blocks in place.
            for code in python_code:
                code.replaceWith(highlight(code.string, PythonLexer(), HtmlFormatter()))

        return str(soup)

This function searches for <code> blocks that have the class="python" attribute. It first replaces them with placeholder text, then applies Markdown if necessary, and finally replaces the placeholders with syntax-highlighted code. It may not be the most beautiful code, but it works :)
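To make the placeholder idea easier to see on its own, here is a minimal sketch that runs with only the standard library. It is not the function above: a regular expression stands in for Beautiful Soup, and render_with_placeholders, toy_transform and toy_highlighter are hypothetical names made up for this illustration, standing in for the real Markdown and Pygments calls.

```python
import html
import re

# Matches <code class="python">...</code>. A regex stands in for Beautiful Soup here.
CODE_BLOCK = re.compile(r'<code class="python">(.*?)</code>', re.DOTALL)


def render_with_placeholders(text, transform, highlighter):
    """Protect code blocks, run `transform` on the rest, then restore them highlighted."""
    snippets = []

    def stash(match):
        # Remember the raw code and leave a marker the transform will not touch.
        snippets.append(match.group(1))
        return "PYTHONMARK%dEND" % (len(snippets) - 1)

    protected = CODE_BLOCK.sub(stash, text)
    transformed = transform(protected)

    def restore(match):
        # Swap each marker for the highlighted version of the code it replaced.
        return highlighter(snippets[int(match.group(1))])

    return re.sub(r"PYTHONMARK(\d+)END", restore, transformed)


# Hypothetical stand-ins so the sketch runs without third-party packages.
def toy_transform(text):
    # Pretend "Markdown": turn *word* into <em>word</em>.
    return re.sub(r"\*(\w+)\*", r"<em>\1</em>", text)


def toy_highlighter(code):
    # Pretend "Pygments": escape the code and wrap it in a styled block.
    return '<pre class="highlight">%s</pre>' % html.escape(code)


source = 'Some *intro* text.\n<code class="python">x = a*b*c</code>'
result = render_with_placeholders(source, toy_transform, toy_highlighter)
```

Without the placeholder step, the toy transform would turn a*b*c into a<em>b</em>c, which is the same kind of damage Markdown does to real code. In actual use you would presumably pass markdown.markdown as the transform and a small wrapper around highlight(code, PythonLexer(), HtmlFormatter()) as the highlighter.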

And finally the save method:

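def save(self):
    # Regenerate the cached HTML, then let Django write the row as usual.
    self.body_html = self._highlight_python_code()
    super(BlogEntry, self).save()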

The body_html field is updated on every save. On the template side you can simply use {{ entry.body_html }} without applying any additional filters.

The CSS needed for syntax coloring can be printed out with Pygments, for example like this: css = HtmlFormatter().get_style_defs('.highlight'). It may be wise to save the output and put it in a static CSS file.
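Saving the stylesheet could look roughly like this. It is only a sketch and assumes Pygments is installed; the helper names pygments_css and write_pygments_css are made up for this example, and the output path is whatever suits your static files setup.

```python
from pygments.formatters import HtmlFormatter


def pygments_css(selector=".highlight"):
    """Return the CSS rules Pygments generates for the given selector."""
    return HtmlFormatter().get_style_defs(selector)


def write_pygments_css(path, selector=".highlight"):
    """Write the generated rules to `path` so they can be served as a static file."""
    with open(path, "w") as css_file:
        css_file.write(pygments_css(selector))
```

Calling write_pygments_css once with a path of your choosing should be enough; the file only needs regenerating if you switch to a different Pygments style.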

Known Limitations

  • Not a bug but a feature: every code tag that has class="python" will be replaced. This was a bit annoying when trying to document this particular function…
  • Unicode strings break the highlighter. Any help on this is appreciated!

This code is published under a Creative Commons License. Please share any comments! :)

Unessa.net © 2000-2017 Ville Säävuori. All lefts reversed.