af83

How to extend the Redcarpet 2 Markdown library?

As you may know, this blog is powered by Jekyll, and its posts are written in Markdown (by the way, you should adopt the Markdown Mindset). By default, however, Jekyll's Markdown parser doesn't understand the fenced code blocks that GitHub Flavored Markdown introduced. As a daily GitHub user, I really wanted that feature.

So I wrote a Jekyll plugin, based on the Redcarpet library, that adds this feature and a few others. Redcarpet is a very nice library for parsing Markdown and rendering HTML from it. Since version 2, its author, Vicent Martí, has decoupled the parser from the renderer, which makes it fairly easy to extend.

This post explains how to do that and sidestep some traps. But, first, let's see the basic usage of Redcarpet. It works in two steps:

  1. Create a Markdown object with a given renderer.
  2. Pass it some Markdown text through the render method, and it returns the rendered HTML.

For example, if you want to use the Redcarpet::Render::HTML renderer (one of the renderers that come out of the box with Redcarpet), you can do:

require 'redcarpet'

markdown = Redcarpet::Markdown.new(Redcarpet::Render::HTML, :fenced_code_blocks => true)
markdown.render("This is *bongos*, indeed.")
# => "<p>This is <em>bongos</em>, indeed.</p>\n"

The first level of customization in Redcarpet is the set of extensions that you can pass to the Redcarpet::Markdown constructor. In the previous example, we enabled the :fenced_code_blocks extension, which makes it parse blocks of code delimited by three or more tildes (~) or backticks. These extensions modify the way Redcarpet parses the Markdown input.

For the output, you can change the first argument: the renderer. In our example we passed a class, but you can also pass an instance instead. This is particularly useful, as Redcarpet::Render::HTML accepts options in its constructor.

Let's see an example:

renderer = Redcarpet::Render::HTML.new(:no_links => true, :hard_wrap => true)
markdown = Redcarpet::Markdown.new(renderer)
markdown.render("This is foo.\nThat is bar.")
# => "<p>This is foo.<br>\nThat is bar.</p>\n"

One of the available options, :with_toc_data, is not that obvious to use. If you want to generate an HTML document with its table of contents, you'll have to render your document twice:

html_toc = Redcarpet::Markdown.new(Redcarpet::Render::HTML_TOC)
markdown = Redcarpet::Markdown.new(Redcarpet::Render::HTML.new(:with_toc_data => true))
toc  = html_toc.render(text)
html = markdown.render(text)
full = toc + html

So, how does it work? The first time, we use a special renderer, Redcarpet::Render::HTML_TOC, which renders only the table of contents. Then, we do a second pass to generate the body of the document with anchors on the titles (this is what the :with_toc_data option does).

You can go further with a renderer tailored to your fancy needs. It's not complicated: all you have to do is inherit from Redcarpet::Render::Base and implement the callbacks listed in the README. One such example renders man pages.

However, there is a faster way to customize the HTML output: you can inherit from Redcarpet::Render::HTML and just override some callbacks! The canonical example is highlighting code:

class HTMLwithAlbino < Redcarpet::Render::HTML
  def block_code(code, language)
    Albino.safe_colorize(code, language)
  end
end

markdown = Redcarpet::Markdown.new(HTMLwithAlbino, :fenced_code_blocks => true)

But let's try another example: suppose you want headers to start at <h2> rather than <h1>. Let's write our own header method:

class OurHTML < Redcarpet::Render::HTML
  def header(text, level)
    level += 1
    "<h#{level}>#{text}</h#{level}>"
  end
end

Our implementation doesn't look optimal, since it rebuilds the tag by hand. You may be tempted to use super, like this:

class OurHTML < Redcarpet::Render::HTML  # WON'T WORK
  def header(text, level)                # WON'T WORK
    super(text, level + 1)               # WON'T WORK
  end                                    # WON'T WORK
end                                      # WON'T WORK

Please be warned that you can't do that. For maximum performance, the Redcarpet library is written mostly in C, with some optimizations. One of these optimizations is the way the Redcarpet::Render::HTML methods are called from the library, and it has the nasty side effect that super won't work. There is an open issue if you want to discuss it.

Still, it's possible to use super in the constructor, for example to enable an option by default:

class HardWrappedHTML < Redcarpet::Render::HTML
  def initialize(options={})
    super options.merge(:hard_wrap => true)
  end
end

To conclude, Redcarpet is my favourite library for manipulating Markdown: it offers some options for common customizations and can be extended easily.
