wssg, the Worst Static Site Generator

The Journey

The first version of this website was handcrafted HTML, each character lovingly typed. This was incredibly time-consuming, so I switched to using pandoc to generate the HTML files. That let me write in markdown, but it was still pretty cumbersome: since the navbar content lived in a separate header.html file, every navbar change meant updating that file, then running pandoc on each markdown file again. I suppose I could have written a bash script to build the website this way, but bash is a scary language. Lastly, I explored using Hugo or Jekyll instead, but no theme matched exactly what I was looking for. I figured, instead of learning how to build my own Hugo theme or mod someone else's, why don't I just write my own static site generator and get some Python practice while I'm at it?

This is why you don't do it, past Joshua.

wssg

wssg, pronounced about how you expect it, is my ongoing effort to write a static site generator. The first version, which took a few days of tinkering here and there, uses RegEx to split lines of markdown text into clumps of special characters (such as headers using #, lists using -, etc.), then iterates through the list of strings with some half-baked state machine. It really is a "write code and ask yourself if it's a good idea later" attempt.
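To make the "clumps" idea concrete, here is a rough sketch of that kind of RegEx splitting. The pattern and the `split_into_clumps` name are made up for illustration and are not taken from wssg's actual code.

```python
import re

# Hypothetical pattern: any run of markdown special characters becomes one clump.
# The trailing "-" inside the character class is a literal dash.
SPECIAL = re.compile(r"([#*_\[\]()!-]+)")


def split_into_clumps(line: str) -> list[str]:
    """Split a line into alternating clumps of plain text and special characters."""
    # Because the pattern has a capturing group, re.split keeps the delimiters.
    # Empty strings show up at the edges of the line, so drop them.
    return [clump for clump in SPECIAL.split(line) if clump]


if __name__ == "__main__":
    print(split_into_clumps("# Hello **world**"))
    print(split_into_clumps("[site](https://ab.com)"))
    print(split_into_clumps("[my-site](https://a-b.com)"))
```

With this pattern, the number of clumps depends on the text itself: the second link above yields 5 clumps, while the third, which only adds two dashes, yields 9.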

It does work, kind of. Bold text, italics, headers, hyperlinks, images, and lists are all supported. In fact, this website was generated using wssg! That said, not everything works. Most things don't. Hyperlinks with dashes will break because I hard-coded the number of "clumps" from beginning to end of a link. Nested markdown syntax isn't supported. A lot of basic syntax still isn't supported either: blockquotes, LaTeX, etc. I figure the most important thing to do is a full rewrite of the markdown-to-HTML function using a proper state machine or abstract syntax tree and character-by-character parsing, so that other changes don't end up needing to be rewritten immediately after rewriting the markdown parser.
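For a sense of what character-by-character parsing with a small state machine might look like, here is a minimal sketch that only handles `*italic*` and `**bold**`. The `render_inline` function is hypothetical, not wssg's implementation, and it assumes markers are properly paired and properly nested.

```python
def render_inline(line: str) -> str:
    """Convert *italic* and **bold** markers to HTML, one character at a time."""
    out = []
    bold = False    # state: currently inside **...**
    italic = False  # state: currently inside *...*
    i = 0
    while i < len(line):
        if line.startswith("**", i):
            # Two asterisks toggle the bold state.
            out.append("</strong>" if bold else "<strong>")
            bold = not bold
            i += 2
        elif line[i] == "*":
            # A single asterisk toggles the italic state.
            out.append("</em>" if italic else "<em>")
            italic = not italic
            i += 1
        else:
            # Any other character is copied through unchanged.
            out.append(line[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(render_inline("a **b** and *c*"))
    print(render_inline("**bold *it* ok**"))
```

Because the state lives in two flags rather than in a fixed count of clumps, simple nesting such as italics inside bold falls out naturally. Unclosed markers or runs like `***` would still need extra handling that this sketch leaves out.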

Update 2023-05-18

Having learned the potentially cosmic consequences of my actions (markdown also isn't a regular language), I've rewritten wssg to use RegEx only where it's more appropriate. And for fun.

The bad news: wssg is now twice as slow, mainly due to parsing lines character-by-character instead of using RegEx to split them up into chunks and iterating through these.

Figure: snakeviz of cProfile output when running wssg on this website, showing a 2x slowdown when parsing lines manually instead of with RegEx.
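A profile like this can be collected with Python's built-in cProfile module. The sketch below is only an illustration: `parse_line` is a hypothetical stand-in for a character-by-character parser, not wssg's real one, and `profile_parser` is a made-up helper name.

```python
import cProfile
import io
import pstats


def parse_line(line: str) -> int:
    """Hypothetical stand-in for a line parser: counts '*' one character at a time."""
    count = 0
    for ch in line:
        if ch == "*":
            count += 1
    return count


def profile_parser(lines: list[str]) -> str:
    """Profile parse_line over the given lines and return a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    for line in lines:
        parse_line(line)
    profiler.disable()

    # Render the ten most expensive entries, sorted by cumulative time.
    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    stats.sort_stats("cumulative").print_stats(10)
    return report.getvalue()


if __name__ == "__main__":
    print(profile_parser(["some **bold** text"] * 1000))
```

To get the graphical view instead, the profiler's `dump_stats("wssg.prof")` method writes the raw data to a file (the filename here is arbitrary), which can then be opened with `snakeviz wssg.prof`.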

The good news: wssg now supports blockquotes and escaping special characters, and links containing markdown characters won't break anymore (I think)! Also, the slowdown is negligible for a website of my size. Maybe if it becomes an issue later, I'll rewrite the line parser in a faster language or a smarter way.
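As a rough illustration of how blockquotes and backslash escapes can be handled at the line level, here is a small sketch. The function names and the set of escapable characters are assumptions for the example, not wssg's actual behavior, and each line is rendered on its own with no inline formatting.

```python
import html

# Assumed set of characters that a backslash can escape.
ESCAPABLE = set("\\`*_{}[]()#+-.!>")


def unescape_markdown(text: str) -> str:
    """Replace backslash-escaped special characters with the literal character."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in ESCAPABLE:
            out.append(text[i + 1])  # keep the character, drop the backslash
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def render_line(line: str) -> str:
    """Render one line as a blockquote if it starts with '>', else as a paragraph."""
    if line.startswith(">"):
        body = line[1:].lstrip()
        return f"<blockquote>{html.escape(unescape_markdown(body))}</blockquote>"
    return f"<p>{html.escape(unescape_markdown(line))}</p>"


if __name__ == "__main__":
    print(render_line("> quoted"))
    print(render_line(r"\# not a header"))
    print(render_line(r"\> not a quote"))
```

An escaped `\>` at the start of a line is not treated as a blockquote because the line begins with the backslash, and the literal `>` is then HTML-escaped in the output.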