Making a static site generator

You're using it now


2025-06-19

I had a burst of inspiration after my last post became popular on lobste.rs and decided I’d rather like to own a bit more of the technology I use. The next tool in my sights was this blog itself.

The existing setup

For the past few years, I’ve hosted my blog on Netlify using the Hugo static site generator. I have no real complaints about this: Netlify is easy to use, I’ve experienced exactly no downtime, and, crucially, the UI has remained consistent throughout that time, which is more than can be said for a great deal of other services.

Hugo too is without issue: it does exactly what I need it to do, with relatively little configuration. If you’re looking to host a blog and are vaguely technically minded (happy writing Markdown, using git, and so on) but have no interest in managing web technology, I’d strongly recommend this setup.

But, I’m on a mission to reduce my dependence on third-party services and I love building my own tools, so it’s time to move on.

The HTTP server

I’ve used the Tokio project’s axum before and have found its API to be terse and intuitive. So, this was my starting point. I’m going to be hosting this setup in a containerised environment behind a reverse proxy, so hopefully I’ve mostly got security covered (even if an attacker managed to gain arbitrary control of the server binary, there shouldn’t be much they can do from within the environment other than respond to HTTP requests). That said, I’d rather be safe than sorry, and I know there are enough eyes staring at axum that it’s unlikely to contain major security holes.
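
Getting a skeleton server running takes remarkably little code. Here’s a minimal sketch (the routes are illustrative rather than this site’s real ones, and I’m assuming axum 0.8 with its {param} path syntax):

use axum::{extract::Path, response::Html, routing::get, Router};

// In the real generator, this would look up a pre-rendered post by slug.
async fn post(Path(slug): Path<String>) -> Html<String> {
    Html(format!("<p>Post '{slug}' would be served here.</p>"))
}

#[tokio::main]
async fn main() {
    let app = Router::new()
        .route("/", get(|| async { Html("<h1>Home</h1>") }))
        .route("/posts/{slug}", get(post));

    let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}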

Parsing markdown

I want to write blog posts in markdown, not raw HTML. I’m not too bothered about features, but I’d like to have the basics: images, tables, code blocks, support for arbitrary HTML within a post. I started off with the aptly named markdown crate - it seemed to do the job, and I was happy.

But then I decided I’d quite like to be able to place a toml code block at the top of each document to serve as the blog post’s frontmatter (for metadata like post title, description, publish date, etc.). Doing so meant manipulating the parsed markdown document before conversion to HTML, and I found this slightly annoying to do with the markdown crate.

So, I switched crates: first to markdown_ppp, then to linemd. Each of them had limitations that made them unsuitable: unsupported markdown features, poor support for manipulation of the markdown AST, etc.

I even briefly considered writing my own markdown parser. However, I spend enough of my life writing and debugging parsers already, and didn’t much fancy spending my evening wrangling such a thing together.

Finally, I discovered pulldown-cmark. It was a perfect fit: feature-rich, performant, and with the ability to intercept parser ‘events’ as I pleased. I set the code up to search through my content directory for .md files at startup, extract the frontmatter, parse it as TOML using serde_toml, render the remaining markdown to HTML, and store the result. When a request came in for a blog post, axum would dutifully serve it to the client.
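
In sketch form, the loading step looks something like this (simplified from the real thing: I’m assuming pulldown-cmark’s newer event API and the toml crate’s serde support here, and the frontmatter fields are illustrative):

use pulldown_cmark::{html, CodeBlockKind, Event, Parser, Tag, TagEnd};
use serde::Deserialize;

#[derive(Deserialize)]
struct Frontmatter {
    title: String,
    date: String, // plus description, Mastodon post ID, etc.
}

fn load_post(source: &str) -> (Option<Frontmatter>, String) {
    let mut frontmatter = None;
    let mut seen = false;
    let mut in_frontmatter = false;

    // Filter the event stream: capture the first toml code block as
    // frontmatter and drop its events so it never reaches the HTML.
    let events = Parser::new(source).filter_map(|event| match event {
        Event::Start(Tag::CodeBlock(CodeBlockKind::Fenced(lang)))
            if &*lang == "toml" && !seen =>
        {
            seen = true;
            in_frontmatter = true;
            None
        }
        // Assumes the block body arrives as a single Text event.
        Event::Text(text) if in_frontmatter => {
            frontmatter = toml::from_str(&text).ok();
            None
        }
        Event::End(TagEnd::CodeBlock) if in_frontmatter => {
            in_frontmatter = false;
            None
        }
        other => Some(other),
    });

    let mut rendered = String::new();
    html::push_html(&mut rendered, events);
    (frontmatter, rendered)
}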

pulldown-cmark fully supports CommonMark, so all of the usual things work just fine:

| This | is    |
|------|-------|
| a    | table |

It even supports arbitrary HTML.

HTML rendering

I wanted to add a little sparkle to the blog posts. While I’m a big fan of asceticism, black text on a white background was a little uninviting. The Indie Web is supposed to be personal and unique, so I wanted to give my posts some styling and structure.

I searched around for solutions. I’m still pretty new to backend web development, but it seems like most folk opt for an HTML templating language nowadays. Many of the options in the Rust ecosystem are compile-time templating engines. This means you end up with faster page generation and errors caught before you deploy your website, but I didn’t much like the idea of recompiling the site generator every time I wanted to tweak the HTML. So, I opted for tera, which seems to have a relatively competent and simple templating language.

I wrote up some simple HTML templates for the blog page and generic header/footer sections, then gave it a spin. Success! I would have preferred slightly better docs from tera (more explicit information for each built-in construct rather than a handful of examples), but I have been spoiled by the shockingly high bar for docs in the Rust ecosystem, so perhaps that’s on me.
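
The Rust side of the templating is pleasantly small. Roughly speaking, it looks like this (the template names and context keys here are illustrative, with the Tera instance loaded once at startup via Tera::new("templates/**/*.html")):

use tera::{Context, Tera};

fn render_post(tera: &Tera, title: &str, body_html: &str) -> tera::Result<String> {
    let mut ctx = Context::new();
    ctx.insert("title", title);
    ctx.insert("body", body_html);
    // post.html pulls in the shared chrome with {% include "header.html" %}
    // and prints the pre-rendered body unescaped via {{ body | safe }}.
    tera.render("post.html", &ctx)
}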

HTML and CSS

Now for the fun part: styling and layout. My day job is in embedded software development: I’m more at home with a disassembler than a browser inspector. But I figured that by rejecting the ‘collective wisdom’ of the masses and avoiding JS frameworks like the plague, it might not take me too long to get something nice up and running.

And so, I opened altogether too many w3schools tabs and got to work relearning HTML and CSS. It’s easily been a decade since I last did anything like this, but I must say - it was a joy to re-learn. Working with core web technologies is just downright fun, and one no longer needs to check every feature for browser support.

I went for a simple layout: a flexbox defining two columns (one for the main content, the other for the site menu), with the menu reflowing to the top of the page on smaller screens to avoid squashing the content to one side.

I’m definitely no designer, but I’m very happy with the result. I’m probably committing a lot of design sins, but hopefully by sticking to core web technologies the result is at least mostly not an accessibility nightmare. If you notice issues, please let me know!

I made some fairly trivial use of tera to loop through the blog entries and generate a list for the home page - perhaps if I do enough writing I’ll eventually need to look into pagination, but for now it’ll do just fine.
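
That loop amounts to very little on the Rust side - a sketch, with illustrative field and template names:

#[derive(serde::Serialize)]
struct PostSummary {
    slug: String,
    title: String,
    date: String,
}

fn render_index(tera: &tera::Tera, posts: &[PostSummary]) -> tera::Result<String> {
    let mut ctx = tera::Context::new();
    ctx.insert("posts", posts);
    // index.html iterates with something like:
    //   {% for post in posts %}
    //     <a href="/posts/{{ post.slug }}">{{ post.title }}</a> ({{ post.date }})
    //   {% endfor %}
    tera.render("index.html", &ctx)
}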

Finally, I added some basic theme-dependent styling such that pages respect both dark and light themes.

Syntax highlighting

Unfortunately (and, frankly, quite reasonably) <code> blocks don’t natively support syntax highlighting. I talk about code a lot, so syntax highlighting was a necessary feature. I initially dropped HighlightJS in, and it did the job perfectly, but I wasn’t happy. I’m not ideologically opposed to JavaScript, but I wanted to at least give avoiding it entirely a go.

And so, I used the excellent tree_sitter parsing library and the tree_sitter_rust crate to generate syntax highlighting events server-side during blog post generation. For each token event, I encase the sub-string belonging to that token in its own element, with a CSS class corresponding to the kind of token. Then, I simply define the highlighting theme in the CSS - couldn’t be easier!

fn main() {
    println!("This snippet has been highlighted by the server, not your browser");
    println!("The colour scheme could do with some work, however");
}
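
Under the hood, that pass amounts to something like the following (simplified: I’m using the tree-sitter-highlight companion crate here with its ~0.23 API, and the capture names and CSS class prefix are illustrative):

use tree_sitter_highlight::{HighlightConfiguration, HighlightEvent, Highlighter};

const NAMES: &[&str] = &["keyword", "function", "string", "comment", "type"];

fn html_escape(s: &str) -> String {
    s.replace('&', "&amp;").replace('<', "&lt;").replace('>', "&gt;")
}

fn highlight_rust(source: &str) -> String {
    let mut config = HighlightConfiguration::new(
        tree_sitter_rust::LANGUAGE.into(),
        "rust",
        tree_sitter_rust::HIGHLIGHTS_QUERY,
        "", // no injections query
        "", // no locals query
    )
    .expect("highlight query should compile");
    config.configure(NAMES);

    let mut highlighter = Highlighter::new();
    let events = highlighter
        .highlight(&config, source.as_bytes(), None, |_| None)
        .expect("source should be highlightable");

    let mut out = String::new();
    for event in events {
        match event.expect("event stream should be well-formed") {
            // Plain source text between tokens: escape it and emit as-is.
            HighlightEvent::Source { start, end } => {
                out.push_str(&html_escape(&source[start..end]));
            }
            // Open a span whose class records the kind of token.
            HighlightEvent::HighlightStart(h) => {
                out.push_str(&format!("<span class=\"hl-{}\">", NAMES[h.0]));
            }
            HighlightEvent::HighlightEnd => out.push_str("</span>"),
        }
    }
    out
}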

In the future I’d like to support more languages, but thankfully the tree_sitter ecosystem has me covered - every language under the sun is supported, and those that aren’t are trivial to create integrations for.

There we have it: 100% server-side syntax highlighting, not a line of JS in sight.

Mastodon integration

I wanted support for blog post comments. I’d rather not rely on a third-party corporation’s comment service or, worse still, build my own auth system purely to allow users to comment on posts. Instead, I decided to try integration with Mastodon.

The idea is simple: if I decide I like a blog post enough to show the world, I post about it, then edit the frontmatter of the blog post to include the ID of the Mastodon post. From there, the site generator will periodically request the list of responses to the post (I’d like to make it push-based eventually, but polling every 5 minutes should do for now - I donate to my instance’s fund every month, so I don’t feel so bad about doing this).

I didn’t much feel like building my own client for the Mastodon API, so I used megalodon, which claims to support a swathe of Fediverse protocols. Getting it up and running was simple: I created an access token on my instance, threw it in a configuration file (locally, and on my VPS - but not checked into git, of course) and then had the site generator read the config file.

From there, the process is simple: megalodon provides a get_status_context method, and this provides literally everything required to extract comment data. Then, I just render the comment data using tera, as with my contents page.
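
Stripped to its essentials, the flow looks something like this (exact megalodon signatures vary between crate versions - newer releases return a Result from generator - and the instance URL here is a placeholder):

use megalodon::{generator, SNS};
use std::time::Duration;

async fn poll_comments(token: String, status_id: String) -> Result<(), megalodon::error::Error> {
    let client = generator(
        SNS::Mastodon,
        "https://example.social".to_string(), // placeholder instance
        Some(token),
        None, // default user agent
    )?;

    loop {
        // The context holds the post's ancestors and descendants; the
        // descendants are the replies rendered as comments.
        let res = client.get_status_context(status_id.clone(), None).await?;
        for reply in res.json.descendants {
            println!("{}: {}", reply.account.username, reply.content);
        }
        // Poll every 5 minutes, as described above.
        tokio::time::sleep(Duration::from_secs(5 * 60)).await;
    }
}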

I would add a screenshot, but you can just scroll to the bottom of this page.

Deployment

I already run a VPS for some personal tools (being able to access my personal todo list everywhere is, it transpires, immensely useful) so I threw it on there too. A simple Dockerfile to build the image using the official Rust images, a docker-compose.yml to specify the container configuration, and a new reverse proxy entry to bind it to the blog subdomain. Maybe a year ago this would have taken me a day to figure out, but I’ve played around with docker enough over the past few months for this not to be too much of a challenge.

I have reasonable limits placed on the VPS traffic, and it costs me only pennies per week, so it won’t be draining my bank account any time soon. But still: don’t DoS it, please?

Things to improve

There are a few things I’d like to improve (or will be forced to) over time though:

  • Publishing a blog post currently requires me to log in to the VPS, manually pull the blog git repository, and then restart the container. I don’t write enough posts for this to be a problem, but it would be nice for this to happen automatically.

  • Several parts of the site are heavily oriented toward my own needs: hard-coded domains, social links, etc. I could factor these out, and if I were building a professional tool I would - but the beauty of building tools for yourself is that you only need to care about your own requirements, not those of customers or users.

That’s that, I suppose

It all works. Surprisingly well. The final project sits almost entirely within a single main.rs, and comes to about 400 lines. With a little clean-up time it would probably be closer to 300.

It’s definitely very custom and there are parts of it that could do with closer attention, but I’m confident that it’s got sufficient rigidity to be exposed to the open web. At least, I’ve got more confidence in it than some of the stuff I’ve seen certain companies deploy into the willing arms of thousands of customers, so I’m happy.

What did I learn?

A lot! Much more than I would have if I’d leaned on an LLM whenever I encountered an obstacle. Yes, it’s time for the part where I try to convince you to stop outsourcing your brain to the technology brothers.

Maybe you’re a dyed-in-the-wool web developer and you could do all of this in your sleep. If that’s the case, then well done - but I’m not, and yet this all came together in a single 4-hour programming session.

For me, the lesson here is that what the software world needs is better tooling and better languages, not Jesus-take-the-wheel language models. The ease and success of this project are entirely down to the grace and quality of the Rust ecosystem, and the tooling that glues it all together. The libraries, the documentation, the type system, the examples. We should not underestimate the extent to which those things provide velocity as well as assurance, and I truly believe our focus should be on planting enough trees that we can start to see the woods instead of setting fire to the whole forest because it’s in the way of our shiny new road.

Crucially, I understand what I have built. If it breaks, I will know why it has broken. If it has a security flaw, I will have the knowledge needed to fix it. If I want a new feature, I can just implement it. The simple act of fighting with the problem, trying different approaches, and incrementally puzzling my way to a solution was in equal parts fun and educational. I found myself grinning from ear-to-ear on a regular basis as I watched it come together, and I wouldn’t trade that experience for the world - no matter how much Sam Altman wants me to.