A static React site on S3 with /directory-style/ URIs

May 2018

How (and why) to build a static site with React

I wanted to put together a new site recently, as I'm thinking about looking for a job soon. This was a bit of a treat - a personal site is always an excuse to mess around with the latest Gee Whiz toolsets without worrying about whether they fit a client's needs, or whether they actually make any kind of sense.

My requirements were:

  1. Directly editable. No messing around with themes or templates.
  2. Markdown post support.
  3. Static site. No webserver needed.
  4. Instantaneous navigation between pages. No reloading.
  5. No ugly "/foo/index.html" URIs that document stores like S3 usually require.

Tools

I checked the calendar and it's 2018, so obviously I decided to use React.

Why not Ghost, the excellent blogging platform? It's not static. I didn't want to be running a persistent server process just to deliver dumb unchanging content[1].

Why not Jekyll, or Hugo, the excellent static blog generators? I wanted the freedom to write my own pages from scratch (like a portfolio page) rather than mess around with passing themes through an engine.

These would work if I were just delivering pages, but I wanted instantaneous transitions between pages, which means JS. And going the SPA route, forcing a user to download a big blob of JS in order to download some data in order to render a page in their browser, is even slower and more wasteful than running their request through a server-side process.

The solution was a well-maintained and highly customizable build tool for React called React Static. It allows me to run a build command locally and upload the resulting assets to S3, so the initial document that hits the user's browser is already the prerendered HTML page they requested. React (and the data React will use to hydrate the other pages to which they may want to navigate) are downloaded in the background.

With the addition of a markdown-to-JSON library called jdown, that took care of requirements 1 through 4.

Paths & URIs

Unfortunately, URIs were still ugly. React Static seems to have been designed to be served by an Apache-style server, where a request for the URI /posts/ is rewritten to the file /posts/index.html by a module like mod_dir. It generates an output /dist/ folder with this kind of structure, which isn't going to work on a system not running Apache/nginx - like any document store. The project actually recommends S3 as a hosting option (behind Netlify, which also looks nice), but ignores this problem.

Even if React Static did something more S3-friendly with the directory - generating /dist/posts.html rather than /dist/posts/index.html, for instance - that wouldn't really serve our purposes. S3 allows exactly two request destination rewrites for a bucket: one of them for the root bucket URL, and the other for errors. Using the default behavior would mean that users arriving directly at https://bonner.jp/work - or refreshing, or using the browser's back button - would not get the prerendered page they wanted.

Instead, their browser would show them the HTML for the index page (or error page, depending on how S3 was configured), and after React finished downloading and checking the URI, the desired page would be rendered. Unacceptable!

So customizing React Static was not a solution. Finding another storage solution that supported this structure might be, but I'm not aware of anything as cheap and powerful as S3/CloudFront.

Luckily, AWS recently released a service called Lambda@Edge that allows you to execute arbitrary code at the CDN edge, before the request reaches the origin, and modify the request object accordingly.

So once the S3 content is behind a CloudFront CDN, it's a simple step to write a function that appends /index.html to an inbound page request, discarding a trailing slash if it's present, and using a regex to ignore assets.

A hastily written regex to exclude asset requests:

/^((?!\.(css|js|html|xml|txt|ico|jpg|png|gif|svg|json|pdf)$).)*$/
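To see what the regex above accepts, here is a small sketch that runs it against a handful of paths. The sample paths and the isPagePath helper are made up for illustration; only the regex itself comes from this post.

```javascript
// The regex from the post: matches a path only if it does NOT end
// in one of the listed asset extensions.
const PAGE_PATH = /^((?!\.(css|js|html|xml|txt|ico|jpg|png|gif|svg|json|pdf)$).)*$/;

// Hypothetical helper name, used here only for readability.
const isPagePath = (uri) => PAGE_PATH.test(uri);

// Hypothetical sample paths, chosen only for illustration.
const samples = ['/work', '/work/', '/', '/main.js', '/posts/index.html', '/favicon.ico'];

for (const uri of samples) {
  console.log(uri, isPagePath(uri) ? 'page' : 'asset');
}
```

Page-style paths such as /work and /work/ pass the test, while anything ending in a listed extension (like /main.js) is left alone. An extension missing from the list would be treated as a page, which is one reason to call the regex hastily written.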

And a quick example of it in a JS script running on Lambda@Edge.
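A minimal sketch of what such a handler could look like, assuming the function is attached to the CloudFront origin-request event and runs on a Node.js runtime that supports async handlers. The rewriteUri helper is a hypothetical name introduced here, and this is not necessarily the exact script running on this site.

```javascript
'use strict';

// Matches URIs that do not end in one of the listed asset extensions.
const PAGE_PATH = /^((?!\.(css|js|html|xml|txt|ico|jpg|png|gif|svg|json|pdf)$).)*$/;

// Hypothetical helper: map a page URI to the index.html file that
// the static build generated for it. Asset URIs pass through as-is.
function rewriteUri(uri) {
  if (!PAGE_PATH.test(uri)) return uri;
  // Replace an optional trailing slash with "/index.html".
  return uri.replace(/\/?$/, '/index.html');
}

// Lambda@Edge delivers the CloudFront request at event.Records[0].cf.request.
// Returning the (modified) request lets CloudFront carry on to the origin.
const handler = async (event) => {
  const request = event.Records[0].cf.request;
  request.uri = rewriteUri(request.uri);
  return request;
};

exports.handler = handler;
```

Keeping the rewrite in its own small function makes it easy to check locally before deploying: /work and /work/ should both become /work/index.html, and /main.js should come back unchanged.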

Now a user's request to CloudFront for e.g. https://bonner.jp/work will grab the asset in S3 at bonner.jp/work/index.html - returning the "Work" page, after which the JS needed to render everything else[2] downloads in the background. Voilà.

If you'd like to see how it turned out, well, click around. The source code is here on GitHub. Thanks for reading.

Footnotes

  1. My previous site used obtvse, a (hilarious) parody/clone of the svbtle blog network. I enjoyed the admin/postwriting tool, but there was no need to run a Rails app with an accompanying database just to present text that wouldn't change between requests. These days I compose everything in a text editor anyway.
  2. Everything else linked to from that page, anyway.