
Replace Relay with minimal GraphQL client

Open wincent opened this issue 4 years ago • 3 comments

Milestones

Done so far...

  • [x] Lex and parse all the GraphQL documents currently in Masochist.
  • [x] Lex and parse the GraphQL schema definition currently in Masochist.

... next steps aren't as clear... perhaps:

  • [ ] Decide on an abstraction layer between the schema and the raw data. In the current implementation, the schema and the data loading and transformation are all smooshed together. I suspect it might be nice to abstract over the main entities in a GraphQL-agnostic way, and just provide methods for loading one, loading a batch (this includes paginating), and searching; see the sketch after this list. I might then be able to generate the schema executor from the .schema artifact + some conventions (or directives) telling it how to call the loaders.
    • [x] In keeping with my near-fanatical no/low-deps philosophy, start by writing a replacement for the memcached library I'm using that hasn't been updated in 7 years.
    • [x] Port Redis client.
    • [ ] Maybe even consider something dramatic, like writing the schema endpoint in... Rust, which would allow me to skip some JS dependencies like express, and build a ridiculously fast backend.
      • [ ] Make Markdown translator in Rust.
      • [ ] Make GraphQL schema in Rust.
      • [ ] Make thin JS layer for doing server-side React rendering; or...
      • [ ] Screw JS; make Rust dump out the static HTML as well...
  • [ ] Once we can execute a query, make a "blinking light demo" that runs a query and renders a document (maybe without React, as I just want to dump static HTML to disk at this point).
  • [ ] Render static HTML detail pages (in JS, if I don't do it in Rust).
  • [ ] Write a custom Markdown renderer (in JS, if I don't do it in Rust) — depends on how deep I want to pursue the "minimal dependencies" religion.
  • [ ] Profit?
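
A minimal sketch of the loader abstraction mentioned in the first unchecked item above, in TypeScript. All names here (`Loader`, `Connection`, the method names) are hypothetical — the point is just that the interface is small, GraphQL-agnostic, and easy for a generated executor to target:

```typescript
// Hypothetical loader abstraction: one small, GraphQL-agnostic interface per
// content entity (Article, Page, Post, Snippet). A generated schema executor
// would call these methods based on conventions or directives in the schema.
export interface Connection<T> {
  items: Array<T>;
  endCursor: string | null; // opaque cursor for the last item, if any
  hasNextPage: boolean;
}

export interface Loader<T> {
  // Load a single entity by id (eg. for a permalink/detail view).
  load(id: string): Promise<T | null>;

  // Load a batch of `first` entities after an optional cursor; this is all
  // the pagination the index pages need.
  loadMany(first: number, after?: string): Promise<Connection<T>>;

  // Full-text search over the entity's content.
  search(query: string): Promise<Array<T>>;
}
```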

The context

This supersedes https://github.com/wincent/masochist/issues/140, which is about upgrading Relay.

That ticket is about moving from v1.7.0 (which I have been on for years now) to v13.0.0 (which will be out soon). Now, there have been lots of great improvements made in that interval, but also breaking changes which would be quite painful for me to have to accommodate. When you look at what I actually need for my incredibly static, unsophisticated website, it's clear that I am only using about 5% of what Relay has to offer, and that I can probably obtain a better result by building the thinnest of GraphQL data-fetching layers and saving myself from downloading a bunch of JS that I will never need.

Specifically, if you look at the queries we run and the types we have defined in our schema, we have:

  • Four basic "Node" types for content (Article, Page, Post, Snippet).
  • Three connections for paginating through the main three content types on their index pages (ArticlesIndex, PostsIndex, and SnippetsIndex), plus some special connections for search, tags, and "taggables" (ie. items with a particular tag); note that none of these connections use any advanced connection features that you would need for pull-to-refresh, or extending connections bidirectionally, or filling in gaps etc — we just have a simple "Load more" button that pulls in 10 more items after a given cursor and that's it (see the sketch after this list).
  • Tags are "Node" types too, but the taggable records generally deal with tag names instead (ie. they just have simple lists of tag names, not connections).
  • In terms of navigation patterns, the dominant flow tends to be from index page to permalink; and by the time we load the permalink, we already have the data because we pre-fetched it as soon as you moused over the link.
  • There are no fragment variables, no deferred queries, no custom handles, no directives (other than the already-mentioned @connection ones).
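
For concreteness, here's a sketch of the kind of connection query this implies, plus the trivial fetch a "Load more" button would make. The field names (`posts`, `edges`, `pageInfo`) and the `/graphql` endpoint are illustrative, not necessarily the actual schema:

```typescript
// Illustrative index query: plain `first`/`after` pagination, nothing more.
const POSTS_INDEX_QUERY = `
  query PostsIndex($count: Int!, $cursor: String) {
    posts(first: $count, after: $cursor) {
      edges {
        node {
          id
          title
        }
      }
      pageInfo {
        hasNextPage
        endCursor
      }
    }
  }
`;

// What "Load more" does: fetch 10 more items after the last cursor we saw.
async function loadMore(cursor: string | null): Promise<unknown> {
  const response = await fetch('/graphql', {
    method: 'POST',
    headers: {'Content-Type': 'application/json'},
    body: JSON.stringify({
      query: POSTS_INDEX_QUERY,
      variables: {count: 10, cursor},
    }),
  });
  return response.json();
}
```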

So, overall, our data is pretty shallow, with no real nesting of refetchable nodes within nodes. That is, we don't even really need a normalized store — a store of unflattened node data would actually be almost enough to handle the common cases (technically, there are places where we do a partial fetch of a node — such as on the wiki index page, or the search results, or a tag listing — and then need to fetch more data in order to navigate to the permalink/detail view, but these hardly warrant a fully normalized store). Really the only "tricky" requirement we have is to deal with connections, and as already stated, the connections we're using in this app are the simplest possible kind. There are no mutations, no GraphQL subscriptions, no local schema, no need for integration with React Suspense, no need for any kind of configurability, and we don't need garbage collection. We can live without inline GraphQL fragments, using colocated .graphql files instead, meaning we don't need a Babel plug-in.

Given all this, I think we could do everything we need with a few kilobytes of JS, and with something so minimal, we'd be in a good place to do the silly experiments that I want to try out with streaming responses (although the payoff is expected to be tiny, given the lack of size/complexity in the payloads we're dealing with, so I might not even do that).

What we need to have parity with existing solution

  • Server rendering (streaming).
  • Persisted queries.

What extras we can get with a new solution

  • Massively reduced dependency footprint and smaller client code size.
  • Dedicated ResponseBuilder classes that can be code-gen'd ahead of time for each persisted query (see the sketch after this list; I also need to think about how I would build a very simple version of the schema if I weren't using the GraphQL reference library)[^jit]; other examples of "shifting left" here include:
    • Instead of persisting query text, persist a parsed AST.
    • Execute queries ahead of time producing static JSON responses that can be served directly from nginx.
    • One step beyond JSON, actually render HTML and serve it up statically (ie. almost entirely static site).
  • Streaming JSON GraphQL responses (eg. bottom-up line-wise flushing of response graphs), which can be processed as they're flushed over the wire to the client.
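
As a sketch of what a generated builder might look like (all names are hypothetical, and the types are inline stand-ins for the loader interface sketched earlier): instead of interpreting a query AST at request time, the response shape is baked in and only the data loading remains dynamic.

```typescript
// Inline stand-ins for the loader types sketched earlier.
type Post = {id: string; title: string};
type Connection<T> = {items: Array<T>; endCursor: string | null; hasNextPage: boolean};
type Loader<T> = {loadMany(first: number, after?: string): Promise<Connection<T>>};

// Hypothetical code-gen output for one persisted query: the response shape
// is fixed at build time; only the data loading happens at request time.
export async function buildPostsIndexResponse(
  loader: Loader<Post>,
  variables: {count: number; cursor?: string},
): Promise<string> {
  const {items, endCursor, hasNextPage} = await loader.loadMany(
    variables.count,
    variables.cursor,
  );
  // Taken a step further, this JSON could be produced ahead of time and
  // served directly by nginx as a static file.
  return JSON.stringify({
    data: {
      posts: {
        edges: items.map((post) => ({node: {id: post.id, title: post.title}})),
        pageInfo: {hasNextPage, endCursor},
      },
    },
  });
}
```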

Closes: #140

[^jit]: Just saw on HN a related idea in the form of zalando-incubator/graphql-jit, which generates query functions that can be run instead of using the standard generic GraphQL execution implementation in graphql-js.

wincent, Dec 13 '21

I should add that lately I've even been tempted to replace all of this with a static site generator, as the only part of the site that actually has to be dynamic is the search functionality, and I think I can find enough interesting subproblems on the static generation side to make it scratch the itch that motivated me to build this stuff in the first place — that is, I can still use my fancy GraphQL tooling to drive the generation, and the schema in its current form (ie. the connections etc) is still probably useful as a data source. As much fun as it would be to play with streaming responses and so on, it would be hard to beat the performance of static HTML with probably less than 1KB of JS to handle the dynamic bits...

I have never been super fond of the "load more" pattern used in the index pages. The blog index is probably just as well served by showing the current article and providing next/previous links to jump between articles. Likewise for the other indices, like the wiki index, I might just be able to generate a big long static list (which in turn would enable some nice type-to-filter UI). At the time of writing, I have about:

  • 908 blog posts.
  • 4739 snippets.
  • 2451 wiki articles (something like 379 of these are redirects).

Of that list, only the snippets list is probably unpleasantly large to fetch and render. Anyway, something to think about...

The blog obviously needs a reverse-chronologically ordered listing (by creation date), but the truth is the wiki's list (ordered by updated-at) is rarely interesting beyond the first page or so. An alphabetical list might be useful as an alternative view.

wincent, Mar 27 '22

Just wanted to note that I'm still toying with the idea of generating static HTML, although also weighing that up against other things like using React Server Components (see RFC).

I care about three things:

  1. Initial page load time.
  2. Navigation time.
  3. Deploy time.

I'll weigh any potential architecture against those three before proceeding. Some example thoughts:

  • Nothing is going to beat pre-rendered HTML for initial page load time; but:
  • For navigation, I think React on the client (ie. issue a persisted GraphQL query and get a minimal JSON payload back, then update the DOM; sketched after this list) is still going to beat loading static HTML (because the JSON will be smaller than the HTML and JS is fast). (Counterpoint: The HTML would still be pretty fast, could benefit from browser caches and long cache times, and things like the back button, scroll position restoration etc would just work "for free".)
  • I previously talked about using generated response builders on the server side, but I could go so far as to pre-render JSON responses to entirely cut JS out of the loop (ie. nginx could serve them straight up!).
  • But I don't want slow deploys, so that means I don't want to generate thousands of artifacts on every push, unless I can make it really fast. The alternative is to generate them via a build step, then commit the results as CIGARs (Checked-In Generated Artifacts).
  • While I said that "nothing is going to beat pre-rendered HTML for initial page load time", JS is fast and I expect that React Server Components could provide an interesting alternative given that most of my content isn't dynamic in nature. I'd only need a small amount of client-side dynamism.
  • If I do go with pre-rendered stuff, I need to decide what I am going to do about pagination. Currently I'm using ("infinite") "Load more". I could maintain this, or switch to prev/next navigation (eg. for the blog and snippets), or just go with MovableType-style archive pages. Not really sure what I want to do there.
  • I also don't want to forget that I was initially toying with the idea of not even using React (although I didn't explicitly spell that out here), in the name of having a zero dependency footprint in production (this is what I was alluding to when I said "static HTML with probably less than 1KB of JS to handle the dynamic bits" above).
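
For the navigation point above, a rough sketch of the client-side path, assuming queries are persisted and addressed by an id (the endpoint, payload shape, and render function are all placeholders):

```typescript
// Hypothetical client-side navigation: instead of loading a new HTML page,
// send the id of a persisted query plus its variables, and patch the DOM
// with the (much smaller) JSON that comes back.
type PersistedQueryRequest = {
  id: string; // eg. a hash of the query text, agreed upon with the server
  variables: Record<string, unknown>;
};

async function navigate(request: PersistedQueryRequest): Promise<void> {
  const response = await fetch('/graphql', {
    method: 'POST',
    headers: {'Content-Type': 'application/json'},
    body: JSON.stringify(request),
  });
  const {data} = await response.json();
  render(data);
}

function render(data: unknown): void {
  // Placeholder: with "< 1KB of JS" this might be a handful of
  // textContent/innerHTML assignments rather than a React render.
  document.getElementById('content')!.textContent = JSON.stringify(data);
}
```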

wincent, Aug 05 '23

Going to spitball some ideas here about what to do with my schema artifact. I've been thinking about this for a few days now (not actively, but wondering if my subconscious mind will deliver an answer to me after I wake up). Let's compare some approaches and examples of libraries and tools that embody those approaches:

  • graphql-js: You use OO classes to define your schema and resolvers. Example from the current main branch. The package allows you to execute queries against the schema, including an introspection query, and it also provides tools for printing out the schema in GraphQL's schema definition language.
  • apollo-server: I've never used this, but the docs suggest that you write out the schema by hand, and then you write your resolvers. In some simple cases, you can get away without defining a resolver and just rely on the default resolver's behavior, which will look at the corresponding property on the parent object of any given field and try to produce a value from it (by calling it, if it is a function, or otherwise returning it); see the sketch after this list.
  • Mercurius is similar to Apollo in that you provide resolvers.
  • graphql-yoga: Again, like Apollo, you write your schema by hand and also provide resolvers.
  • graphql-code-generator (docs): I believe this will take a schema file and generate TypeScript types for you. I think you still have to write your own resolvers (although I may be missing some facility for generating them, to an extent), but at least they'll be typed.
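
For reference, the default resolver behavior described in the apollo-server item above amounts to roughly the following (a paraphrase, not the actual graphql-js source):

```typescript
// Approximation of the conventional default field resolver: look up the
// field's property on the parent object; if it's a function, call it with
// the field arguments; otherwise return the value as-is.
type ResolverInfo = {fieldName: string};

function defaultResolver(
  parent: Record<string, unknown>,
  args: Record<string, unknown>,
  info: ResolverInfo,
): unknown {
  const property = parent[info.fieldName];
  return typeof property === 'function' ? property(args) : property;
}

// eg. resolving `title` on {title: 'Hello'} returns 'Hello', while resolving
// `excerpt` on an object with an `excerpt(args)` method calls that method.
```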

Now, I don't want to go the graphql-js path — neither by consuming it as a library (I would not have written the lexer/parser unless I'd wanted to get rid of the dependency on graphql-js) nor by copying the pattern (whatever code I end up having to write to provide resolvers, I want it to be as close to a POJO as possible, because one of the motivations for doing this rewrite in the first place is to provide me with a vehicle for trying out novel execution strategies[^alluded] — class-based hierarchies tend to make that kind of iteration and experimentation harder rather than easier). But most of the other solutions appear to barely leverage the schema, and leave opportunities for code generation on the table (indeed, if some of them offer any type-safety features, they don't make it clear, which is a shame given that one of the main selling points of GraphQL is that it's a type system which lends itself to static verification and generation). graphql-code-generator seems to be the exception.

It goes without saying that I want the resolvers to be 100% typesafe. But maybe less obviously (although mentioned elsewhere), I think that I might be able to make the resolver layer a particularly thin one by instead focusing on building underlying abstractions for loading objects and collections of objects. Through a combination of naming conventions and directives, I might be able to wire up the resolvers to the low-level data-fetching code without writing any glue code at all (or with at most one layer of indirection to keep things static enough for the TypeScript compiler to verify).
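
A rough sketch of that wiring, with the `@loader` directive, the binding table, and all names invented purely for illustration: each schema field that matches a convention or carries a directive gets bound to one of the low-level loader methods, so the only "resolver" artifact is a small static table that the TypeScript compiler can check against the loader types.

```typescript
// Invented example: suppose the schema annotates fields with a @loader
// directive, eg.
//
//   type Query {
//     post(id: ID!): Post @loader(method: "load")
//     posts(first: Int!, after: String): PostConnection @loader(method: "loadMany")
//   }
//
// The generator would then emit a binding table like the one below.
type Post = {id: string; title: string};
type Connection<T> = {items: Array<T>; endCursor: string | null; hasNextPage: boolean};

interface Loader<T> {
  load(id: string): Promise<T | null>;
  loadMany(first: number, after?: string): Promise<Connection<T>>;
}

// Toy in-memory loader standing in for the real data layer.
const posts: Array<Post> = [{id: '1', title: 'Hello'}];
const postLoader: Loader<Post> = {
  load: async (id) => posts.find((post) => post.id === id) ?? null,
  loadMany: async (first, after) => {
    const start = after ? posts.findIndex((post) => post.id === after) + 1 : 0;
    const items = posts.slice(start, start + first);
    return {
      items,
      endCursor: items.length ? items[items.length - 1].id : null,
      hasNextPage: start + first < posts.length,
    };
  },
};

// Generated binding table: schema field -> loader call.
export const fieldBindings = {
  'Query.post': (args: {id: string}) => postLoader.load(args.id),
  'Query.posts': (args: {first: number; after?: string}) =>
    postLoader.loadMany(args.first, args.after),
} as const;
```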

Now, you could also build that lower-level layer and use it with any of the other solutions on the list, so there's no significant risk in making a start on it as a first step. I think the end destination ends up being something that looks a bit like graphql-code-generator (and of course, I'm not going to use that but will instead roll my own, as GOD intended).

[^alluded]: Alluded to in a number of places, but including things like streaming responses, generated resolvers, custom endpoints for specific persisted queries etc.

wincent, Apr 23 '24