Add API to search index
The API section is currently not indexed.
I don't currently see a way to index it properly given how the API pages are structured (multiple H1 headings, and titles that contain the type as well as the name). When migrating to Statiq we should make sure that the API docs can be indexed.
I'm hesitant to omit the `<wbr>` and other elements that help display headings, etc. with good wrapping upstream. And given the size of the Cake API, a client-side search may not be performant enough (though I'd be curious to find out - the client-side search in Statiq is totally rewritten and uses gzip and other strategies to make it small and fast, even for big indexes).
Ignoring the client-side option for the moment, I do have some other thoughts:
- We could generate a search index that Algolia can recognize and upload it directly on deployment (as opposed to or in addition to the crawler). This is what Discover .NET does with Algolia and it works well.
- We could add a hidden element or something similar that the crawler can see but that the user won't (footer text same as background, etc.).
- If neither of those works, we could make the title display in the upstream theme overridable (it may already be) and Cake could adjust it as needed locally.
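
To make the first option above more concrete, here is a minimal sketch of generating index records at build time. The `ApiSymbol` shape, its field names, and the example URL are hypothetical and not taken from Statiq or Discover .NET; the only Algolia-specific detail is that each record carries an `objectID`. The upload step itself is left out and would go through whichever Algolia client the deployment already uses.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ApiSymbol:
    """One documented API symbol (hypothetical shape, not Statiq's model)."""
    name: str       # e.g. "MyClass"
    kind: str       # e.g. "class"
    namespace: str  # e.g. "My.Namespace"
    url: str        # page path relative to the site root


def to_record(symbol: ApiSymbol) -> dict:
    """Turn a symbol into a flat record with a stable objectID."""
    record = asdict(symbol)
    # Algolia identifies records by "objectID"; reusing the URL keeps
    # repeated uploads idempotent (assumes one symbol per page).
    record["objectID"] = symbol.url
    return record


def build_index(symbols: list[ApiSymbol]) -> str:
    """Serialize all records to a JSON array that could be uploaded."""
    return json.dumps([to_record(s) for s in symbols], indent=2)


if __name__ == "__main__":
    # Name and type are separate fields, so the title no longer has to
    # be parsed out of a heading such as "MyClass class".
    demo = [ApiSymbol("MyClass", "class", "My.Namespace",
                      "/api/My.Namespace/MyClass/")]
    print(build_index(demo))
```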
As the porting starts (like, now! yay!), I'll keep an eye on this one in relation to the Algolia search.
@daveaglick I think your suggestions make it more complicated than it has to be.
The `<wbr>` is not the issue here, as it is already handled by the crawler. We are currently using Algolia DocSearch, which works by having Algolia host the index and run a crawler.
The only thing we need to do is to configure how our page is structured through CSS selectors (see https://github.com/algolia/docsearch-configs/blob/master/configs/cakebuild.json). As soon as we can define selectors for the different levels of a document it will work.
The question is what we want to index for an API page. What should already be possible is something like this:
Level | Selector | Example result |
---|---|---|
lvl1 | .content-header h1 | MyClass class |
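
As a rough sketch, the row above could end up in a DocSearch crawler config along these lines. Only the `lvl1` selector comes from the table; the index name, start URL, and the `lvl0` and `text` selectors are assumptions for illustration and would need checking against the real page markup. It is written as a small Python script so the result can be dumped as JSON.

```python
import json


def build_docsearch_config() -> dict:
    """Assemble a DocSearch-style config with per-level CSS selectors."""
    return {
        "index_name": "cakebuild",                 # assumed from the config file name
        "start_urls": ["https://cakebuild.net/"],  # assumed site root
        "selectors": {
            "lvl0": ".content-header .namespace",  # hypothetical selector
            "lvl1": ".content-header h1",          # from the table above
            "text": ".content p",                  # hypothetical selector
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_docsearch_config(), indent=2))
```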
Ah, I think I've got you. It's more that the headings aren't semantic (multiple H1, etc.)? Yeah, that should already be fixed I think. And if not, then it's worth doing upstream since the API pages should be good semantic HTML anyway.
Update: I took a look at the pages in https://www.statiq.dev/api and we might have a little work to do here. Easiest might be to apply marker CSS classes that the selector could use to identify particular bits for indexing.
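
To illustrate the marker-class idea, here is a small sketch of what a selector such as `.search-name` would pick up from a heading. The class names `search-name` and `search-kind` and the sample markup are hypothetical, and the extractor only stands in for the crawler; it is not how DocSearch is implemented. It uses nothing beyond Python's standard `html.parser`.

```python
from html.parser import HTMLParser

# Elements without a closing tag; they must not affect nesting depth.
VOID_TAGS = {"wbr", "br", "img", "hr", "input", "meta", "link"}


class MarkerTextExtractor(HTMLParser):
    """Collects the text of every element carrying a given CSS class."""

    def __init__(self, marker: str):
        super().__init__()
        self.marker = marker
        self.depth = 0                 # > 0 while inside a marked element
        self.parts: list[str] = []     # text pieces of the current match
        self.results: list[str] = []   # one entry per marked element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.marker in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append("".join(self.parts).strip())
            self.parts = []

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract(html: str, marker: str) -> list[str]:
    """Return the text content of all elements with class `marker`."""
    parser = MarkerTextExtractor(marker)
    parser.feed(html)
    parser.close()
    return parser.results


if __name__ == "__main__":
    heading = ('<h1><span class="search-name">My<wbr>Very<wbr>Long<wbr>Class</span> '
               '<span class="search-kind">class</span></h1>')
    print(extract(heading, "search-name"))  # ['MyVeryLongClass']
    print(extract(heading, "search-kind"))  # ['class']
```

With markers like these, the `<wbr>` elements can stay in the headings for wrapping while the name and the type are indexed separately.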