docs: improve searchability with algolia
This is a work in progress; it's not working yet because we have an issue with record sizes.

I was trying to run `.github/workflows/upload-algolia.py` locally and ran into this kind of error:

algoliasearch.exceptions.RequestException: Record at the position 459 objectID=reference/expression-tables.html#methods is too big size=224921/10000 bytes. Please have a look at https://www.algolia.com/doc/guides/sending-and-managing-data/prepare-your-data/in-depth/index-and-records-size-and-usage-limitations/#record-size-limits
We are generating massive objects that go into the search.json. For example, the object generated by this section, https://ibis-project.org/reference/expression-tables#methods, is ~225 KB (it's entry 459); you can take a look at it here: https://ibis-project.org/search.json
Not even the paid plans would allow this.

For Build plans:
- 10 KB for any individual record

For Standard, Premium and Grow plans:
- 100 KB for any individual record
- 10 KB average record size across all records
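To see how many records are affected, a quick pass over the file can flag everything over the limit. This is a minimal sketch, assuming the records are a JSON array of objects with an `objectID` key (as in the error above); in practice `records` would come from `json.load()` on the downloaded search.json, but a tiny in-memory example stands in here:

```python
import json

LIMIT = 10_000  # bytes per record on the Build plan

# Stand-in for json.load(open("search.json")); shapes are assumptions.
records = [
    {"objectID": "reference/small.html", "text": "short entry"},
    {"objectID": "reference/expression-tables.html#methods", "text": "x" * 20_000},
]

def record_size(rec):
    # Approximate Algolia's accounting: size of the JSON-encoded record.
    return len(json.dumps(rec).encode("utf-8"))

oversized = [
    (rec["objectID"], record_size(rec))
    for rec in records
    if record_size(rec) > LIMIT
]
print(oversized)
```

Running something like this against the real search.json is how the ~68 oversized entries mentioned below were counted.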
I think a solution could be, instead of using the search.json from Quarto, to see what kind of index the Algolia crawler generates. According to these docs, it sounds like we could get it for free if we have a Netlify account, which I believe we have?
Sweet, there is already a crawler GHA: https://github.com/algolia/algoliasearch-crawler-github-actions. But I'm not sure that's the right way to go about this. Opening this up for discussion.
Closes: #7995
cc: @gforsyth
I decided to start fiddling with the search.json, and it's very tricky: it's not only the records with "Examples" on them that generate big entries; we have other cases too. We have ~68 cases where the objects are too big, and stripping the examples only takes care of ~15 of them. If we have to come up with very specific rules to handle each case, I don't think it will be sustainable.
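To illustrate why stripping examples alone doesn't scale: a hypothetical shrinking pass that drops everything from an "Examples" heading onward still leaves records whose remaining body text is over the limit. The section marker and record shape here are assumptions for illustration, not the actual search.json schema:

```python
import json

LIMIT = 10_000  # Build-plan per-record limit, in bytes

def strip_examples(text):
    # Hypothetical rule: drop everything from an "Examples" heading onward.
    head, _, _ = text.partition("Examples")
    return head

records = [
    # Oversized only because of its examples: stripping fixes it.
    {"objectID": "a", "text": "docs " * 100 + "Examples " + "x" * 20_000},
    # Oversized in its body text alone: stripping does not help.
    {"objectID": "b", "text": "y" * 20_000},
]

still_big = [
    rec["objectID"]
    for rec in records
    if len(json.dumps({**rec, "text": strip_examples(rec["text"])}).encode()) > LIMIT
]
print(still_big)  # records like "b" stay over the limit
```

Each new kind of oversized record would need its own stripping rule, which is the sustainability problem described above.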
We (with @cpcloud and @gforsyth) tried for a bit to connect the Algolia crawler via the Netlify interface, and we couldn't get the crawler going.
The last thing we could try is handling everything via GHA, following this example: https://github.com/algolia/algoliasearch-crawler-github-actions/blob/main/examples/netlify.yml, and see what happens.
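For reference, the linked netlify.yml example boils down to a workflow along these lines. This is a sketch from memory: the input names, secret names, and the `ibis-docs` crawler name are assumptions that should be double-checked against the action's README before use.

```yaml
# Sketch based on algolia/algoliasearch-crawler-github-actions examples/netlify.yml.
# Input and secret names are assumptions; verify against the action's README.
name: algolia-crawl
on:
  push:
    branches: [main]
jobs:
  crawl:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run the Algolia crawler
        uses: algolia/algoliasearch-crawler-github-actions@v1
        with:
          crawler-user-id: ${{ secrets.CRAWLER_USER_ID }}
          crawler-api-key: ${{ secrets.CRAWLER_API_KEY }}
          algolia-app-id: ${{ secrets.ALGOLIA_APP_ID }}
          algolia-api-key: ${{ secrets.ALGOLIA_API_KEY }}
          crawler-name: ibis-docs   # hypothetical crawler name
          site-url: https://ibis-project.org
```

This would sidestep search.json entirely, since the crawler builds its own index from the rendered pages.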
Something got messed up in the rebase; I'll close this and open a new PR.