docs: improve searchability with algolia

Open ncclementi opened this issue 1 year ago • 1 comments

This is a work in progress and not working yet: we have an issue with the record sizes.

I was trying to run .github/workflows/upload-algolia.py locally and ran into this kind of error:

algoliasearch.exceptions.RequestException: Record at the position 459 objectID=reference/expression-tables.html#methods is too big size=224921/10000 bytes. Please have a look at https://www.algolia.com/doc/guides/sending-and-managing-data/prepare-your-data/in-depth/index-and-records-size-and-usage-limitations/#record-size-limits

We are generating massive objects that go into the search.json. For example, the object generated by this section, https://ibis-project.org/reference/expression-tables#methods, is ~225 KB (this is entry 459). You can take a look at it here: https://ibis-project.org/search.json
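To see which entries are affected, a quick size check over the records could look like the sketch below. It assumes search.json is a JSON list of objects with an `objectID` key, and it approximates a record's size as the UTF-8 length of its compact JSON encoding, which may not match Algolia's own accounting byte for byte. The helper names `record_size` and `oversized` are made up for illustration.

```python
import json


def record_size(record: dict) -> int:
    """Approximate record size: UTF-8 byte length of the compact JSON encoding."""
    encoded = json.dumps(record, ensure_ascii=False, separators=(",", ":"))
    return len(encoded.encode("utf-8"))


def oversized(records: list, limit: int = 10_000) -> list:
    """Return (position, objectID, size) for records above `limit`, largest first."""
    hits = [
        (pos, rec.get("objectID", "<no objectID>"), record_size(rec))
        for pos, rec in enumerate(records)
    ]
    return sorted((h for h in hits if h[2] > limit), key=lambda h: h[2], reverse=True)


# Tiny in-memory stand-in for the real search.json contents.
sample = [
    {"objectID": "small.html", "text": "short"},
    {"objectID": "big.html#methods", "text": "x" * 20_000},
]
for pos, object_id, size in oversized(sample):
    print(f"position {pos}: objectID={object_id} size={size}")
```

To run it against the real index, replace `sample` with the result of `json.load` on the generated search.json.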

Not even the paid plans would allow this. For Build plans:

  • 10 KB for any individual record

For Standard, Premium and Grow plans:

  • 100 KB for any individual record
  • 10 KB average record size across all records
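The limits above can be written down as a small check. This is only a sketch: it takes 1 KB as 1000 bytes (matching the `224921/10000` in the error message), groups Standard, Premium and Grow under a made-up `"paid"` key, and the names `PlanLimits` and `violations` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlanLimits:
    max_record_bytes: int                     # limit for any individual record
    max_average_bytes: Optional[int] = None   # limit on the average, if the plan has one


# Values taken from the list above; the "paid" grouping is an assumption for brevity.
LIMITS = {
    "build": PlanLimits(max_record_bytes=10_000),
    "paid": PlanLimits(max_record_bytes=100_000, max_average_bytes=10_000),
}


def violations(sizes: list, limits: PlanLimits) -> list:
    """Return human-readable descriptions of every limit the given sizes break."""
    problems = []
    too_big = [s for s in sizes if s > limits.max_record_bytes]
    if too_big:
        problems.append(
            f"{len(too_big)} record(s) over {limits.max_record_bytes} bytes "
            f"(largest: {max(too_big)})"
        )
    if limits.max_average_bytes is not None and sizes:
        average = sum(sizes) / len(sizes)
        if average > limits.max_average_bytes:
            problems.append(
                f"average size {average:.0f} over {limits.max_average_bytes} bytes"
            )
    return problems


# The 224921-byte record from the error fails under both sets of limits.
for plan, limits in LIMITS.items():
    print(plan, violations([224_921], limits))
```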

I think a solution could be, instead of using the search.json from quarto, to see what kind of index the algolia crawler generates. According to these docs, it sounds like we could get it for free if we have a Netlify account, which I believe we have?

Sweet, there is already a crawler GHA https://github.com/algolia/algoliasearch-crawler-github-actions.

But I'm not sure if that's the right way to go about this. Opening this to discussion.

Closes: #7995

cc: @gforsyth

ncclementi, Feb 12 '24 23:02

I started fiddling with the search.json and it's very tricky: the records that have "Examples" in them are not the only ones that generate big entries; we have other cases too.

We have ~68 cases where the objects are too big, and stripping the examples only takes care of ~15. I think if we come up with very specific rules to handle each case, it won't be sustainable.
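For a sense of how such a rule could be evaluated, here is a rough sketch that counts how many oversized records a single stripping rule brings under the limit. It is not the code used in this PR: the `text` field, the "cut everything from the first `Examples` marker" rule, and the names `strip_examples` and `coverage` are all assumptions for illustration.

```python
import json

LIMIT = 10_000  # bytes, the per-record limit from the error message


def record_size(record: dict) -> int:
    """Approximate record size: UTF-8 byte length of the compact JSON encoding."""
    encoded = json.dumps(record, ensure_ascii=False, separators=(",", ":"))
    return len(encoded.encode("utf-8"))


def strip_examples(record: dict, marker: str = "Examples") -> dict:
    """Hypothetical rule: drop everything in `text` from the first marker onward."""
    head, found, _ = record.get("text", "").partition(marker)
    if not found:
        return record  # no marker, so the rule does not apply
    return {**record, "text": head.rstrip()}


def coverage(records: list, limit: int = LIMIT) -> tuple:
    """Return (number of oversized records, number the rule brings under the limit)."""
    big = [r for r in records if record_size(r) > limit]
    fixed = [r for r in big if record_size(strip_examples(r)) <= limit]
    return len(big), len(fixed)


# Stand-in data: one record fixed by the rule, one that stays too big, one small.
sample = [
    {"objectID": "a", "text": "intro Examples " + "x" * 20_000},
    {"objectID": "b", "text": "y" * 20_000},
    {"objectID": "c", "text": "ok"},
]
n_big, n_fixed = coverage(sample)
print(f"{n_fixed} of {n_big} oversized records fixed by stripping examples")
```

Running the same count for each candidate rule shows how much of the problem it actually covers before committing to it.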

We (with @cpcloud and @gforsyth) spent a while trying to connect the algolia crawler via the netlify interface and couldn't get the crawler to kick off.

The last thing we could try is handling everything via GHA, following this example: https://github.com/algolia/algoliasearch-crawler-github-actions/blob/main/examples/netlify.yml and see what happens.

ncclementi, Feb 13 '24 18:02

Something got messed up on the rebase, I'll close and open a new PR.

ncclementi, Feb 21 '24 16:02