Fanis Tharropoulos
Thank you for reporting this issue with detailed reproduction steps. I've identified and fixed the root cause of the inconsistent page scraping behavior. The problem was in how we handled...
I tested it in your repository. Could you try replicating it in a containerized environment?
I'm running the Docscraper in Docker, and Typesense in Docker as well. Docusaurus is running through Node 23 on Arch Linux, kernel 6.12.4.
To confirm that's the case, try cloning this and running it locally. I'm using the pipenv shell with Python 3.10.16 here and it works. It does indeed not work...
If you try the same, but with `symbols_to_index` in the schema's root level, does it then work? Also, what version of Typesense are you using?
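As a rough sketch of that diagnostic, the snippet below takes a schema that declares `symbols_to_index` on individual fields and produces a copy with the same symbols declared at the schema's root instead. The collection name `docs`, the field names, the `-` symbol, and the helper `hoist_symbols_to_root` are all hypothetical and only serve as illustration; the snippet builds plain dictionaries and does not call Typesense.

```python
import copy


def hoist_symbols_to_root(schema):
    """Return a copy of `schema` with field-level symbols_to_index moved to the root.

    Hypothetical helper for a diagnostic comparison; not part of any Typesense client.
    """
    hoisted = copy.deepcopy(schema)
    # Start from any symbols already declared at the root level.
    symbols = list(hoisted.get("symbols_to_index", []))
    for field in hoisted.get("fields", []):
        # Remove the field-level setting and merge its symbols into the root list.
        for symbol in field.pop("symbols_to_index", []):
            if symbol not in symbols:
                symbols.append(symbol)
    if symbols:
        hoisted["symbols_to_index"] = symbols
    return hoisted


# Assumed example schema: names and symbols are placeholders.
field_level_schema = {
    "name": "docs",
    "fields": [
        {"name": "title", "type": "string", "symbols_to_index": ["-"]},
        {"name": "content", "type": "string"},
    ],
}

root_level_schema = hoist_symbols_to_root(field_level_schema)
```

Creating one collection from each variant and running the same query against both should show whether the behavior is specific to field-level tokenization or happens regardless.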
Sorry, I wasn't proposing that as a solution, but to see if it has to do with field-level tokenization or if it's the case regardless.
Can you post a set of instructions for a reproducible example, like the one here: https://gist.github.com/jasonbosco/7c3432713216c378472f13e72246f46b? This will help us debug and find the issue.
After debugging, I see that `"CTTI-2024-15"` does not exist in the dataset, which is why it doesn't appear at the top. `CTTI-2024-259` exists, but it is only at the top...
We recently addressed some issues with field-level `symbols_to_index` in `v29.0rc27`. Can you update to that version and see if the issue persists?
RC builds are safe for production, yes. There's no ETA for v29 yet, but we're approaching the code freeze.