Modernize Galaxy Tool Search function
Galaxy's tool search is based on Whoosh, which is unmaintained. BioBlend currently doesn't use any intelligent search feature (probably for performance reasons). There are performance and usability issues in both the frontend and the backend implementation. This PR seeks to modernize Galaxy's tool search infrastructure in preparation for its use inside the MCP.
Current list of identified issues and their implementation status:
- [x] Replace Whoosh with pytantivy, which is maintained and, unlike Lucene, doesn't pull the JVM into the dependencies.
- [ ] Re-jig the frontend to generate pytantivy search queries.
- [ ] The advanced search frontend currently places a lot of load on the server -> performance test.
- [ ] Current index generation creates one document per tool_id; move towards aggregating by tool with changes (yields a 300% decrease in search index size).
- [ ] Cache text search indices for consumption by the frontend.
- [ ] Rewrite the fast frontend search to use a full-text library: stream in the cached search index, then switch over to complete online full-text search (if possible).
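To illustrate the "aggregate by tool" item, here is a minimal sketch of collapsing one-record-per-tool_id input into one document per tool. It is not the code in this PR: the record fields (`id`, `version`, `name`, `description`), the helper names, and the assumption that a versioned tool_id ends in `/<version>` are all hypothetical and chosen only for illustration.

```python
def base_tool_id(tool_id: str, version: str) -> str:
    """Strip a trailing "/<version>" so all versions of a tool share one key.

    Assumption: versioned tool ids end in "/<version>"; ids that do not
    (e.g. a bare "cat1") are returned unchanged.
    """
    suffix = f"/{version}"
    if version and tool_id.endswith(suffix):
        return tool_id[: -len(suffix)]
    return tool_id


def aggregate_by_tool(records: list[dict]) -> list[dict]:
    """Collapse per-tool_id records into one search document per tool."""
    grouped: dict[str, dict] = {}
    for rec in records:
        key = base_tool_id(rec["id"], rec["version"])
        doc = grouped.setdefault(
            key, {"tool": key, "name": rec["name"], "versions": [], "text": []}
        )
        doc["versions"].append(rec["version"])
        # Store only text that changed between versions, not every copy.
        if rec["description"] not in doc["text"]:
            doc["text"].append(rec["description"])
    return list(grouped.values())


if __name__ == "__main__":
    # Hypothetical sample data: two versions of one tool plus a built-in tool.
    sample = [
        {"id": "toolshed.example.org/repos/dev/bwa/bwa_mem/0.7.17",
         "version": "0.7.17", "name": "BWA-MEM",
         "description": "Map reads against a reference genome"},
        {"id": "toolshed.example.org/repos/dev/bwa/bwa_mem/0.7.18",
         "version": "0.7.18", "name": "BWA-MEM",
         "description": "Map reads against a reference genome"},
        {"id": "cat1", "version": "1.0.0", "name": "Concatenate datasets",
         "description": "tail-to-head"},
    ]
    for doc in aggregate_by_tool(sample):
        print(doc["tool"], doc["versions"])
```

With this shape, the index grows with the number of distinct tools and distinct text variants rather than with the number of installed versions, which is the effect the checklist item is after.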
How to test the changes?
(Select all options that apply)
- [ ] I've included appropriate automated tests.
- [ ] This is a refactoring of components with existing test coverage.
- [ ] Instructions for manual testing are as follows:
- [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]
License
- [x] I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.
Thank you for working on this! Should be very cool!
Some future plans in the comment here: https://github.com/galaxyproject/galaxy/pull/20747#issuecomment-3214837773 that could be relevant?