RAGatouille
heatmap visualization of query + documents with example
Few questions here:
- Why did you hardcode the max_len at 512 tokens?
- It seems like there are like 3 locations for the configuration of max_length - this causes a lot of mess in quite a few places
- How do you think we should visualize the interaction on longer documents?
Hey, this is great, thank you! Will properly review in the next few days, but quickly:
Why did you hardcode the max_len at 512 tokens?
Ease of development, to get the first version out quicker. The maximum value of max_len is hardcoded because all ColBERT models out there are spin-offs with a maximum 512-token window, but this can and hopefully will change, so it shouldn't be hardcoded.
It seems like there are like 3 locations for the configuration of max_length - this causes a lot of mess in quite a few places
Good point. Some clean-up is in order, with a single attribute in the ColBERT class and all the upstream places where this config can be defined pointing to it.
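One way to picture the clean-up described above, as a minimal sketch only: a single settings object owns the limit, and every other component reads it through a property instead of keeping its own copy. The names `LengthSettings`, `Indexer`, and `Searcher` are hypothetical and are not RAGatouille's actual classes.

```python
from dataclasses import dataclass


@dataclass
class LengthSettings:
    """Hypothetical single source of truth for the token limit."""

    max_len: int = 512  # default mirrors the 512-token window discussed above


class Indexer:
    """Hypothetical component that reads the limit and never stores a copy."""

    def __init__(self, settings: LengthSettings) -> None:
        self._settings = settings

    @property
    def max_len(self) -> int:
        return self._settings.max_len


class Searcher:
    """Second hypothetical component sharing the same settings object."""

    def __init__(self, settings: LengthSettings) -> None:
        self._settings = settings

    @property
    def max_len(self) -> int:
        return self._settings.max_len


settings = LengthSettings()
indexer = Indexer(settings)
searcher = Searcher(settings)

# Changing the value in one place is visible everywhere.
settings.max_len = 1024
```

Because both components hold a reference to the same object, there is only one place where the value can drift.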
How do you think we should visualize the interaction on longer documents?
Definitely an open-ended question I'd say. No strong thoughts at the moment, but I'll think about it 🤔
Cool. Right now I've added a parameter for the document length and I'm setting it upstream manually (not ideal, but it works). I tried using a PNG and it looks pretty good for visualizing even longer documents (tried it with some internal data and it's super cool!). If you want to think together about how to order the abstractions, I'm always happy to help.
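For the longer-document question, here is a rough sketch of one possible approach; it is not taken from the PR. It builds the query-by-document token similarity matrix that such a heatmap would show, with an optional document-length cap, and then splits the columns into fixed-width panels so each image stays readable. The function names `interaction_matrix` and `split_into_panels` and the plain dot-product scoring are assumptions for illustration.

```python
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


def interaction_matrix(
    query_vecs: List[Vector],
    doc_vecs: List[Vector],
    max_doc_len: Optional[int] = None,
) -> Matrix:
    """Dot product of every query token vector with every document token vector.

    Rows are query tokens, columns are document tokens. If max_doc_len is
    given, the document is truncated to that many tokens first.
    """
    if max_doc_len is not None:
        doc_vecs = doc_vecs[:max_doc_len]
    return [
        [sum(q * d for q, d in zip(q_vec, d_vec)) for d_vec in doc_vecs]
        for q_vec in query_vecs
    ]


def split_into_panels(matrix: Matrix, panel_width: int) -> List[Matrix]:
    """Split the columns (document tokens) into consecutive panels."""
    if panel_width <= 0:
        raise ValueError("panel_width must be positive")
    n_cols = len(matrix[0]) if matrix else 0
    return [
        [row[start:start + panel_width] for row in matrix]
        for start in range(0, n_cols, panel_width)
    ]


# Toy 2-dimensional embeddings, purely illustrative.
query = [[1.0, 0.0], [0.0, 1.0]]
document = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]

full = interaction_matrix(query, document)
panels = split_into_panels(full, panel_width=2)
```

Each panel could then be handed to a plotting call such as seaborn's `heatmap` and saved as its own PNG, or stacked vertically in one figure.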
Hey, thanks again!
A couple comments:
- Could you move the main.py file you've created to the examples? Ideally as a notebook, so people can use it for visualisation!
- Matplotlib and Seaborn aren't currently dependencies of the lib. Would you be able to add them as optional dependencies? e.g. they should only be installed if someone runs pip install ragatouille[visualisation] or pip install ragatouille[all]
Apart from that, LGTM as a community extension!
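On the optional-dependency request above: alongside declaring the extras in the package metadata, a common pattern is to import the plotting libraries lazily and fail with an install hint. The sketch below assumes a hypothetical helper named `require_optional`; it is not part of RAGatouille. Only the extra names `visualisation` and `all` come from the comment above.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str = "visualisation") -> ModuleType:
    """Import an optional dependency, or raise ImportError with an install hint.

    Hypothetical helper: call it inside the visualisation code path so that
    the core library still imports without Matplotlib or Seaborn installed.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install it with: pip install ragatouille[{extra}]"
        ) from exc
```

A plotting function would then start with something like `plt = require_optional("matplotlib.pyplot")`, so users without the extra only see the error when they actually call the visualisation.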