heatmap visualization of query + documents with example

Open flexorRegev opened this issue 1 year ago • 3 comments

flexorRegev avatar Jan 28 '24 10:01 flexorRegev

Few questions here:

  1. Why did you hardcode the max_len at 512 tokens?
  2. It seems there are about 3 locations where max_length is configured, which causes a lot of mess in quite a few places
  3. How do you think we should visualize the interaction on longer documents?

flexorRegev avatar Jan 28 '24 11:01 flexorRegev

Hey, this is great, thank you! Will properly review in the next few days, but quickly:

Why did you hardcode the max_len at 512 tokens?

Ease of development, to get the first version out quicker. The maximum value of max_len is hardcoded because all ColBERT models out there are spun off from models that have a maximum 512-token window, but this can and hopefully will change, so it shouldn't be hardcoded as such.

It seems there are about 3 locations where max_length is configured, which causes a lot of mess in quite a few places

Good point. Some clean-up is in order, with a single attribute in the ColBERT class and all the upstream places where this config can be defined pointing to it.
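
As a rough illustration of that clean-up (a minimal sketch only; `ColBERTWrapper` and `HeatmapVisualizer` are hypothetical names, not RAGatouille's actual classes), the limit could live in one attribute on the model object, with upstream components reading it through a property instead of keeping their own copy:

```python
from dataclasses import dataclass


@dataclass
class ColBERTWrapper:
    """Hypothetical stand-in for the ColBERT class; owns the only copy of the limit."""

    # Default matches the 512-token window mentioned above, but callers can change it.
    max_len: int = 512


class HeatmapVisualizer:
    """Hypothetical upstream component that reads the limit instead of storing it."""

    def __init__(self, model: ColBERTWrapper) -> None:
        self._model = model

    @property
    def max_len(self) -> int:
        # Always delegate to the model, so there is no second value to keep in sync.
        return self._model.max_len

    def truncate(self, tokens: list[str]) -> list[str]:
        # Keep at most max_len tokens of the document.
        return tokens[: self.max_len]


model = ColBERTWrapper()
viz = HeatmapVisualizer(model)
model.max_len = 1024  # changed in one place only
print(viz.max_len)
```

Because the visualizer never stores the value itself, changing `max_len` on the model is immediately visible everywhere that reads it.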

How do you think we should visualize the interaction on longer documents?

Definitely an open-ended question I'd say. No strong thoughts at the moment, but I'll think about it 🤔

bclavie avatar Jan 28 '24 18:01 bclavie

Cool. Right now I've added a parameter for the document length and I'm setting it upstream manually (not ideal, but working). I tried rendering to a PNG and it looks pretty good for visualizing even longer documents (tried that with some internal data and it's super cool!). If you want to think together about how to order the abstractions, I'm always happy to help.

flexorRegev avatar Jan 29 '24 20:01 flexorRegev

Hey, thanks again!

A couple comments:

  • Could you move the main.py file you've created to the examples? Ideally as a notebook, so people can use it for visualisation!
  • Matplotlib and Seaborn aren't currently dependencies of the lib. Would you be able to add them as optional dependencies? e.g. they should only be installed if someone runs pip install ragatouille[visualisation] or pip install ragatouille[all]
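
One common way to handle optional dependencies on the code side is to import them lazily and fail with an install hint. The sketch below is an assumption about how that could look, not the library's actual code: `require_optional` is a hypothetical helper, and declaring the `visualisation` extra itself in the packaging metadata is not shown here.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency, or raise an error naming the extra to install.

    Hypothetical helper for illustration; `extra` is the name of the pip extra.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install it with: pip install ragatouille[{extra}]"
        ) from exc
```

Plotting code would then call something like `require_optional("matplotlib.pyplot", "visualisation")` inside the function that draws the heatmap, so users who never plot are not forced to install Matplotlib or Seaborn.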

Apart from that, LGTM as a community extension!

bclavie avatar Feb 07 '24 19:02 bclavie