Problem with path in add_to_index

Open gsajko opened this issue 1 year ago • 5 comments

Using RAGatouille==0.0.6a2 in colab, trying .add_to_index

for i, batch in enumerate(batches, start=0):
    RAG.add_to_index(
        new_collection=batch,
        index_name="dharma_colb",
        split_documents=True,
    )

error message

WARNING: add_to_index support is currently experimental! add_to_index support will be more thorough in future versions

---------------------------------------------------------------------------

FileNotFoundError                         Traceback (most recent call last)

/usr/local/lib/python3.10/dist-packages/colbert/infra/config/base_config.py in load_from_index(cls, index_path)
     93             metadata_path = os.path.join(index_path, "metadata.json")
---> 94             loaded_config, _ = cls.from_path(metadata_path)
     95         except:

6 frames

FileNotFoundError: [Errno 2] No such file or directory: '.ragatouille/colbert/indexes/colbert/indexes/dharma_colb/metadata.json'


During handling of the above exception, another exception occurred:

FileNotFoundError                         Traceback (most recent call last)

/usr/local/lib/python3.10/dist-packages/colbert/infra/config/base_config.py in from_path(cls, name)
     42     @classmethod
     43     def from_path(cls, name):
---> 44         with open(name) as f:
     45             args = ujson.load(f)
     46 

FileNotFoundError: [Errno 2] No such file or directory: '.ragatouille/colbert/indexes/colbert/indexes/dharma_colb/plan.json'

Note that colbert/indexes appears twice in the path.

gsajko avatar Jan 28 '24 17:01 gsajko
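
One way a path like the one in the traceback can arise is when a directory prefix is joined onto an index name that already contains it. The sketch below only illustrates that failure mode. `ROOT`, `PREFIX`, and `build_index_path` are hypothetical names, not taken from the RAGatouille or ColBERT source, so the actual cause may differ.

```python
from pathlib import PurePosixPath

# Hypothetical names for illustration; not taken from RAGatouille or ColBERT.
ROOT = PurePosixPath(".ragatouille")
PREFIX = PurePosixPath("colbert/indexes")


def build_index_path(index_name) -> PurePosixPath:
    """Prepend the root and prefix without checking whether they are already present."""
    return ROOT / PREFIX / index_name


# A bare index name resolves to the expected location.
expected = build_index_path("dharma_colb")

# An index name that already carries the prefix gets it a second time.
doubled = build_index_path(PREFIX / "dharma_colb")

print(expected)
print(doubled)
```

The second value has the same shape as the path in the FileNotFoundError above, which is consistent with the prefix being applied twice somewhere along the way.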

Hey! Good shout, thank you for flagging. The CRUD aspects are very much experimental at the moment but we're working on bringing them up to speed, appreciate the feedback and will ping you when this is fixed!

(cc @anirudhdharmarajan: further CRUD issues, vaguely related to but distinct from https://github.com/bclavie/RAGatouille/issues/71)

bclavie avatar Jan 28 '24 18:01 bclavie

This isn't a pressing issue for me (I was using batching and add_to_index while Colab was acting weird). I just wanted to report a bug, and the message clearly states that this is experimental. Cheers

gsajko avatar Jan 28 '24 18:01 gsajko

Thanks! Much appreciated 😊

bclavie avatar Jan 28 '24 18:01 bclavie

@gsajko I had the same issue. It works if you pass the full path of the index (returned once the index is created) as the index_name arg. In your case that would be .ragatouille/colbert/indexes/dharma_colb

0-hero avatar Jan 30 '24 21:01 0-hero
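
The workaround in the comment above can be sketched as a small helper. This is a minimal sketch, assuming `rag` exposes `add_to_index` with the keyword arguments used in the original report. `add_batches` is a hypothetical helper name, and the index path is the one quoted in the comment.

```python
from typing import Iterable, Sequence


def add_batches(rag, batches: Iterable[Sequence[str]], index_path: str) -> int:
    """Add each batch to an existing index, passing the full index path as index_name.

    `rag` is assumed to expose add_to_index(new_collection=..., index_name=...,
    split_documents=...), as in the snippet from the original report.
    Returns the number of batches submitted.
    """
    count = 0
    for batch in batches:
        rag.add_to_index(
            new_collection=batch,
            index_name=index_path,  # full path instead of the bare index name
            split_documents=True,
        )
        count += 1
    return count


# Example call (requires a loaded RAGPretrainedModel named RAG and a list of batches):
# add_batches(RAG, batches, ".ragatouille/colbert/indexes/dharma_colb")
```

Taking the model as a parameter keeps the helper independent of how the model was created, so the same loop can be reused once the path handling is fixed by switching back to the bare index name.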

FYI, this week got extremely busy but I'll have a fix ready by Monday!

adharm avatar Feb 02 '24 00:02 adharm

Hi, I also faced the same issue. I can get around it by setting RAG.model.loaded_from_index = True, with RAG being a RAGPretrainedModel.

tristancabel avatar Feb 07 '24 18:02 tristancabel

> Hi, I also faced the same issue. I can get around it by setting RAG.model.loaded_from_index = True, with RAG being a RAGPretrainedModel.

This worked for me. It's a simple error in how the path is set when the model doesn't have the loaded_from_index property set. Alternatively, you can re-instantiate the RAG model from the index every time to make sure the property is True.

paulthemagno avatar Feb 23 '24 16:02 paulthemagno
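
The two suggestions above can be written down as a short sketch. It assumes the attribute `RAG.model.loaded_from_index` exists as described in the comments. `ensure_loaded_from_index` is a hypothetical helper name, not part of RAGatouille.

```python
def ensure_loaded_from_index(rag):
    """Set the flag that commenters in this thread reported as a workaround.

    `rag` is assumed to be a RAGPretrainedModel whose `model` attribute carries
    a boolean `loaded_from_index`, as described in the comments above.
    """
    if not getattr(rag.model, "loaded_from_index", False):
        rag.model.loaded_from_index = True
    return rag


# Example call before adding documents (requires a RAGPretrainedModel named RAG):
# ensure_loaded_from_index(RAG)
# RAG.add_to_index(new_collection=batch, index_name="dharma_colb", split_documents=True)
#
# Alternative from the thread: re-instantiate the model from the index, for example
# with RAGPretrainedModel.from_index(index_path), assuming that constructor is
# available in your installed version.
```

Setting the attribute by hand touches internal state, so it is best treated as a temporary measure until a fixed release is available.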

Thanks @anirudhdharmarajan for taking a stab at this!

bclavie avatar Mar 18 '24 20:03 bclavie