
Proposal and feedback requested: Wikipedia RAG GenAIExamples

Open endomorphosis opened this issue 6 months ago • 8 comments

I am mentoring some college students with LAION. One of the students is working on embeddings for Wikipedia. The work is not ready to be pushed to OPEA yet, but I want to collect feedback about an issue we discussed.

Do you guys prefer to have the entire article text in the vector DB, or only the article abstract? Also, I asked him to follow the example in the Hugging Face datasets documentation for using the Hugging Face FaissIndex and Elasticsearch index, but I want to confirm that this is the method that works best for you guys.
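To make the two options concrete, here is a rough sketch of what each would put into the index. It is not taken from the WIP repository: the function names, the record fields, the "abstract is the first paragraph" rule, and the 200-word chunk size are all assumptions made for illustration. The resulting records would still need to be embedded and indexed afterwards (for example with the `add_faiss_index` or `add_elasticsearch_index` methods of a Hugging Face `Dataset`).

```python
from typing import Dict, List


def abstract_records(title: str, text: str) -> List[Dict[str, object]]:
    """Option 1 (hypothetical helper): index only the abstract.

    Assumption: the abstract is the first paragraph of the article text.
    """
    abstract = text.strip().split("\n\n", 1)[0]
    return [{"title": title, "chunk_id": 0, "text": abstract}]


def full_text_records(title: str, text: str, max_words: int = 200) -> List[Dict[str, object]]:
    """Option 2 (hypothetical helper): index the whole article.

    The text is split into consecutive chunks of at most `max_words` words,
    so one article yields several records instead of one.
    """
    words = text.split()
    records = []
    for chunk_id, start in enumerate(range(0, len(words), max_words)):
        chunk = " ".join(words[start:start + max_words])
        records.append({"title": title, "chunk_id": chunk_id, "text": chunk})
    return records


# Toy article: a short first paragraph followed by a longer body.
article = "Short abstract paragraph.\n\nBody paragraph one. " + "word " * 450

print(len(abstract_records("Example", article)), "record for abstract-only")
print(len(full_text_records("Example", article)), "records for full text")
```

The trade-off this shows is index size versus coverage: abstract-only keeps one vector per article, while full text multiplies the number of vectors by the number of chunks but lets retrieval reach details outside the abstract.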

@sleepingcat4 is the college student. His WIP repository is at https://github.com/sleepingcat4/wikidataset and his WIP dataset is at https://huggingface.co/datasets/laion/Wikipedia_11_23_BGE-M3_Embeddings (but I have told him he needs to rework both of these, so be aware that this information is going to change).

endomorphosis, Aug 15 '24 17:08