Quentin Lhoest

416 comments by Quentin Lhoest

This is ready for review, cc @mariosasko :) Please let me know what you think!

I love it, thanks! I think that moving forward we can use CSV instead of JSON Lines in the docs ;)

Once #4724 is merged, we can just pass `file_format="parquet"` to `download_and_prepare`, and it will output Parquet files without converting to Arrow.

Hi! Yes, `search_batch` uses FAISS search, which happens in parallel across the GPUs.

> And I don't see any speed up when increasing the number of GPUs while calling...

The code looks good to me. Do you see all the GPUs being utilized? What version of faiss are you using?

It all looks good to me then ^^ Though you said you didn't see any speed improvement when adding more GPUs? What size is your source dataset, and what time...

Hmmm, the number of GPUs should divide the time, so something is going wrong. Can you check that adding more GPUs does divide the memory used per GPU? Maybe it...
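To make the memory check concrete, here is some back-of-the-envelope arithmetic for a flat (exact) index of float32 vectors. It ignores faiss's own overhead. It contrasts two layouts: replicating the full index on every GPU, or sharding the vectors across GPUs. If per-GPU memory does not drop as GPUs are added, that would suggest the index is replicated rather than sharded. The vector count and dimension below are made-up example values.

```python
BYTES_PER_FLOAT32 = 4


def flat_index_bytes(num_vectors: int, dim: int) -> int:
    """Bytes for the raw float32 vectors of a flat (exact) index, ignoring overhead."""
    return num_vectors * dim * BYTES_PER_FLOAT32


def per_gpu_bytes(num_vectors: int, dim: int, num_gpus: int, shard: bool) -> int:
    """Vector storage per GPU.

    shard=False: every GPU keeps a full copy of the vectors (replication).
    shard=True: each GPU keeps at most ceil(num_vectors / num_gpus) vectors.
    """
    if shard:
        vectors_per_gpu = -(-num_vectors // num_gpus)  # integer ceiling division
        return flat_index_bytes(vectors_per_gpu, dim)
    return flat_index_bytes(num_vectors, dim)


# Made-up example sizes: 1M vectors of dimension 768.
n, d = 1_000_000, 768
for gpus in (1, 8):
    replicated_gb = per_gpu_bytes(n, d, gpus, shard=False) / 1e9
    sharded_gb = per_gpu_bytes(n, d, gpus, shard=True) / 1e9
    print(f"{gpus} GPU(s): replicated {replicated_gb:.3f} GB/GPU, sharded {sharded_gb:.3f} GB/GPU")
```

With these numbers, replication keeps about 3.072 GB on each GPU regardless of the GPU count. Sharding across 8 GPUs brings it down to about 0.384 GB per GPU.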

> I used to think that every GPU loads all the source vectors and the data parallelism is at the query level. 😆

Oh indeed, that's possible, I wasn't sure....
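The two hypotheses correspond to two ways of splitting exact search across devices. In query-level parallelism, every device holds all the vectors and gets a slice of the queries. In database sharding, every device holds a slice of the vectors, searches every query, and the per-shard top-k lists are merged. As far as I know, faiss's multi-GPU cloning replicates by default and only shards when `GpuMultipleClonerOptions.shard` is set to `True`. The sketch below is a NumPy simulation in which "devices" are just array slices. It does not call faiss, and the function names are hypothetical.

```python
import numpy as np


def brute_force_search(db, queries, k):
    """Exact L2 search: (distances, ids) of the k nearest rows of db for each query."""
    # Squared L2 distances, shape (num_queries, num_db).
    dists = ((queries[:, None, :] - db[None, :, :]) ** 2).sum(axis=-1)
    ids = np.argsort(dists, axis=1, kind="stable")[:, :k]
    return np.take_along_axis(dists, ids, axis=1), ids


def replicated_search(db, queries, k, num_devices):
    """Query-level parallelism: each simulated device holds all of db and gets a slice of the queries."""
    parts = [brute_force_search(db, q, k) for q in np.array_split(queries, num_devices)]
    return np.concatenate([p[0] for p in parts]), np.concatenate([p[1] for p in parts])


def sharded_search(db, queries, k, num_devices):
    """Database sharding: each simulated device holds a slice of db and searches every query;
    the per-shard top-k lists are then merged into a global top-k."""
    shard_dists, shard_ids = [], []
    offset = 0
    for shard in np.array_split(db, num_devices):
        d, i = brute_force_search(shard, queries, k)
        shard_dists.append(d)
        shard_ids.append(i + offset)  # shard-local row ids -> global row ids
        offset += len(shard)
    dists = np.concatenate(shard_dists, axis=1)
    ids = np.concatenate(shard_ids, axis=1)
    order = np.argsort(dists, axis=1, kind="stable")[:, :k]
    return np.take_along_axis(dists, order, axis=1), np.take_along_axis(ids, order, axis=1)


rng = np.random.default_rng(0)
db = rng.standard_normal((1000, 16))
queries = rng.standard_normal((50, 16))

ref_d, ref_i = brute_force_search(db, queries, k=5)
rep_d, rep_i = replicated_search(db, queries, k=5, num_devices=4)
sh_d, sh_i = sharded_search(db, queries, k=5, num_devices=4)
print(np.array_equal(ref_i, rep_i), np.array_equal(ref_i, sh_i))
```

Both strategies return the same neighbors. What differs is the cost. With replication, per-device memory stays constant, and any speed-up depends on having enough queries per batch to keep every device busy. With sharding, memory per device shrinks, but every query touches every device and the results must be merged.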

@albertvillanova, maybe you can take a look? I won't be available for the next few days.

I can confirm that `add_faiss_index` calls `index = faiss.index_cpu_to_gpus_list(index, gpus=list(device))`. Could this be an issue with your environment? Could you try running with 1 and 8 GPUs with a code...
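When comparing 1 and 8 GPUs, it helps to time the search itself and to discard a warm-up call, so one-off initialization costs don't dominate. Below is a small, generic timing helper (`median_runtime` is a hypothetical name). The commented usage assumes a `datasets.Dataset` named `ds` with an `"embeddings"` column and a float32 query matrix `queries`; neither is defined here. It relies on `add_faiss_index` accepting a list of GPU ids as `device`, and on `search_batch` and `drop_index`.

```python
import statistics
import time
from typing import Callable


def median_runtime(fn: Callable[[], object], repeats: int = 5, warmup: int = 1) -> float:
    """Return the median wall-clock time in seconds of calling fn().

    The warm-up calls run first and are not timed, so one-off costs
    (e.g. lazy initialization on the first call) don't skew the result.
    """
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Hypothetical usage (ds and queries are assumed to exist):
#
# for gpus in ([0], list(range(8))):
#     ds.add_faiss_index(column="embeddings", device=gpus)
#     t = median_runtime(lambda: ds.search_batch("embeddings", queries, k=10))
#     print(len(gpus), "GPU(s):", t, "s")
#     ds.drop_index("embeddings")
```

If the timings are similar for 1 and 8 GPUs even with a large `queries` batch, comparing per-GPU memory (e.g. with `nvidia-smi`) should show whether the index was replicated or sharded.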