Quentin Lhoest comments

Results: 416 comments of Quentin Lhoest

Shard parquet in `download_and_prepare`

This is ready for review cc @mariosasko :) please let me know what you think !

[WIP] Docs for creating a loading script for image datasets

I love it thanks ! I think moving forward we can use CSV instead of JSON Lines in the docs ;)

Support skipping Parquet to Arrow conversion when using Beam

When #4724 is merged, we can just pass `file_format="parquet"` to `download_and_prepare` and it will output Parquet files without converting to Arrow
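A minimal sketch of the call described above, assuming a `datasets` release that already includes the change from #4724. The helper name `prepare_as_parquet`, the dataset name, and the output path in the usage comment are hypothetical and only serve as illustration.

```python
def prepare_as_parquet(builder, output_dir):
    """Ask a dataset builder to write Parquet files directly.

    `builder` is assumed to be a `datasets.DatasetBuilder`, e.g. the object
    returned by `datasets.load_dataset_builder(...)`.
    """
    # file_format="parquet" is the argument mentioned in the comment above.
    builder.download_and_prepare(output_dir, file_format="parquet")
    return output_dir


# Hypothetical usage (needs the `datasets` package and network access):
# from datasets import load_dataset_builder
# builder = load_dataset_builder("squad")
# prepare_as_parquet(builder, "./squad_parquet")
```

The helper only forwards the argument, so whether the Arrow conversion is really skipped depends on the installed library version.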

parallel searching in multi-gpu setting using faiss

Hi ! Yes `search_batch` uses FAISS search which happens in parallel across the GPUs
> And I don't see any speed up when increasing the number of GPUs while calling...

parallel searching in multi-gpu setting using faiss

The code looks all good to me, do you see all the GPUs being utilized ? What version of faiss are you using ?

parallel searching in multi-gpu setting using faiss

It looks all good to me then ^^ though you said you didn't experience speed improvements by adding more GPUs ? What size is your source dataset and what time...

parallel searching in multi-gpu setting using faiss

Hmmm the number of GPUs should divide the time, something is going wrong. Can you check that adding more GPUs does divide the memory used per GPU ? Maybe it...

parallel searching in multi-gpu setting using faiss

> I used to think that every GPU loads all the source vectors and the data parallelism is at the query level. 😆 Oh indeed that's possible, I wasn't sure....

parallel searching in multi-gpu setting using faiss

Maybe @albertvillanova you can take a look ? I won't be available in the following days

parallel searching in multi-gpu setting using faiss

I can confirm `add_faiss_index` calls `index = faiss.index_cpu_to_gpus_list(index, gpus=list(device))`. Could this be an issue with your environment ? Could you try running with 1 and 8 GPUs with a code...
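One way such a 1-GPU versus 8-GPU comparison could be timed is sketched below. This is not the snippet from the truncated comment: the helper `time_search` is hypothetical, and the commented usage assumes a `datasets.Dataset` named `ds` with an `"embeddings"` column, a `queries` array, and a `datasets` version whose `add_faiss_index` accepts a list of GPU ids for `device`.

```python
import time


def time_search(search_fn, queries, repeats=3):
    """Return the best wall-clock time in seconds of `search_fn(queries)`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        search_fn(queries)
        # Keep the fastest run to reduce the effect of warm-up noise.
        best = min(best, time.perf_counter() - start)
    return best


# Hypothetical usage (needs `datasets`, a GPU build of faiss, and real data):
# for gpus in ([0], list(range(8))):
#     ds.add_faiss_index(column="embeddings", device=gpus)
#     t = time_search(lambda q: ds.search_batch("embeddings", q, k=10), queries)
#     print(len(gpus), "GPU(s):", round(t, 3), "s")
#     ds.drop_index("embeddings")
```

Comparing the two timings alongside per-GPU memory use (for example with `nvidia-smi`) would help tell whether the index is actually being spread across the devices.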