
Parallelism in generating embeddings

Open Daniel-SicSo-Edinburgh opened this issue 1 year ago • 1 comment

Hi,

I was looking over the docs, and batch inference was mentioned. Looking at the code, though, it is not batch inference: the embeddings are generated sequentially, one prompt at a time.

I was really hoping for batch inference because I have a lot of samples to embed, and processing them in parallel would save a lot of time. Maybe this could be added as a feature in the future.

For now, I would suggest adding a disclaimer to the docs warning users that encoding is sequential and therefore slow for large numbers of prompts.
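To make the distinction concrete, here is a toy sketch (not compel's actual code; a random embedding table stands in for the real text encoder) showing that a single batched call produces the same embeddings as a per-prompt loop, while doing only one "forward pass" instead of one per prompt:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM, SEQ = 100, 16, 8

# Toy stand-in for a text encoder: a lookup table of token embeddings.
W = rng.standard_normal((VOCAB, DIM))

def encode_one(token_ids):
    # Sequential path: one prompt per call -> (SEQ, DIM)
    return W[token_ids]

def encode_batch(batch_token_ids):
    # Batched path: all prompts in a single call -> (B, SEQ, DIM)
    return W[batch_token_ids]

prompts = rng.integers(0, VOCAB, size=(4, SEQ))

# One "forward pass" per prompt, then stack the results.
sequential = np.stack([encode_one(p) for p in prompts])

# One "forward pass" for the whole batch.
batched = encode_batch(prompts)

# Same embeddings either way; only the number of calls differs.
assert np.allclose(sequential, batched)
```

With a real text encoder the batched path would also require padding the prompts to a common length before stacking, which is the part a sequential loop sidesteps.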

Daniel-SicSo-Edinburgh avatar Aug 17 '24 03:08 Daniel-SicSo-Edinburgh

Perhaps one of the library's lower-level API methods would help in your case? I think I've seen ways to get more control over the parsing/processing before it's fed into the text encoder. Worth a shot, maybe?

aristotaloss avatar Aug 17 '24 11:08 aristotaloss