fastembed ONNXRuntime taking up too much memory

Open AlterHoodie opened this issue 9 months ago • 6 comments

ONNXRuntime takes up too much memory when embedding large collections of data. Memory seems to accumulate, which makes me think unused memory is not being freed. Am I missing something, or is this a problem with the runtime itself? I am trying to embed about 10,000 documents (average size: 3,000 characters) using the JinaAI ColBERT model (a late interaction model). GPU: Tesla T4, 16 GB VRAM.

Feb 20 '25 11:02 AlterHoodie

What batch size are you using? Are you keeping the embeddings in memory, or are you uploading them somewhere else or writing them to disk? ColBERT embeddings are quite large, since ColBERT produces a 128-dim embedding per token.
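To put a rough number on the "128-dim embedding per token" point, here is a back-of-envelope sketch. The 4-characters-per-token ratio and float32 storage are assumptions, not values taken from the model or from fastembed, and any truncation to a maximum sequence length is ignored.

```python
# Rough size estimate for per-token (late interaction) embeddings.
# Every constant below is an assumption made for illustration.
DIM = 128             # per-token embedding size mentioned in the thread
BYTES_PER_FLOAT = 4   # assumes float32 output
CHARS_PER_TOKEN = 4   # rough guess; the real ratio depends on the tokenizer
NUM_DOCS = 10_000     # collection size from the original post
AVG_CHARS = 3_000     # average document length from the original post


def colbert_bytes(num_docs, avg_chars, chars_per_token=CHARS_PER_TOKEN,
                  dim=DIM, bytes_per_float=BYTES_PER_FLOAT):
    """Estimate total bytes when every token gets its own dim-sized vector."""
    tokens_per_doc = avg_chars // chars_per_token
    return num_docs * tokens_per_doc * dim * bytes_per_float


total = colbert_bytes(NUM_DOCS, AVG_CHARS)
single_vector_total = NUM_DOCS * DIM * BYTES_PER_FLOAT  # one vector per doc

print(f"per-token embeddings: {total / 1e9:.2f} GB")
print(f"one vector per doc:   {single_vector_total / 1e6:.2f} MB")
```

Under these assumptions the full collection comes to a few gigabytes of raw floats, which is why it matters whether the embeddings stay in memory or go to disk.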

Feb 20 '25 12:02 joein

I'm dumping them into pickle files every 1000 docs; the batch size is just 1.
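For readers following along, a minimal sketch of that kind of loop: consume embeddings lazily and write a pickle shard every 1000 documents, so the Python side never holds more than one shard. `fake_embed`, `dump_in_shards`, and the shard file names are hypothetical stand-ins, not fastembed APIs and not the poster's actual code.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Callable, Iterable, Iterator, List

Matrix = List[List[float]]  # one row per token


def fake_embed(docs: Iterable[str]) -> Iterator[Matrix]:
    """Hypothetical stand-in for a late interaction model.

    Lazily yields one (tokens x 128) matrix per document.
    """
    for doc in docs:
        yield [[0.0] * 128 for _ in doc.split()]


def _flush(buffer: List[Matrix], out_dir: Path, index: int) -> Path:
    """Write one shard to disk and return its path."""
    path = out_dir / f"shard_{index:05d}.pkl"
    with path.open("wb") as fh:
        pickle.dump(buffer, fh)
    return path


def dump_in_shards(docs: Iterable[str], out_dir,
                   embed: Callable[[Iterable[str]], Iterator[Matrix]] = fake_embed,
                   shard_size: int = 1000) -> List[Path]:
    """Embed lazily and write a pickle file every `shard_size` documents."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    buffer: List[Matrix] = []
    for matrix in embed(docs):
        buffer.append(matrix)
        if len(buffer) == shard_size:
            paths.append(_flush(buffer, out_dir, len(paths)))
            buffer = []  # drop the reference so the old shard can be freed
    if buffer:  # write the final, partially filled shard
        paths.append(_flush(buffer, out_dir, len(paths)))
    return paths


with tempfile.TemporaryDirectory() as tmp:
    shard_paths = dump_in_shards((f"doc {i} text" for i in range(2500)), tmp)
    shard_sizes = [len(pickle.loads(p.read_bytes())) for p in shard_paths]

print(shard_sizes)
```

A loop like this only bounds what Python itself keeps alive. It would not help if the growth comes from inside the runtime, which is what this thread suspects.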

Feb 20 '25 12:02 AlterHoodie

For now, I was able to reproduce the issue, and it does indeed seem to be a problem with onnxruntime not freeing memory. However, we might need more time to investigate it. Thank you for pointing it out.

Feb 20 '25 13:02 joein

Hey, any updates on this?

Mar 11 '25 07:03 AlterHoodie

Yeah, we're working on a fix https://github.com/qdrant/fastembed/pull/493

Mar 11 '25 08:03 joein

I'm also facing the same issue. Is there any update on this feature?

Aug 01 '25 11:08 pankajpriya42
