[Feature]: Expose ONNX Runtime SessionOptions (specifically `enable_cpu_mem_arena`) to address memory leaks
What feature would you like to request?
Description: I'm experiencing a memory leak in my RAG application that uses fastembed (v0.7.3, not fastembed-gpu). Through memory profiling and troubleshooting, I've traced the issue to ONNX Runtime's memory management.
Problem: The memory leak appears to be related to known issues in ONNX Runtime:
Proposed Solution:
Expose ONNX Runtime `SessionOptions` in fastembed, specifically `enable_cpu_mem_arena`. This parameter:
- Is available in onnxruntime v1.20.0+
- Defaults to `True`
- Is currently not exposed in fastembed
- Documentation is here.
Testing:
I created a patch that injects `enable_cpu_mem_arena = False` into `fastembed.common.onnx_model.OnnxModel._load_onnx_model`, which successfully eliminates/constrains the memory leak in my application.
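For illustration, here is a minimal sketch of one way such an injection could be done. It is not the exact patch. It assumes fastembed builds its options by calling `ort.SessionOptions()` after `import onnxruntime as ort`, which I have not verified for every version. The names `with_arena_disabled` and `install_patch` are hypothetical.

```python
from typing import Any, Callable


def with_arena_disabled(factory: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a SessionOptions factory so every instance has the CPU arena off."""

    def make(*args: Any, **kwargs: Any) -> Any:
        options = factory(*args, **kwargs)
        # enable_cpu_mem_arena is an existing onnxruntime.SessionOptions attribute.
        options.enable_cpu_mem_arena = False
        return options

    return make


def install_patch() -> None:
    """Hypothetical helper: replace onnxruntime.SessionOptions process-wide.

    Assumption: callers create options via `ort.SessionOptions()`. Code that
    uses SessionOptions as a type (e.g. isinstance checks) would break.
    """
    import onnxruntime as ort  # imported lazily; only needed when patching

    ort.SessionOptions = with_arena_disabled(ort.SessionOptions)
```

`install_patch()` would have to run before any fastembed model is loaded. Because it changes behavior for every ONNX Runtime user in the process, it is only a stopgap, which is part of why a supported option in fastembed would be preferable.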
Request: I'd like to contribute a PR that exposes these (and potentially other useful) ONNX Runtime session options in fastembed. This would allow users to mitigate ONNX Runtime issues at the fastembed level while root causes are addressed upstream. I'm happy to discuss the best approach for implementing this without bloating the API.
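As a starting point for that discussion, one possible shape is a single optional mapping of option names to values that fastembed copies onto the `SessionOptions` object it already creates. The sketch below is only an assumption about how this might look. `apply_session_options` is a hypothetical helper, not an existing fastembed or ONNX Runtime function.

```python
from typing import Any, Mapping


def apply_session_options(options: Any, overrides: Mapping[str, Any]) -> Any:
    """Copy user-supplied overrides onto a SessionOptions-like object.

    Unknown names raise instead of being silently set, so typos surface early.
    """
    for name, value in overrides.items():
        if not hasattr(options, name):
            raise AttributeError(f"unknown session option: {name!r}")
        setattr(options, name, value)
    return options
```

With this approach the public API would gain one parameter rather than one per ONNX Runtime setting, for example a hypothetical `session_options={"enable_cpu_mem_arena": False}` passed at model construction.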
Is there any additional information you would like to provide?
No response
#578