rust-bert
Higher GPU memory usage vs Python (hkunlp/instructor-large)
Hello, I wrote a project that uses rust-bert. However, I noticed that loading the same model in Python uses about half the GPU memory of my Rust implementation, even though I only load it once. Any idea how to fix this? Any help would be appreciated. Thanks!
Extract from nvidia-smi (Python)
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 726 G /usr/lib/Xorg 518MiB |
| 0 N/A N/A 1796 G ...,WinRetrieveSuggestionsOnlyOnDemand 75MiB |
| 0 N/A N/A 2333 G ...re/Steam/ubuntu12_64/steamwebhelper 7MiB |
| 0 N/A N/A 3586 G /usr/lib/firefox/firefox 194MiB |
| 0 N/A N/A 6787 G ...sion,SpareRendererForSitePerProcess 352MiB |
| 0 N/A N/A 15763 G ...--disable-features=BackForwardCache 93MiB |
| 0 N/A N/A 79356 G alacritty 9MiB |
| 0 N/A N/A 86867 C ...envs/langchain-templates/bin/python 1448MiB |
+---------------------------------------------------------------------------------------+
Extract from nvidia-smi (Rust)
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 726 G /usr/lib/Xorg 518MiB |
| 0 N/A N/A 1796 G ...,WinRetrieveSuggestionsOnlyOnDemand 75MiB |
| 0 N/A N/A 2333 G ...re/Steam/ubuntu12_64/steamwebhelper 7MiB |
| 0 N/A N/A 3586 G /usr/lib/firefox/firefox 194MiB |
| 0 N/A N/A 6787 G ...sion,SpareRendererForSitePerProcess 328MiB |
| 0 N/A N/A 15763 G ...--disable-features=BackForwardCache 93MiB |
| 0 N/A N/A 79356 G alacritty 9MiB |
| 0 N/A N/A 86607 C target/release/axum-t5-embeddings 2830MiB |
+---------------------------------------------------------------------------------------+
Hello,
It is difficult without seeing the Python code side by side for comparison, but could it be that the Python model is loaded in half precision (fp16)?
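If half precision is the cause, the numbers are easy to sanity-check: each parameter takes 4 bytes in fp32 and 2 bytes in fp16, so loading in half precision cuts weight memory in half. A back-of-envelope sketch (the ~335M parameter count for hkunlp/instructor-large is an assumption, and this ignores the CUDA context, activations, and allocator overhead):

```python
# Back-of-envelope GPU-memory estimate for model weights alone.
# NOTE: the 335M parameter count is an assumption about instructor-large;
# nvidia-smi figures also include CUDA context and runtime buffers.
PARAMS = 335_000_000

def weight_mib(params: int, bytes_per_param: int) -> float:
    """Memory taken by the weights, in MiB."""
    return params * bytes_per_param / 2**20

fp32 = weight_mib(PARAMS, 4)  # ~1278 MiB
fp16 = weight_mib(PARAMS, 2)  # ~639 MiB
print(f"fp32: {fp32:.0f} MiB, fp16: {fp16:.0f} MiB")
```

A clean factor-of-two gap in weight memory is what fp32 vs fp16 would produce; the absolute values reported by nvidia-smi will be higher than the weight estimate on both sides.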
Hello,
Apologies for not being clear: this is the reference implementation I use. It calls this from the Instructor library provided by the authors of Instructor.
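One quick way to test the half-precision theory on the Python side is to print the dtype of the loaded model's parameters: if it reports torch.float16, the memory gap is explained. A minimal sketch (the INSTRUCTOR import in the comment is an assumption about the reference implementation; a stand-in nn.Linear demonstrates the same check):

```python
import torch

# With the real model you would do something like (assumption, not verified here):
#   from InstructorEmbedding import INSTRUCTOR
#   model = INSTRUCTOR('hkunlp/instructor-large')
# A small stand-in module demonstrates the same dtype inspection.
model = torch.nn.Linear(8, 8)
print(next(model.parameters()).dtype)  # torch.float32 by default

model = model.half()
print(next(model.parameters()).dtype)  # torch.float16 after .half()
```

Running the same check against the actual reference implementation would show whether the Python model ends up in fp16, either explicitly or through a default in the Instructor library.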