OpenNMT-py
Improve CTranslate2 wrapping in translation_server
https://forum.opennmt.net/t/ctranslate2-on-opennmt-py-server/4175/8
After reviewing the code, here's what could be improved:
- [x] Make the following translator parameters configurable:
  - `inter_threads`
  - `intra_threads`
  - `compute_type`
- [ ] Allow parallel translations as supported by CTranslate2: I tried to enable that but even though the `waitress` module is multi-threaded and accepts concurrent requests, it seems the requests are then processed sequentially
- [ ] Revise the unloading mechanism to not assume the model is running on the GPU
- [ ] Maybe clean up the initial dummy translation: the first translation has a higher latency on GPU but this was improved in recent versions (I think it's around 200 ms now)
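For the first item, here is a minimal sketch of how the three options could be read from a model's server config and forwarded to the translator. The `ct2_translator_args` key, the `build_ct2_args` and `load_translator` helpers, and the default values are assumptions made for illustration, not the server's actual code. `inter_threads`, `intra_threads`, and `compute_type` are constructor arguments of `ctranslate2.Translator`.

```python
from typing import Any, Dict

# Defaults chosen for this sketch only; the library's own defaults may differ.
CT2_DEFAULTS: Dict[str, Any] = {
    "inter_threads": 1,
    "intra_threads": 4,
    "compute_type": "default",
}


def build_ct2_args(model_conf: Dict[str, Any]) -> Dict[str, Any]:
    """Merge overrides from a (hypothetical) "ct2_translator_args" entry with defaults."""
    overrides = model_conf.get("ct2_translator_args", {})
    unknown = set(overrides) - set(CT2_DEFAULTS)
    if unknown:
        # Fail early on typos instead of silently ignoring an option.
        raise ValueError(f"Unsupported CTranslate2 options: {sorted(unknown)}")
    return {**CT2_DEFAULTS, **overrides}


def load_translator(model_path: str, device: str, model_conf: Dict[str, Any]):
    """Create a CTranslate2 translator using the merged options."""
    # Imported lazily so build_ct2_args can be used without CTranslate2 installed.
    import ctranslate2

    return ctranslate2.Translator(
        model_path, device=device, **build_ct2_args(model_conf)
    )
```

With this shape, a model entry such as `{"ct2_translator_args": {"inter_threads": 2, "compute_type": "int8"}}` would override two options and keep the sketch's default for `intra_threads`.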
> I tried to enable that but even though the waitress module is multi-threaded and accepts concurrent requests, it seems the requests are then processed sequentially
I did not realize that the translation method is inside a critical section. Note this is not needed for CTranslate2: the translation and model loading/unloading are fully thread safe. So removing the critical section for CTranslate2 can improve the scalability of the server for CPU translations with `inter_threads` > 1 and multi-GPU translations.
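One way to picture the change, as a rough sketch rather than the server's actual code: keep the lock for backends that are not thread safe and swap in a no-op context manager for CTranslate2. The `ModelWrapper` class, its `thread_safe` flag, and `translate_fn` are hypothetical names used only for this example.

```python
import threading
from contextlib import nullcontext
from typing import Callable, List


class ModelWrapper:
    """Hypothetical stand-in for a server-side model wrapper."""

    def __init__(self, translate_fn: Callable[[List[str]], List[str]], thread_safe: bool):
        self._translate_fn = translate_fn
        # Thread-safe backends (e.g. CTranslate2) skip the critical section;
        # other backends keep a lock so requests run one at a time.
        self._guard = nullcontext() if thread_safe else threading.Lock()

    def run(self, inputs: List[str]) -> List[str]:
        with self._guard:
            return self._translate_fn(inputs)
```

With `thread_safe=True`, concurrent requests accepted by the web server reach the backend in parallel, so the backend's own thread pool decides how many translations actually run at once.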
@francoishernandez @pltrdy do you recall why https://github.com/OpenNMT/OpenNMT-py/pull/1108 was introduced? Thread memory leaks?
I think that in the translation server, loading/unloading and even running a model was not thread safe. I don't know anything about CTranslate2 though, so I can't tell how they differ.
Good evening. I have an issue. When I run the command `ct2-opennmt-py-converter --model_path averaged-10-epoch.pt --output_dir ende_ctranslate2 --quantization int8`, I get this error: `ModuleNotFoundError: No module named 'onmt.inputters.text_dataset'`.