ailia-models
Added Japanese LLaMA elyza
#1294
I uploaded the model: https://storage.googleapis.com/ailia-models/elyza-japanese-llama-2-7b/decoder_model.onnx
On macOS, inference does not finish in a realistic amount of time.
@YToleubay How much time do you need for inference? For both ONNX Runtime and ailia?
> @YToleubay How much time do you need for inference? For both ONNX Runtime and ailia?

I ran the following benchmark on an NVIDIA GeForce RTX 3090 with 32 GB of RAM. With ONNX Runtime I get the following output:
processing time 36854 ms
processing time 32836 ms
processing time 31787 ms
processing time 31776 ms
processing time 31774 ms
**Average ONNX Runtime time = 33005.4 ms**
With ailia I get the following numbers:
ailia processing time 1060661 ms
ailia processing time 1061135 ms
**Average ailia time = 1060898 ms**
It seems inference is roughly 32 times slower with ailia than with ONNX Runtime.
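For anyone wanting to reproduce the ONNX Runtime side of this comparison, here is a minimal timing sketch. It assumes the exported decoder accepts `input_ids` and `attention_mask`; the actual elyza decoder export likely also requires `position_ids` and past key/value tensors, and the dummy inputs here stand in for real tokenizer output.

```python
# Minimal benchmark sketch for decoder_model.onnx (assumptions: the model
# takes input_ids/attention_mask; real usage needs a tokenizer and possibly
# position_ids and past key/value inputs).
import time
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "decoder_model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Dummy prompt of 32 tokens; replace with real tokenizer output.
input_ids = np.ones((1, 32), dtype=np.int64)
attention_mask = np.ones((1, 32), dtype=np.int64)

for _ in range(5):
    start = time.perf_counter()
    session.run(None, {"input_ids": input_ids, "attention_mask": attention_mask})
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"processing time {elapsed_ms:.0f} ms")
```

The same loop body can be swapped for an ailia inference call to produce comparable per-run numbers.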
Thanks. I will investigate it.
Can I help you somehow?
Thank you. We will look into it with the ailia SDK team, since the issue concerns the core implementation of the ailia SDK.