ailia-models
Added Japanese LLaMA elyza
#1294
I uploaded the model: https://storage.googleapis.com/ailia-models/elyza-japanese-llama-2-7b/decoder_model.onnx
On macOS, inference does not finish in a realistic amount of time.
@YToleubay How much time do you need for inference? For both ONNX Runtime and ailia?
> @YToleubay How much time do you need for inference? For both ONNX Runtime and ailia?

I ran the following benchmark on an NVIDIA GeForce RTX 3090 with 32 GB of RAM. With ONNX Runtime I get the following output:
processing time 36854 ms
processing time 32836 ms
processing time 31787 ms
processing time 31776 ms
processing time 31774 ms
**Average ONNX Runtime time = 33005.4 ms**
With ailia I get the following numbers:
ailia processing time 1060661 ms
ailia processing time 1061135 ms
**Average ailia time = 1060898 ms**
It seems inference is roughly 32 times slower with ailia than with ONNX Runtime.
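For anyone wanting to reproduce the ONNX Runtime side of this comparison, here is a minimal timing sketch. It assumes the exported decoder accepts `input_ids` and `attention_mask`; the actual elyza decoder export likely also requires `position_ids` and past key/value tensors, and the dummy inputs here stand in for real tokenizer output.

```python
# Minimal benchmark sketch for decoder_model.onnx (assumptions: the model
# takes input_ids/attention_mask; real usage needs a tokenizer and possibly
# position_ids and past key/value inputs).
import time
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "decoder_model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Dummy prompt of 32 tokens; replace with real tokenizer output.
input_ids = np.ones((1, 32), dtype=np.int64)
attention_mask = np.ones((1, 32), dtype=np.int64)

for _ in range(5):
    start = time.perf_counter()
    session.run(None, {"input_ids": input_ids, "attention_mask": attention_mask})
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"processing time {elapsed_ms:.0f} ms")
```

The same loop body can be swapped for an ailia inference call to produce comparable per-run numbers.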
Thanks. I will investigate it.
Can I help you somehow?
Thank you. We will look into it with the ailia SDK team, since the issue concerns the core implementation of the ailia SDK.