Maximum Length for LTU-AS Audio Input

Open: dingdongwang opened this issue 1 year ago • 1 comment

Hi, may I ask what the maximum allowable length is for audio input? Would a 1-minute WAV file be within the acceptable range?

Thank you!

dingdongwang, Feb 19 '24 19:02

Hi there,

It really depends on your GPU, but in general, 1 minute would be fine.

Our code supports 10 seconds of audio (hard-coded) at 3.2 Hz, i.e. 32 audio tokens. Adding about 100-200 text tokens gives ~200 tokens in total.

For 1 minute, you would need 192 audio tokens (60 s × 3.2 Hz). With the 100-200 text tokens on top, that is ~400 tokens in total, roughly double our current cost. You would also need some engineering effort to change the hard-coded 10-second limit.
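
To make the arithmetic concrete, here is a rough sketch of the token count for a given clip length. It only assumes the 3.2 Hz audio token rate and the 100-200 text token range mentioned here; the names `AUDIO_TOKEN_RATE_HZ` and `token_budget` are made up for illustration and are not part of the LTU-AS code.

```python
# Back-of-the-envelope token estimate; not taken from the LTU-AS codebase.
AUDIO_TOKEN_RATE_HZ = 3.2  # audio tokens per second of input audio


def token_budget(audio_seconds, text_tokens):
    """Return (audio_tokens, total_tokens) for a clip of the given length."""
    audio_tokens = round(audio_seconds * AUDIO_TOKEN_RATE_HZ)
    return audio_tokens, audio_tokens + text_tokens


for seconds in (10, 60):
    audio, low_total = token_budget(seconds, 100)   # short prompt/answer
    _, high_total = token_budget(seconds, 200)      # long prompt/answer
    print(f"{seconds:>2} s: {audio} audio tokens, {low_total}-{high_total} total")
```

This only estimates sequence length. Actual memory use and run time depend on the GPU and on how the hard-coded input length is changed.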

-Yuan

YuanGongND, Apr 07 '24 21:04