Token counting for Audio input
Description of the feature request:
It would be great to have a close estimate of how many tokens a minute of audio costs. The guide says, "Audio and video are each converted to tokens at a fixed rate of tokens per minute." Taking the audio example from the guide as a reference point, ~44 minutes of audio came to 83,552 tokens, which works out to ~1,899 tokens per minute of audio (83552/44). Is my understanding correct? Also, would the number of tokens change based on the audio input type (e.g. wav vs. mp3)?
What problem are you trying to solve with this feature?
Estimating the count of tokens for audio input.
Any other information you'd like to share?
No response
@aalhayali Token count depends on the length of the audio, not on the size or type of the audio input. I investigated how audio format and file size affect the number of tokens generated from audio, using clips of the same length (3.07 minutes) in various formats (mp3, wav, flac, aac, and m4a). The token count did not vary with the format or file size of the audio; it depended solely on the audio's duration. In other words, clips of the same length produced the same number of tokens, regardless of format or file size. Please find the gist
Hi @aalhayali, the current version of Audio.ipynb includes an example of how to use model.count_tokens() on an audio file uploaded through the File API.
Could you check it? I think it is exactly what you are looking for.
cheers, Luciano Martins.
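For readers who do not have the notebook open, the call looks roughly like the sketch below. It assumes the google-generativeai Python SDK; the file name sample.mp3, the GOOGLE_API_KEY environment variable, and the model name are placeholders chosen for illustration, and the notebook itself may differ in its details.

```python
import os


def count_audio_tokens(model, audio_file) -> int:
    """Return the total token count the model reports for an uploaded audio file.

    `model` is expected to expose count_tokens(), as GenerativeModel does in
    the google-generativeai SDK; `audio_file` is a File API handle.
    """
    response = model.count_tokens([audio_file])
    return response.total_tokens


def main() -> None:
    # Imported here so the helper above can be used without the SDK installed.
    import google.generativeai as genai

    # Placeholder: the API key is assumed to be in this environment variable.
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

    # Placeholder path: upload a local audio file through the File API.
    audio_file = genai.upload_file(path="sample.mp3")

    # Placeholder model name: any audio-capable model should work here.
    model = genai.GenerativeModel("models/gemini-1.5-flash")

    print(f"Total tokens: {count_audio_tokens(model, audio_file)}")


# To run against the real API, call main() with a valid key and audio file.
```

Running this on same-length clips in different formats is one way to reproduce the comparison described earlier in the thread.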
I think this is answered now. Feel free to open another issue if you think there's more we can do.