Add a client for Gazelle
Created an abstraction over some of the code in the example notebooks (infer, infer-quantized) so that the user can simply:
from gazelle import GazelleClient
client = GazelleClient(quantization="8-bit")
resp = client.infer(audio, prompt="What does the following audio say? \n <|audio|>")
print(resp)
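For reviewers skimming the idea: one way a wrapper like this could turn the `quantization` string into model-loading options is sketched below. This is an illustration only, not the code in this PR. The helper name `resolve_load_kwargs` and the `"4-bit"` option are hypothetical; the key names mirror the `load_in_8bit` / `load_in_4bit` flags of `BitsAndBytesConfig` in transformers.

```python
from typing import Optional


def resolve_load_kwargs(quantization: Optional[str] = None) -> dict:
    """Map a user-facing quantization string to loading options.

    Hypothetical helper for illustration; the real client may differ.
    """
    # No quantization requested: load the model with default settings.
    if quantization is None:
        return {}

    # Assumed mapping; "4-bit" is an assumption, not confirmed by this PR.
    options = {
        "8-bit": {"load_in_8bit": True},
        "4-bit": {"load_in_4bit": True},
    }
    if quantization not in options:
        raise ValueError(
            f"Unsupported quantization {quantization!r}; "
            f"expected one of {sorted(options)} or None"
        )
    # Return a copy so callers cannot mutate the shared mapping.
    return dict(options[quantization])
```

Failing early on an unknown string keeps a typo such as `"8bit"` from silently falling back to the unquantized path.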
Because of limited resources, I have not yet been able to test it extensively. It seems to work reliably with quantization, but without quantization I occasionally get errors from the conv layer.
Would be happy to take feedback!