Juarez Bochi
PS: You can inspect the tokenizer to see what the available language codes are:

```python
for i in range(500):
    print(i, tokenizer.decode(i))
```

> 4
> 5
> 6
> 7...
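If you only want the language codes rather than the whole dump, something like the sketch below might help. It assumes the language tags decode to strings shaped like `<2pt>` (as in the model card example); the helper `find_language_tokens`, the regex, and the `fake_decode` stand-in are all made up for illustration and are not part of any library.

```python
import re

# Assumption: language tags look like "<2pt>", "<2en>", etc.
LANG_TAG = re.compile(r"^<2[^<>]+>$")


def find_language_tokens(decode, limit=500):
    """Return {token_id: tag} for ids below `limit` whose decoded text looks like a language tag.

    `decode` is any callable mapping a token id to a string,
    for example `tokenizer.decode`.
    """
    found = {}
    for i in range(limit):
        text = decode(i).strip()
        if LANG_TAG.match(text):
            found[i] = text
    return found


# Hypothetical stand-in for a real tokenizer, so the sketch runs on its own.
fake_vocab = {0: "<unk>", 1: "<s>", 2: "</s>", 3: "<2pt>", 4: "<2en>"}


def fake_decode(token_id):
    return fake_vocab.get(token_id, "tok%d" % token_id)


print(find_language_tokens(fake_decode, limit=10))
```

With the real tokenizer you would call `find_language_tokens(tokenizer.decode)` instead. If the tags turn out to have a different shape, adjust `LANG_TAG` accordingly.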
@waan1 The gguf files are not part of the official release by Google. They only work with [candle](https://github.com/huggingface/candle). You can find the instructions [here](https://huggingface.co/google/madlad400-3b-mt#running-the-model-with-candle). There's also this [thread](https://huggingface.co/jbochi/madlad400-3b-mt/discussions/4#654d0fc6e8c5f79f5d0d9f6e).
Unfortunately there is no way to specify a load balancing policy (issue #21). I think this is the main drawback of this driver. Regarding async queries, the driver uses the...
Running queries inside coroutines would not help, because the driver reads the query results synchronously. Nginx itself is not blocked, thanks to the cosocket API. The easiest solution is to fix...