Vincent Bosch
I agree that inference should be cancelled automatically as soon as the connection is lost. However, I am more curious as to why the connection drops at all. At first I thought it was...
I tested a bit more. Even when the model is already loaded, sending a large prompt still triggers a timeout. I think that the timeout should be removed when using...
@danny-avila Did you have a chance to look into this issue? Would be great if timeouts for custom endpoints can be changed and/or disabled completely. Thanks!
@MasterJH5574 Thanks for the quick response! I just updated to the latest nightly and retried. Small draft-mode does work now; however, the speed when running with the small draft is slower than...
Just tried it with the latest nightly 274 and the issue is still present.
Update: I retried with a smaller context size: 8192 instead of the model's full context size (32768). Now the model loads correctly and I can interact with it. The...
In addition, big-AGI reports the following error: "**[Service Issue] Openai**: fetch failed - SocketError: other side closed · {"name":"SocketError","code":"UND_ERR_SOCKET","socket":"
> I already made a PR to MLX-LM to support 1B. > > https://github.com/ml-explore/mlx-examples/pull/1336 You're very quick! Great work! Would that PR also work for text only use of larger...
I just tried the converted model in "--chat"-mode, but in response to a text-only query I get only `<pad>` as output
I have just converted the model from HF to GGUF and then quantized it to Q8 with the following extra options: `--leave-output-tensor --token-embedding-type f16`. The model seems to be responding quite well,...
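For reference, the conversion and quantization described above can be sketched with llama.cpp's standard tooling. This is a minimal sketch, assuming the usual `convert_hf_to_gguf.py` script and `llama-quantize` binary; the model directory and output file names are placeholders, not taken from the original comment:

```shell
# Convert the Hugging Face checkpoint to GGUF at f16 precision
# (./my-hf-model is a placeholder path)
python convert_hf_to_gguf.py ./my-hf-model \
    --outfile model-f16.gguf --outtype f16

# Quantize to Q8_0, keeping the output tensor unquantized
# (--leave-output-tensor) and the token-embedding tensor at f16
./llama-quantize --leave-output-tensor --token-embedding-type f16 \
    model-f16.gguf model-Q8_0.gguf Q8_0
```

Leaving the output and embedding tensors at higher precision is a common way to reduce quality loss from quantization at a small cost in file size.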