Gintas Z.
I think this is a real problem. @hnyls2002 have you tried testing generation with a batch size of 100 or 1000, and multi-step structured generation with a connection to a remote endpoint?...
@m0g1cian I solved this with the retry logic in https://github.com/sgl-project/sglang/pull/424
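To illustrate: the linked PR's exact code isn't reproduced here, but retry logic for a flaky remote endpoint generally looks like the sketch below. The function name, parameters, and defaults are illustrative, not taken from the PR.

```python
import time


def with_retries(fn, max_attempts=3, base_delay=0.1):
    """Call fn(), retrying on exception with exponential backoff.

    Illustrative sketch only; names and defaults are assumptions,
    not the actual code from sglang PR #424.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            # back off 0.1s, 0.2s, 0.4s, ... before retrying
            time.sleep(base_delay * (2 ** attempt))


# usage: wrap the request that sometimes fails transiently
calls = {"n": 0}

def flaky_request():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

result = with_retries(flaky_request)
```

The key design choice is exponential backoff: transient server overload usually clears faster than the retries escalate, so later attempts wait longer instead of hammering the endpoint.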
I'd request to include support for Phi-3-mini
I observe this with `meta-llama/Meta-Llama-3-8B-Instruct`. I think it's a very critical issue.
> Also experiencing this with `meta-llama/Meta-Llama-3-8B-Instruct`, this makes the library more or less unusable for me. Which is a shame because I love sglang. I've reverted back to 0.1.14 and...
@m0g1cian I've observed that with some regex formulations, current versions get stuck in infinite generation long beyond the max token length; see my other issue: https://github.com/sgl-project/sglang/issues/414
> @Gintasz As of now, I just set the `max_tokens` parameter as a safeguard in every `sgl.gen()` to avoid such infinite generation issues.

Yes, but with some regex formulations, this `max_tokens`...
@KMouratidis can you write out the full list of commands you used to pull the weights of the Mixtral model and pass them to sglang?
@KMouratidis yeah, and what was your `$MODEL` value? Because when I tried it like the command below, I got the model path name error from the original post: ``` python3 -m sglang.launch_server --model-path...
I'd totally want this as a raycast plugin.