LS

20 comments by LS

@yosinski could you help with it when you have time? Many thanks.

Could someone explain the idea behind handling different dtypes for the base and LoRA weights?

> Introduce flashattention2

FlashAttention-2 is already integrated into PyTorch 2.2: https://pytorch.org/blog/pytorch2-2/

> This is definitely on our roadmap and will be tackled in the coming weeks. Here are the priorities right now:
>
> 1. re-write the scheduling code and cache...

+1, the same error occurs with `Qwen1.5-14B-Chat-AWQ (1*4090)` and `Qwen1.5-70B-Chat-AWQ (4*4090)`, even after decreasing `--max-batch-prefill-tokens` to 512.

> In this case:
>
> ```go
> r.GET("/test/:name/*trail", func (c *gin.Context) {}
> ```
>
> The `trail` parameter seems to always contain a slash as the first character,...

I have a similar requirement; just customizing the error message would be enough for me.