LS
@yosinski could you help with it when you have time? Many thanks.
Could someone explain the idea behind handling different dtypes for the base and LoRA weights?
> Introduce FlashAttention-2

FlashAttention-2 is already integrated into PyTorch 2.2: https://pytorch.org/blog/pytorch2-2/
> This is definitely on our roadmap and will be tackled in the coming weeks. Here are the priorities right now:
>
> 1. re-write the scheduling code and cache...
+1, the same error happened for `Qwen1.5-14B-Chat-AWQ (1*4090)` and `Qwen1.5-70B-Chat-AWQ (4*4090)`, even after decreasing `--max-batch-prefill-tokens` to 512.
@JorTurFer Hello, is there any plan for this fix? Thanks.
Is there any progress here? I really need this feature :) Thanks.
> In this case:
>
> ```go
> r.GET("/test/:name/*trail", func(c *gin.Context) {})
> ```
>
> The `trail` parameter seems to always contain a slash as the first character,...
I have a similar requirement; just customizing the error message would be enough for me.
> by content and return code +1