dongqi shen

Results 9 comments of dongqi shen

OK, thanks. This plugin feels great; the only flaw is that it doesn't support input. Thanks

> So does the attention head number get included? Yes, it does. Actually, for each head, the attention layer projects the input (which is [768]) to a smaller size (which is...
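
To make the per-head projection concrete, here is a minimal PyTorch sketch, assuming a 768-dim hidden size and 12 heads (as in BERT-base / GPT-2 small); the exact sizes in the original discussion may differ.

```python
import torch
import torch.nn as nn

# Assumed sizes: 768-dim hidden state, 12 heads, so each head works in 768 / 12 = 64 dims.
hidden_size, num_heads = 768, 12
head_dim = hidden_size // num_heads  # 64

qkv = nn.Linear(hidden_size, 3 * hidden_size)  # one fused projection for Q, K, V
x = torch.randn(1, 10, hidden_size)            # (batch, seq_len, hidden)

q, k, v = qkv(x).chunk(3, dim=-1)
# Reshape so each of the 12 heads sees its own 64-dim slice of the projection.
q = q.view(1, 10, num_heads, head_dim).transpose(1, 2)  # (batch, heads, seq, head_dim)
print(q.shape)  # torch.Size([1, 12, 10, 64])
```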

I have a naive question. If I want to implement a task from this issue or another open issue, how do I know whether somebody is already doing the same work as...

When the kernel receives a PyTorch tensor as an argument, the function `get_torch_callbacks(v, ...)` checks it with `v.is_contiguous()`. However, the function `.from_torch()` simply calls `.contiguous()`, as you described in #4258. I...
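
To illustrate the difference between the two behaviors with plain PyTorch (not the kernel code itself): `is_contiguous()` only reports the layout, while `.contiguous()` silently makes a copy when the tensor is non-contiguous.

```python
import torch

# A transposed view is non-contiguous: the strict check fails, the permissive path copies.
v = torch.randn(4, 8).t()            # transpose -> non-contiguous view of the same storage
print(v.is_contiguous())             # False: a check like the one in get_torch_callbacks would flag this
w = v.contiguous()                   # .contiguous() allocates a new, contiguous copy instead
print(w.is_contiguous())             # True
print(w.data_ptr() == v.data_ptr())  # False: the data was copied to new storage
```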

I think one possible reason is the tokenizer. The vocab size of LLaMA is 32000, but that of ChatGLM-6B is about 150000.
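
The sizes can be checked directly with `transformers`; the model ids below are assumptions, and ChatGLM-6B needs `trust_remote_code=True`.

```python
from transformers import AutoTokenizer

# Assumed Hugging Face model ids; the point is only the relative vocabulary sizes.
llama_tok = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
glm_tok = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)

print(llama_tok.vocab_size)  # expected: 32000
print(glm_tok.vocab_size)    # expected: well over 100k (the comparison above cites ~150000)
```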

@jinfagang The member's explanation is [here](https://github.com/THUDM/ChatGLM-6B/issues/127#issuecomment-1473366712). I don't know much about the tokenizer, sorry about that. However, from my tests, I think ChatGLM-6B behaves much better than LLaMA-7B in Chinese.

I have tested it with Qwen-1.8B on an RTX 2080, and the inference speedup is about 2x compared to the original (~50 tok/s vs ~100 tok/s), which is fascinating....
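
For reference, a rough sketch of how such a tok/s number can be measured with `transformers` (the model id and prompt are assumptions; the 2x speedup itself comes from the optimized implementation, not from this baseline loop):

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed model id for the baseline measurement.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen-1_8B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-1_8B", torch_dtype=torch.float16, trust_remote_code=True
).cuda()

inputs = tok("Hello, my name is", return_tensors="pt").to("cuda")
torch.cuda.synchronize()
start = time.time()
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
torch.cuda.synchronize()
elapsed = time.time() - start

new_tokens = out.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens / elapsed:.1f} tok/s")  # ~50 tok/s baseline on an RTX 2080 per the comment above
```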

@dashi6174 https://github.com/DongqiShen/qwen-fast