dongqi shen
OK, thanks. I think this plugin is great; the only shortcoming is that it doesn't support input. Thanks
> So does the attention head number get included?

Yes, it does. Actually, for each head, the attention layer projects the input (which is [768]) to a smaller size (which is...
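To make the per-head projection concrete, here is a minimal PyTorch sketch; the 768 hidden size and 12 heads are example values (e.g. GPT-2 small), not taken from the thread:

```python
import torch
import torch.nn as nn

hidden_size = 768                      # example model width
num_heads = 12                         # example head count
head_dim = hidden_size // num_heads    # 64: the "small size" each head works in

# One linear projection, then reshaped so every head gets its own 64-dim slice.
q_proj = nn.Linear(hidden_size, hidden_size)

x = torch.randn(1, 10, hidden_size)        # (batch, seq_len, hidden)
q = q_proj(x)                              # (1, 10, 768)
q = q.view(1, 10, num_heads, head_dim)     # (1, 10, 12, 64)
q = q.transpose(1, 2)                      # (1, 12, 10, 64): per-head queries
print(q.shape)                             # torch.Size([1, 12, 10, 64])
```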
I've got a naive question. If I want to implement a task from this issue or another open issue, how do I know whether somebody is already doing the same work as...
When the kernel receives a PyTorch tensor as an argument, the function `get_torch_callbacks(v, ...)` checks it with `v.is_contiguous()`. However, the function `.from_torch()` simply calls `.contiguous()`, as you described in #4258. I...
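For context, a PyTorch-only sketch of the difference between the two behaviors; the Taichi-side helpers are only paraphrased here, this is not their actual code:

```python
import torch

x = torch.randn(4, 4).t()   # a transposed view is NOT contiguous
print(x.is_contiguous())    # False -> a check like the kernel's would reject/flag it

y = x.contiguous()          # .contiguous() instead silently makes a contiguous copy
print(y.is_contiguous())    # True
print(x.data_ptr() == y.data_ptr())  # False: the data was copied to new storage
```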
Fantastic! That works for me!
I think one possible reason is the tokenizer. The vocab size of LLaMA is 32,000, but that of ChatGLM-6B is about 150,000.
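A quick way to check the vocab sizes yourself (the model IDs below are assumptions; ChatGLM-6B needs `trust_remote_code=True` and the LLaMA weights may be gated):

```python
from transformers import AutoTokenizer

llama_tok = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
glm_tok = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)

print(len(llama_tok))   # ~32,000
print(len(glm_tok))     # much larger, with far more Chinese tokens
```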
@jinfagang The member's explanation is [here](https://github.com/THUDM/ChatGLM-6B/issues/127#issuecomment-1473366712). I don't know much about tokenizers, sorry about that. However, from my tests, ChatGLM-6B performs much better than LLaMA-7B in Chinese.
I have tested it with Qwen-1.8B on an RTX 2080, and the inference speed is about twice that of the original (50 tok/s vs ~100 tok/s), which is fascinating....
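For anyone who wants to reproduce the numbers, the timing scaffolding looks roughly like this; `generate` stands in for whatever generation entry point the repo exposes (a hypothetical name, not the repo's actual API):

```python
import time
import torch

def tokens_per_second(generate, prompt_ids, max_new_tokens=200):
    """Time a single generation call and report decode throughput."""
    torch.cuda.synchronize()
    start = time.perf_counter()
    out = generate(prompt_ids, max_new_tokens=max_new_tokens)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    new_tokens = out.shape[-1] - prompt_ids.shape[-1]
    print(f"{new_tokens / elapsed:.1f} tok/s")
```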
@dashi6174 https://github.com/DongqiShen/qwen-fast