Zhenyu Zhu ajz34 comments

Results: 14 comments of Zhenyu Zhu ajz34

*omatcopy_ct input parameter checking

Thanks for the suggestions! I've managed to write to [email protected]. Some of the exchange follows. Actually, I searched for this problem on search engines and found that this is (probably) the only related thread discussing...

INT8 quantized version raises an error at runtime when the GPU to load onto is set: Triton Error [CUDA]: an illegal memory access was encountered

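I've run into a similar, though not identical, problem. When running inference with int8 and int4, the two GPUs cannot be used together (that is, when one GPU runs out of memory, the other GPU is not brought into use); the original model (fnlp/moss-moon-003-sft) works fine, though. Could it be that all of the triton-based low-precision quantized versions are unsuited to multi-GPU use?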

Error: Unexpected MMA layout version found

I'm hitting this problem on a Titan X here as well. Could it be related to int8/int4 relying on triton, and triton currently very likely not supporting 8/4-bit on Pascal or older GPUs? https://github.com/qwopqwop200/GPTQ-for-LLaMa/issues/142 https://github.com/openai/triton/pull/1505#issuecomment-1517484120
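
To make the Pascal hypothesis concrete, here is a minimal sketch of a compute-capability check. It assumes that the MMA code path depends on tensor cores, which NVIDIA introduced with Volta (compute capability 7.0); whether that is really the cause of this error is not confirmed here. The helper `has_tensor_cores` and the table of example cards are hypothetical and are not part of triton or GPTQ-for-LLaMa.

```python
# Hypothetical helper for illustration; not part of triton or GPTQ-for-LLaMa.
# Assumption: the MMA code path needs tensor cores, which first appeared
# with Volta (compute capability 7.0). Pascal cards report 6.x.
MIN_TENSOR_CORE_CAPABILITY = (7, 0)


def has_tensor_cores(capability):
    """Return True if a (major, minor) compute capability is 7.0 or newer."""
    return tuple(capability) >= MIN_TENSOR_CORE_CAPABILITY


# Compute capabilities of a few example cards.
EXAMPLE_CARDS = {
    "Titan X (Pascal)": (6, 1),
    "Tesla V100 (Volta)": (7, 0),
    "RTX 3090 (Ampere)": (8, 6),
}

for name, cap in EXAMPLE_CARDS.items():
    status = "tensor cores" if has_tensor_cores(cap) else "no tensor cores"
    print(f"{name}: sm_{cap[0]}{cap[1]} -> {status}")
```

On a real machine the tuple could come from `torch.cuda.get_device_capability()`, which returns `(major, minor)` for the current device.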

Add support for BLAS Syrk

Hi devs, I found this issue recently while working on some code to wrap BLAS (https://github.com/ajz34/blas-array2). This crate is mostly for our own project (may not even be useful due...