Tyler Michael Smith
Hi @Tim-blo, thank you for reporting your issue. I can take a look. To help me reproduce the problem, could you share your ONNX file?
@Tim-blo thank you for the model! I am taking a look now.
Hi @Tim-blo, I have a fix for the issue you are running into, so this will be resolved in the next nightly that goes out, and in 1.1.0.
Hi @vvolhejn, thank you for your bug report. Could you share the results of `lscpu` to help us debug this issue?
>@fgvanzee let's talk about this; IIRC carouseling doesn't take full advantage of parallelism in both dimensions. The thing about carouseling is that it doesn't actually increase the number of FMA...
>@fgvanzee this gives me an idea: what if we modified the test driver to also count cycles (e.g. rdtsc on x86) and print FLOPS/cycle in addition to GFLOPs? `rdtsc` is...
Sorry for being the bearer of bad news. I don't know a better way.
FYI Intel MKL has this functionality already. You can call `xgemm_pack(...)` to pack matrices and then `xgemm_compute(...)` to compute with them. It might be nice to export the same interface...
Thanks @jeejeelee -- this PR is part of a larger project to add support for w8a8 quantization (which is on the Q2 roadmap https://github.com/vllm-project/vllm/issues/3861). We ran into several issues with...
@pcmoritz @comaniac There are a couple of issues to iron out still (CMakeLists changes and kernel dispatching for sure) but this should be ready to look at. @youkaichao do you...