Tyler Michael Smith

107 comments by Tyler Michael Smith

Hi @Tim-blo, thank you for reporting your issue. I can take a look. To help me reproduce the problem, could you share your ONNX file?

@Tim-blo thank you for the model! I am taking a look now.

Hi @Tim-blo, I have a fix for the issue you are running into, so this will be resolved in the next nightly that goes out, and in 1.1.0.

Hi @vvolhejn, thank you for your bug report. Could you share the results of `lscpu` to help us debug this issue?

>@fgvanzee let's talk about this; IIRC carouseling doesn't take full advantage of parallelism in both dimensions. The thing about carouseling is that it doesn't actually increase the number of FMA...

>@fgvanzee this gives me an idea: what if we modified the test driver to also count cycles (e.g. `rdtsc` on x86) and print FLOPS/cycle in addition to GFLOPS? `rdtsc` is...
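
For reference, a minimal sketch of the arithmetic behind the FLOPS/cycle idea quoted above. It assumes the standard 2·m·n·k operation count for a GEMM and takes the cycle count as an input. How the counter is read is left out, and the function names are hypothetical and not part of any test driver.

```python
def gemm_flops(m, n, k):
    """Floating-point operations in C += A @ B: one multiply and one add per inner step."""
    return 2 * m * n * k


def flops_per_cycle(m, n, k, cycles):
    """FLOPs per elapsed counter tick; `cycles` is whatever difference the driver recorded."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return gemm_flops(m, n, k) / cycles


def gflops(m, n, k, seconds):
    """Conventional throughput figure, for comparison with FLOPS/cycle."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return gemm_flops(m, n, k) / seconds / 1e9


if __name__ == "__main__":
    # Hypothetical measurement: a 100x100x100 GEMM that took 125000 counter ticks.
    print(flops_per_cycle(100, 100, 100, 125_000))
```

Unlike GFLOPS, the per-cycle figure can be compared directly against a core's peak FMA throughput, provided the counter really tracks core cycles on the machine in question.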

Sorry for being the bearer of bad news. I don't know a better way.

FYI Intel MKL has this functionality already. You can call `xgemm_pack(...)` to pack matrices and then `xgemm_compute(...)` to compute with them. It might be nice to export the same interface...
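
To illustrate the pack-then-compute split described above, here is a rough pure-Python sketch. The names `pack_b` and `compute`, the panel width `nr`, and the panel layout are all assumptions made for illustration; they do not mirror MKL's actual signatures or internal format.

```python
def pack_b(b, k, n, nr):
    """Copy a row-major k x n matrix into column panels of width nr (zero-padded)."""
    panels = []
    for j0 in range(0, n, nr):
        panel = []
        for p in range(k):
            for j in range(j0, j0 + nr):
                # Pad the last panel with zeros when n is not a multiple of nr.
                panel.append(b[p * n + j] if j < n else 0.0)
        panels.append(panel)
    return panels


def compute(a, packed_b, m, k, n, nr):
    """Return the row-major m x n product of a (m x k, row-major) and the packed B."""
    c = [0.0] * (m * n)
    for index, panel in enumerate(packed_b):
        j0 = index * nr
        width = min(nr, n - j0)  # skip the zero padding in the last panel
        for i in range(m):
            for p in range(k):
                a_ip = a[i * k + p]
                for jj in range(width):
                    c[i * n + j0 + jj] += a_ip * panel[p * nr + jj]
    return c


if __name__ == "__main__":
    b = [5, 6, 7, 8, 9, 10]           # 2 x 3
    packed = pack_b(b, 2, 3, 2)       # pack once...
    for a in ([1, 2, 3, 4], [1, 0, 0, 1]):
        print(compute(a, packed, 2, 2, 3, 2))  # ...reuse for several products
```

The point of the split is that the packing cost is paid once and amortized when the same matrix is multiplied many times.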

Thanks @jeejeelee -- this PR is part of a larger project to add support for w8a8 quantization (which is on the Q2 roadmap https://github.com/vllm-project/vllm/issues/3861). We ran into several issues with...

@pcmoritz @comaniac There are a couple of issues to iron out still (CMakeLists changes and kernel dispatching for sure) but this should be ready to look at. @youkaichao do you...