Driss Guessous
First of all, I wanna say thank you for making such an amazing tool! It has greatly improved my workflow, so I can't thank you enough! I am often astonished...
# Summary We are currently working on integrating an fp8 scaled matmul kernel written using Cutlass into PyTorch. PyTorch has the constraint that it can be linked against the cuda...
# Summary https://github.com/pytorch/pytorch/pull/125204 is failing on Windows with: ``` Shell 2024-06-03T10:06:46.7711483Z C:/cb/pytorch_1000000000000/work/aten/src/ATen/../../../third_party/cutlass/include\cutlass/uint128.h(189): error: calling a __host__ function("_udiv128") from a __host__ __device__ function("cutlass::uint128_t::operator / const") is not allowed 2024-06-03T10:06:46.7712785Z 2024-06-03T10:06:46.7713366Z 1...
# Summary Sm90a is needed to enable some features found in Cutlass. For some reason Google has the worst SEO for finding more information about this gencode. This is the...
# Summary For more details see this PyTorch issue: https://github.com/pytorch/pytorch/issues/131257 I was able to reproduce on 74b0761ff7efc7b90d4e5aeb529c1b2a09a7458c with the following script: ``` Python import torch import torch.nn.functional as F from torch.nn.attention...
# Summary * Switch the ordering of the authors * Add in h100 perf charts
# Summary Currently we have two "eval" scripts for measuring the performance of LLMs post-quantization: https://github.com/pytorch/ao/blob/main/torchao/_models/llama/eval.py, https://github.com/pytorch/ao/blob/main/scripts/hf_eval.py The default task we have is wikitext. We should create a "large" eval...
# Summary Autoquant will iterate through a user module and identify all linear dtypes and shapes, as well as the execution time for different quantization routines. This information is baked into...
Stacked PRs: * #709 * #707 * __->__#706 --- --- --- ### add ability to calculate amax in tiles ghstack-source-id: 83ccec3ec66f30b9d75146d0fc7b1137ea7574c4 Pull Request resolved: https://github.com/pytorch/ao/pull/682
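For readers unfamiliar with the idea, here is a minimal sketch of what "amax in tiles" means: instead of one absolute maximum for a whole tensor, one absolute maximum is computed per fixed-size tile. This is an illustration in plain Python, not the code from the PR; the function name `tiled_amax` and its parameters are hypothetical, and it assumes a 2D input whose shape is evenly divisible by the tile shape.

```python
from typing import List, Sequence


def tiled_amax(
    matrix: Sequence[Sequence[float]], tile_rows: int, tile_cols: int
) -> List[List[float]]:
    """Return max(|x|) for every (tile_rows x tile_cols) tile of a 2D matrix.

    Hypothetical helper for illustration only; not the torchao API.
    """
    if tile_rows <= 0 or tile_cols <= 0:
        raise ValueError("tile dimensions must be positive")

    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    # Assumption: the matrix splits evenly into tiles (no ragged edge tiles).
    if n_rows % tile_rows or n_cols % tile_cols:
        raise ValueError("matrix shape must be divisible by the tile shape")

    result: List[List[float]] = []
    for r0 in range(0, n_rows, tile_rows):
        tile_row: List[float] = []
        for c0 in range(0, n_cols, tile_cols):
            # Absolute maximum over the elements of this single tile.
            tile_row.append(
                max(
                    abs(matrix[r][c])
                    for r in range(r0, r0 + tile_rows)
                    for c in range(c0, c0 + tile_cols)
                )
            )
        result.append(tile_row)
    return result


example = [
    [1.0, -8.0, 2.0, 3.0],
    [4.0, 5.0, -6.0, 0.5],
]
per_tile = tiled_amax(example, tile_rows=2, tile_cols=2)
```

Per-tile maxima like these are typically used to derive one scale per tile rather than one scale per tensor, so a single outlier only affects the tile it sits in.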
# Summary Currently our CI/CD pipeline uses ruff to format and lint files in the codebase. The files are hardcoded to the list in ruff.toml. We should also add mypy support...