Thien Tran
Quant-LLM code: https://github.com/pytorch/ao/tree/main/torchao/csrc/cuda/fp6_llm Currently the Quant-LLM kernel (which backs FPx in torchao) only works with FP16. This creates a small divergence from other quantization methods, which all work with BF16. Since all...
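For context, a minimal sketch of the FP16-only behavior described above, using torchao's public `quantize_` / `fpx_weight_only` API (the model and sizes are illustrative):

```python
import torch
from torchao.quantization import quantize_, fpx_weight_only

# FP6 (3 exponent bits, 2 mantissa bits) weight-only quantization.
# The backing Quant-LLM kernel currently requires FP16, hence .half();
# a BF16 model would first have to be cast to FP16, unlike other
# torchao methods (e.g. int8 weight-only) that accept BF16 directly.
model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).half().cuda()
quantize_(model, fpx_weight_only(3, 2))

x = torch.randn(8, 1024, dtype=torch.float16, device="cuda")
out = model(x)
```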
In `optim.load_state_dict(state_dict)`, if the optim dtype != the state_dict dtype, `aten._to_copy.default` is called. This PR simply implements this op and adds appropriate tests. **Update**: In PyTorch pre-2.4, calling `.to(device, dtype)` will not...
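As a rough illustration of the pattern (not the actual torchao subclass), here is a toy wrapper tensor that implements `aten._to_copy.default` via `__torch_dispatch__`, so that `.to(dtype)` — and hence `optim.load_state_dict()` — can cast its inner data; all names are hypothetical:

```python
import torch

aten = torch.ops.aten

class WrapperTensor(torch.Tensor):
    """Toy wrapper standing in for a quantized optimizer-state tensor."""

    @staticmethod
    def __new__(cls, inner):
        return torch.Tensor._make_wrapper_subclass(
            cls, inner.shape, dtype=inner.dtype, device=inner.device
        )

    def __init__(self, inner):
        self.inner = inner

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func is aten._to_copy.default:
            src = args[0]
            # honor the dtype/device requested by .to() / load_state_dict();
            # other _to_copy kwargs (layout, memory_format, ...) are ignored here
            return cls(src.inner.to(
                dtype=kwargs.get("dtype", src.inner.dtype),
                device=kwargs.get("device", src.inner.device),
            ))
        raise NotImplementedError(f"{func} is not supported")

x = WrapperTensor(torch.randn(4))
y = x.to(torch.bfloat16)   # dispatches aten._to_copy.default
print(y.dtype)             # torch.bfloat16
```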
https://github.com/pytorch/ao/tree/main/torchao/prototype/quantized_training Currently the INT8 training recipes only support **row-wise scaling** for weights. This should be better than (or at least on par with) **tensor-wise scaling** for weights in terms of...
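For illustration, a minimal sketch of row-wise INT8 weight scaling (not the actual recipe code): each output row gets its own scale, whereas tensor-wise scaling would use a single scale for the whole weight:

```python
import torch

def quantize_int8_rowwise(w: torch.Tensor):
    # one scale per output row, chosen so the row's absmax maps to 127
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-12) / 127
    w_int8 = torch.round(w / scale).to(torch.int8)
    return w_int8, scale

def dequantize_rowwise(w_int8: torch.Tensor, scale: torch.Tensor):
    return w_int8.to(scale.dtype) * scale

w = torch.randn(256, 512)
w_int8, scale = quantize_int8_rowwise(w)
err = (dequantize_rowwise(w_int8, scale) - w).abs().max()
print(err)  # small; tensor-wise scaling shares one global scale instead
```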
#### Context

What is the purpose of this PR? Is it to

- [x] add a new feature
- [ ] fix a bug
- [ ] update tests and/or...
**Steps/Code to reproduce bug**

```python
import torch
import cutlass.epilogue

def epilogue(accum, bias):
    D = accum + bias
    return D

examples_tensors = dict(
    accum=torch.randn(1024, 1024),
    bias=torch.randn(1024, 1).bfloat16(),
    D=torch.randn(1024, 1024).bfloat16(),
)

cutlass.epilogue.trace(epilogue, examples_tensors)
```
Fixes #1824 I was thinking of adding a test case for this, but currently the dtype is hard-coded to FP16: https://github.com/NVIDIA/cutlass/blob/44dae8b90ef232ea663727470dfbbe9daff6972d/test/python/cutlass/evt/utils/evt_testbed.py#L206 It would take some refactoring to test multiple dtypes at...
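A hypothetical sketch of what that refactor could look like — passing the element type into the testbed instead of hard-coding it; class and argument names are illustrative:

```python
from cutlass import DataType

class EVTTestBed:
    """Illustrative only; the real testbed lives in test/python/cutlass/evt/utils/evt_testbed.py."""

    def __init__(self, element: DataType = DataType.f16):
        # previously hard-coded to f16; now a constructor parameter
        self.element = element

# the EVT tests could then be run under several dtypes:
for dtype in (DataType.f16, DataType.bf16):
    testbed = EVTTestBed(element=dtype)
```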
### Feature request

Add BetterTransformer support for SEW. SEW has an almost identical architecture to Wav2Vec2. In particular, the attention modules are the same.

### Motivation

NA

### Your contribution

I'm...
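For reference, a hedged sketch of what the requested support would enable through Optimum's existing `BetterTransformer.transform` entry point (the model id is illustrative):

```python
from transformers import AutoModel
from optimum.bettertransformer import BetterTransformer

model = AutoModel.from_pretrained("asapp/sew-mid-100k")  # illustrative SEW checkpoint
# Currently this fails with an unsupported-model error for SEW; with a
# converter mirroring the Wav2Vec2 one, it would swap in fused attention.
model = BetterTransformer.transform(model)
```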