gemlite integration in torchao
Summary:
- Integrated gemlite kernels from https://github.com/mobiusml/gemlite
- Updated the kernel wrappers to work with torch.compile (changed acc_dtype to the equivalent torch.dtype and added basic op/fake_op registration)
- Added batch size > 1 support to the llama benchmark
On A100, the kernel's results are not great. It likely needs more tuning before further integration makes sense.
Test Plan: benchmarks.sh
Reviewers:
Subscribers:
Tasks:
Tags: