Thomas Viehmann
Hi @tombawor , thank you for your interest. I don't think we need a new class, just functions to complement `bitsandbytes.functional.quantize_4bit(w, quant_type="nf4")` for meta and cpu inputs (to return a...
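To make the comment above concrete, here is a pure-Python sketch of what a CPU complement to `bitsandbytes.functional.quantize_4bit` computes: blockwise absmax-scaled 4-bit quantization. The 16 levels below are illustrative equidistant values, not the actual NF4 code table from bitsandbytes, and the function names are made up for this sketch.

```python
# Placeholder 4-bit code in [-1, 1]; the real NF4 table uses
# normal-distribution quantiles, not equidistant levels.
LEVELS = [i / 7.5 - 1.0 for i in range(16)]

def quantize_blockwise(values, blocksize=64):
    """Return (codes, absmaxes): 4-bit level indices plus one absmax per block."""
    codes, absmaxes = [], []
    for start in range(0, len(values), blocksize):
        block = values[start:start + blocksize]
        absmax = max(abs(v) for v in block) or 1.0  # avoid div-by-zero for all-zero blocks
        absmaxes.append(absmax)
        for v in block:
            x = v / absmax  # normalize into [-1, 1]
            codes.append(min(range(16), key=lambda i: abs(LEVELS[i] - x)))
    return codes, absmaxes

def dequantize_blockwise(codes, absmaxes, blocksize=64):
    """Invert quantize_blockwise up to the quantization error."""
    out = []
    for start in range(0, len(codes), blocksize):
        absmax = absmaxes[start // blocksize]
        out.extend(LEVELS[c] * absmax for c in codes[start:start + blocksize])
    return out
```

A meta-tensor variant would skip the arithmetic entirely and only report the output shapes and dtypes that the CUDA path produces.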
This is likely an issue where we create static input buffers for things we would not want to do that for (e.g. maybe tensors saved for backward when...
So there are two things actually:

- the input buffers in the backward. This will improve on that:

```python
class MyCUDAGraphTransform(CUDAGraphTransform):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.outputs_from_forward = None
```
...
So I think we may have a more basic problem here than what cudagraphs do: our tooling works with function calls that do not let the args go out...
Note that you're converting a 2xN problem into an Nx2 problem by pushing dtensor support into prims. In the olden days, we decomposed ops to factor out common patterns, maybe...
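A toy sketch of the decomposition idea mentioned above (the registry and names are made up, not Thunder's actual API): composite ops are written once in terms of a small prim set, so each backend only implements the prims instead of every op, and the ops x backends support matrix collapses.

```python
PRIMS = {}

def register_prim(name, backend, fn):
    PRIMS[(name, backend)] = fn

def prim(name, backend, *args):
    return PRIMS[(name, backend)](*args)

# Backend "cpu": plain Python arithmetic.
register_prim("mul", "cpu", lambda a, b: a * b)
register_prim("add", "cpu", lambda a, b: a + b)

# Backend "logged": same math, but records a trace; stands in for a
# meta/dtensor-style backend that needs its own prim implementations.
TRACE = []
def _logged(op, fn):
    def run(a, b):
        TRACE.append(op)
        return fn(a, b)
    return run
register_prim("mul", "logged", _logged("mul", lambda a, b: a * b))
register_prim("add", "logged", _logged("add", lambda a, b: a + b))

# Composite op defined ONCE in terms of prims; works on every backend.
def fma(a, b, c, backend):
    return prim("add", backend, prim("mul", backend, a, b), c)
```

Here only two prims need per-backend work; every composite built from them comes for free on both backends.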
Can we use the PyTorch scheme of `2.5.0a0` though? This will sort before the final 2.5.0 in pip (and other standard Python) versioning.
The version format should still use `FOOa0` instead of `FOO.dev`. There is a PEP for it (PEP 440), and then the versions compare properly. In terms of features, I would love...
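The point in both comments above is that PEP 440 orders dev and alpha segments before the final release, while naive string comparison gets it backwards. In real code this is `packaging.version.Version`; below is only a hand-rolled sketch handling `X.Y.Z` with an optional `aN` or `.devN` suffix.

```python
import re

def parse(v):
    """Tiny subset of PEP 440 ordering: 'X.Y.Z' plus optional 'aN' or '.devN'.
    Real code should use packaging.version.Version instead of this sketch."""
    m = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:(a)(\d+)|\.(dev)(\d+))?", v)
    release = tuple(int(x) for x in m.group(1, 2, 3))
    if m.group(4):        # alpha pre-release aN
        pre = (0, int(m.group(5)))
    elif m.group(6):      # dev release .devN sorts before alphas
        pre = (-1, int(m.group(7)))
    else:                 # final release sorts after any pre-release
        pre = (1, 0)
    return release + (pre,)
```

Comparing the resulting tuples gives `2.5.0.dev0 < 2.5.0a0 < 2.5.0`, matching what pip does when it resolves version constraints.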