Thomas Viehmann
Hi @tombawor , thank you for your interest. I don't think we need a new class, just functions to complement `bitsandbytes.functional.quantize_4bit(w, quant_type="nf4")` for meta and cpu inputs (to return a...
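To make the comment above concrete, here is a pure-Python sketch of what a CPU complement to `bitsandbytes.functional.quantize_4bit` computes: blockwise absmax-scaled 4-bit quantization. The 16 levels below are illustrative equidistant values, not the actual NF4 code table from bitsandbytes, and the function names are made up for this sketch.

```python
# Placeholder 4-bit code in [-1, 1]; the real NF4 table uses
# normal-distribution quantiles, not equidistant levels.
LEVELS = [i / 7.5 - 1.0 for i in range(16)]

def quantize_blockwise(values, blocksize=64):
    """Return (codes, absmaxes): 4-bit level indices plus one absmax per block."""
    codes, absmaxes = [], []
    for start in range(0, len(values), blocksize):
        block = values[start:start + blocksize]
        absmax = max(abs(v) for v in block) or 1.0  # avoid div-by-zero for all-zero blocks
        absmaxes.append(absmax)
        for v in block:
            x = v / absmax  # normalize into [-1, 1]
            codes.append(min(range(16), key=lambda i: abs(LEVELS[i] - x)))
    return codes, absmaxes

def dequantize_blockwise(codes, absmaxes, blocksize=64):
    """Invert quantize_blockwise up to the quantization error."""
    out = []
    for start in range(0, len(codes), blocksize):
        absmax = absmaxes[start // blocksize]
        out.extend(LEVELS[c] * absmax for c in codes[start:start + blocksize])
    return out
```

A meta-tensor variant would skip the arithmetic entirely and only report the output shapes and dtypes that the CUDA path produces.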
This is likely an issue where we create static input buffers for things we would not want to do that for (e.g. maybe tensors saved for backward when...
So there are two things actually:

- the input buffers in the backward. This will improve on that:

```python
class MyCUDAGraphTransform(CUDAGraphTransform):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.outputs_from_forward = None
```
...
So I think we may have a more basic problem here than what cudagraphs do: our tooling works with function calls that do not let the args go out...
Note that you're converting a 2xN problem into an Nx2 problem by pushing dtensor support into prims. In the olden days, we decomposed ops to factor out common patterns, maybe...
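A toy sketch of the decomposition idea mentioned above (the registry and names are made up, not Thunder's actual API): composite ops are written once in terms of a small prim set, so each backend only implements the prims instead of every op, and the ops x backends support matrix collapses.

```python
PRIMS = {}

def register_prim(name, backend, fn):
    PRIMS[(name, backend)] = fn

def prim(name, backend, *args):
    return PRIMS[(name, backend)](*args)

# Backend "cpu": plain Python arithmetic.
register_prim("mul", "cpu", lambda a, b: a * b)
register_prim("add", "cpu", lambda a, b: a + b)

# Backend "logged": same math, but records a trace; stands in for a
# meta/dtensor-style backend that needs its own prim implementations.
TRACE = []
def _logged(op, fn):
    def run(a, b):
        TRACE.append(op)
        return fn(a, b)
    return run
register_prim("mul", "logged", _logged("mul", lambda a, b: a * b))
register_prim("add", "logged", _logged("add", lambda a, b: a + b))

# Composite op defined ONCE in terms of prims; works on every backend.
def fma(a, b, c, backend):
    return prim("add", backend, prim("mul", backend, a, b), c)
```

Here only two prims need per-backend work; every composite built from them comes for free on both backends.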
Can we use the PyTorch scheme of `2.5.0a0` though? This will sort before the final 2.5.0 in pip (and other standard Python) versioning.
The version format should still use `FOOa0` instead of `FOO.dev`. There is a PEP for it (PEP 440), and then the versions compare properly. In terms of features, I would love...
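The point in both comments above is that PEP 440 orders dev and alpha segments before the final release, while naive string comparison gets it backwards. In real code this is `packaging.version.Version`; below is only a hand-rolled sketch handling `X.Y.Z` with an optional `aN` or `.devN` suffix.

```python
import re

def parse(v):
    """Tiny subset of PEP 440 ordering: 'X.Y.Z' plus optional 'aN' or '.devN'.
    Real code should use packaging.version.Version instead of this sketch."""
    m = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:(a)(\d+)|\.(dev)(\d+))?", v)
    release = tuple(int(x) for x in m.group(1, 2, 3))
    if m.group(4):        # alpha pre-release aN
        pre = (0, int(m.group(5)))
    elif m.group(6):      # dev release .devN sorts before alphas
        pre = (-1, int(m.group(7)))
    else:                 # final release sorts after any pre-release
        pre = (1, 0)
    return release + (pre,)
```

Comparing the resulting tuples gives `2.5.0.dev0 < 2.5.0a0 < 2.5.0`, matching what pip does when it resolves version constraints.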