Joe Cummings

278 comments by Joe Cummings

> @msaroufim Is there a method on NF4Tensor that could be easily implemented to enable FSDP2 + QLoRA + Compile? To make this even more explicit, we have 405B working...

Using torchao-nightly, I am now able to run the above command with `compile=True`. However, I get the following output:

```
[rank0]:W0729 13:35:08.486000 917826 torch/_dynamo/convert_frame.py:795] [8/8] torch._dynamo hit config.cache_size_limit (8)
[rank0]:W0729...
```

> Not a torchtune author/contributor, but from the memory usage, I'm guessing that the old version performs NF4 quantization on GPU, while the new version performs it on CPU. Makes...

Hi @l-berg, thanks for bringing this to our attention! The AO folks dug deep into this and found that a version-guarded `inplace_copy` function was the culprit. Please...

Closing this issue as it is now possible through the TorchAO library.

> Adding a comment to track this discussion in PyTorch core: [pytorch/pytorch#130330](https://github.com/pytorch/pytorch/issues/130330)

If this lands, we should enable this by default in torchtune until it lands in a PyTorch stable...

> Just wanted to confirm that running on A100, with the flag I can run bs=4, but without it, it OOMs.

This would imply that we should be paying attention to...

> Was this the kind of thing you had in mind? [`1129f9e`/torchtune/modules/rlhf/_generation.py](https://github.com/pytorch/torchtune/blob/1129f9e3a246628c991c246d81dbead62d3437a3/torchtune/modules/rlhf/_generation.py)

Yep, this is pretty much it! I take it that you're not utilizing the KV cache for this...

Left padded:

```
My, name, is, Joe , Hello, world , , , Bye
```

Left padded mask:

```
1 0 0 0 1 1 0 0 1 1 1...
```
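The left-padding comment is easier to follow with a concrete mask. Here is a minimal pure-Python sketch, assuming a hypothetical `<pad>` token and plain lists in place of tensors. The function names are illustrative only and are not torchtune's API. It left-pads a batch, builds a padding mask (1 = real token, 0 = pad), and combines that with a causal mask for one sequence.

```python
from typing import List

PAD = "<pad>"  # hypothetical pad token, for illustration only


def left_pad(seqs: List[List[str]], pad: str = PAD) -> List[List[str]]:
    """Prepend pad tokens so every sequence matches the longest one."""
    max_len = max(len(s) for s in seqs)
    return [[pad] * (max_len - len(s)) + s for s in seqs]


def padding_mask(padded: List[List[str]], pad: str = PAD) -> List[List[int]]:
    """1 marks a real token, 0 marks padding."""
    return [[0 if tok == pad else 1 for tok in row] for row in padded]


def causal_padding_mask(pad_mask: List[int]) -> List[List[int]]:
    """Attention mask for one sequence: query i may attend to key j only if
    j <= i (causal) and key j is a real token. Each position may also attend
    to itself, so a pad query never ends up with a fully masked row."""
    n = len(pad_mask)
    return [
        [1 if j <= i and (pad_mask[j] or i == j) else 0 for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    batch = [["My", "name", "is", "Joe"], ["Hello", "world"], ["Bye"]]
    padded = left_pad(batch)
    masks = padding_mask(padded)
    for row, m in zip(padded, masks):
        print(row, m)
    # Combined causal + padding mask for the left-padded "Hello world" row
    for mask_row in causal_padding_mask(masks[1]):
        print(mask_row)
```

Letting each pad position attend to itself is a design choice in this sketch: without it, the rows for pad queries would be entirely zero, and a softmax over a fully masked row is undefined. Real implementations may handle this differently.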

We started with those first two recipes to prove out the concept, but there's no reason we can't add it to the single-device full finetuning recipe. We'd...