Joe Cummings
> @msaroufim Is there a method on NF4Tensor that could be easily implemented to enable FSDP2 + QLoRA + Compile? To make this even more explicit, we have 405B working...
Using torchao-nightly I am now able to run the above command with `compile=True`. However, I get the following output:
```
[rank0]:W0729 13:35:08.486000 917826 torch/_dynamo/convert_frame.py:795] [8/8] torch._dynamo hit config.cache_size_limit (8)
[rank0]:W0729...
```
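When dynamo hits `config.cache_size_limit`, it stops recompiling that frame and falls back to eager. One common workaround (a sketch of a general option, not necessarily the fix adopted here) is to raise the limit before the first `torch.compile` call:

```python
import torch

# Raise the recompile cache limit before compiling anything.
# The default is 8; once a frame exceeds it, dynamo falls back to eager
# for that frame, which silently loses the compile speedup.
torch._dynamo.config.cache_size_limit = 64
```

Note that raising the limit only masks the symptom; the underlying recompilation cause (e.g. guard churn on tensor subclasses like NF4Tensor) still needs to be tracked down.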
> Not a torchtune author/contributor, but from the memory usage, I'm guessing that the old version performs NF4 quantization on GPU, while the new version performs it on CPU. Makes...
Hi @l-berg - thanks for bringing this to our attention! The AO folks dug deep into this and found that a version-guarded `inplace_copy` function was the offending issue. Please...
Closing this issue as it is now possible through the TorchAO library.
> Adding a comment to track this discussion in Pytorch core [pytorch/pytorch#130330](https://github.com/pytorch/pytorch/issues/130330) If this lands, we should enable this by default in torchtune until it lands in a PyTorch stable...
> Just wanted to confirm that running on A100, with the flag I can run bs=4, but without it, it OOMs This would imply that we should be paying attention to...
> Was this the kind of thing you had in mind? [`1129f9e`/torchtune/modules/rlhf/_generation.py](https://github.com/pytorch/torchtune/blob/1129f9e3a246628c991c246d81dbead62d3437a3/torchtune/modules/rlhf/_generation.py) Yep, this is pretty much it! I take it that you're not utilizing the KV Cache for this...
Left padded: ``` My, name, is, Joe ,  , Hello, world ,  ,  , Bye ``` Left padded mask: ``` 1 0 0 0 1 1 0 0 1 1 1...
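The left-padding layout above can be sketched in plain Python (hypothetical helper names for illustration, not torchtune's API; the real logic lives in `torchtune/modules/rlhf/_generation.py`):

```python
# Minimal sketch of left padding for batched generation: shorter sequences
# get pad tokens prepended, and the mask marks real tokens with 1.
PAD = 0

def left_pad(seqs, pad=PAD):
    """Pad each sequence on the left so all rows share the max length."""
    width = max(len(s) for s in seqs)
    return [[pad] * (width - len(s)) + list(s) for s in seqs]

def padding_mask(padded, pad=PAD):
    """1 for real tokens, 0 for left-padding."""
    return [[0 if tok == pad else 1 for tok in row] for row in padded]

batch = [[11, 12, 13, 14], [21, 22], [31]]
padded = left_pad(batch)
mask = padding_mask(padded)
# padded -> [[11, 12, 13, 14], [0, 0, 21, 22], [0, 0, 0, 31]]
# mask   -> [[1, 1, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]]
```

Left padding keeps the last real token of every row in the final column, which is what incremental decoding needs when appending newly generated tokens.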
We started with those first two recipes in order to prove out the concept, but there's no reason why we cannot add it to the single device full finetuning. We'd...