Calvin Pelletier

Results: 10 comments by Calvin Pelletier

@joecummings I'm guessing this is because the causal mask is created in `setup_caches()` [here](https://github.com/pytorch/torchtune/blob/main/torchtune/modules/transformer.py#L171), so without calling this function we're attending to all tokens, resulting in garbage outputs. Maybe we...
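For intuition, here is a minimal standalone PyTorch sketch of the failure mode (this is not torchtune's implementation; torchtune builds its mask inside `setup_caches()`): with the lower-triangular mask, each position only attends to earlier tokens, and without it, every position attends to the full sequence, including future tokens.

```python
import torch
import torch.nn.functional as F

seq_len, dim = 4, 8
q = torch.randn(1, seq_len, dim)
k = torch.randn(1, seq_len, dim)
v = torch.randn(1, seq_len, dim)

# Lower-triangular causal mask: position i may only attend to positions <= i.
causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

masked = F.scaled_dot_product_attention(q, k, v, attn_mask=causal_mask)

# With no mask, every position attends to all tokens, including future ones:
# the failure mode described above.
unmasked = F.scaled_dot_product_attention(q, k, v)
```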

Hey @apachemycat, an option to save only the trainable weights for intermediate checkpoints is a great idea! We will add support for this soon. Regarding checkpointing every N steps, this...
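In the meantime, a rough sketch of the workaround in plain PyTorch (the function name is illustrative, not a torchtune API): filter the state dict down to parameters with `requires_grad=True` before saving.

```python
import torch

def save_trainable_only(model: torch.nn.Module, path: str) -> None:
    # keep_vars=True returns the live Parameter objects, so requires_grad
    # is still visible; buffers report requires_grad=False and are skipped.
    trainable = {
        name: param.detach().cpu()
        for name, param in model.state_dict(keep_vars=True).items()
        if getattr(param, "requires_grad", False)
    }
    torch.save(trainable, path)
```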

Hi @vgoklani, we don't currently support this, but you could modify a recipe to call [torchao.float8.convert_to_float8_training](https://github.com/pytorch/ao/tree/main/torchao/float8) on your model at the end of [this function](https://github.com/pytorch/torchtune/blob/aa8f365f91a69aa36aaea14cf6f03ccd45310bb6/recipes/full_finetune_single_device.py#L410). However, I recommend using...
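A sketch of what that modification might look like, adapted from the torchao float8 README (the `module_filter_fn` shown is an assumption; which layers are safe to convert depends on the model, and fp8 matmuls need dimensions divisible by 16):

```python
import torch
from torchao.float8 import convert_to_float8_training

def convert_model_to_fp8(model: torch.nn.Module) -> torch.nn.Module:
    def module_filter_fn(module: torch.nn.Module, fqn: str) -> bool:
        # Illustrative: skip the final output projection, a common
        # precaution since its shape often isn't fp8-friendly.
        return fqn != "output"

    # Swaps eligible nn.Linear layers for float8 training variants in place.
    convert_to_float8_training(model, module_filter_fn=module_filter_fn)
    return model
```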

We would definitely appreciate a PR if full-finetuning in FP8 works out well for you all!

Yay step-based checkpointing! Some thoughts: 1. I second Felipe's comment about dropping support for epoch-based checkpointing. Our code will be cleaner and simpler if our whole ecosystem of checkpointing/validating/logging/etc. is...
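For illustration, the step-based cadence in its simplest form (hypothetical helper callables, not torchtune's recipe code); validation and logging can hang off the same global step:

```python
from typing import Callable, Iterable

def train(
    batches: Iterable,
    train_step: Callable,       # hypothetical: runs one optimizer step
    save_checkpoint: Callable,  # hypothetical: writes a checkpoint
    checkpoint_every_n_steps: int = 1000,
) -> None:
    # One flat loop keyed on the global step; epoch boundaries play no
    # role in the checkpointing cadence.
    for step, batch in enumerate(batches, start=1):
        train_step(batch)
        if step % checkpoint_every_n_steps == 0:
            save_checkpoint(step)
```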

Hi @zhangtemplar, you're changing the generic `convert_weights` function. Qwen2.5 already has a model-specific convert-weights function [here](https://github.com/pytorch/torchtune/blob/main/torchtune/models/qwen2/_convert_weights.py) which handles the biases of the linear projections. In our Qwen2.5 configs,...
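To illustrate the distinction, here is the shape of the extra key mapping a model-specific converter carries; the entries below are illustrative, and the real mapping lives in the linked `_convert_weights.py`:

```python
# Illustrative fragment of an HF -> torchtune key map covering the
# q/k/v projection biases that a generic converter would miss.
_QWEN2_BIAS_KEYS = {
    "model.layers.{}.self_attn.q_proj.bias": "layers.{}.attn.q_proj.bias",
    "model.layers.{}.self_attn.k_proj.bias": "layers.{}.attn.k_proj.bias",
    "model.layers.{}.self_attn.v_proj.bias": "layers.{}.attn.v_proj.bias",
}
```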

It looks like the rank-zero process is getting stuck somewhere in this helper function: https://github.com/pytorch/torchtune/blob/main/torchtune/training/checkpointing/_checkpoint_client.py#L355 Can you add some logs in there to see what's going on?
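Something along these lines (illustrative; any rank-tagged print works) dropped before and after the suspect calls would narrow down where rank zero hangs:

```python
import logging
import torch.distributed as dist

log = logging.getLogger(__name__)

def log_rank(msg: str) -> None:
    # Tag every message with the rank so a hang on rank 0 stands out.
    rank = dist.get_rank() if dist.is_initialized() else 0
    log.info("[rank %d] %s", rank, msg)
```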

Yes, that is abnormally high; I'm digging into this.

@joecummings should Llama 4 Scout use more CPU memory than Llama 3.3 70B? When full finetuning Scout (`tune run --nproc_per_node 8 full_finetune_distributed --config llama4/scout_17B_16E_full` on 8xA100s), CPU mem is constant at 500...
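For anyone reproducing this, a simple per-process probe of resident CPU memory (psutil here is just one option; any RSS readout works):

```python
import os
import psutil

def log_cpu_rss(tag: str) -> None:
    # Resident set size of the current process, in GiB.
    rss_gib = psutil.Process(os.getpid()).memory_info().rss / 2**30
    print(f"[{tag}] CPU RSS: {rss_gib:.1f} GiB")
```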

Here's some additional info: https://github.com/pytorch/torchtune/issues/2111#issuecomment-2519077960