Joe Cummings comments

Results: 278 comments of Joe Cummings

[RFC] Supporting KV-cache toggling

If the bottleneck is really in the teardown (setup is unavoidable I believe), then we could opt for a similar approach to HF: ```python batch = batch.to(device) with torch.no_grad(): ......

Add evaluation configs under phi3 dir

@Harthi7 Would you mind merging the main branch and running the linter? Then we can go ahead and get this merged :)

Error when running a llama-2-70b-qkv-mlperf model

Hey @kailashg26 - good question! The error here comes from the fact that the model utilizes a fused QKV while torchtune does not natively support this. You should take a...

Error when running a llama-2-70b-qkv-mlperf model

> Hi [@joecummings](https://github.com/joecummings), do you think this is the correct way to do it? > > First, I create a convert weights file in torchtune/models/llama2/llama2_qkv.py with > > ``` >...

Error when running a llama-2-70b-qkv-mlperf model

> Also, [@joecummings](https://github.com/joecummings) could you give some insights into how I should input the dataset, which has input IDs and labels? Can you provide an example? By input IDs, do...

Error when running a llama-2-70b-qkv-mlperf model

> Hi [@joecummings](https://github.com/joecummings), I get an error when running llama-2-70b-qkv > > **llama2_qkv.py code in torchtune/models/llama2/** > > ``` > from typing import Dict, Optional > import torch...

Error when running a llama-2-70b-qkv-mlperf model

> Hi [@joecummings](https://github.com/joecummings), this is the dataset [huggingface.co/datasets/regisss/scrolls_gov_report_preprocessed_mlperf_2](https://huggingface.co/datasets/regisss/scrolls_gov_report_preprocessed_mlperf_2) I'm trying to use. Looks like this format is not supported, right? Could you let me know how to use this...

fix convert_weights not working for Qwen2.5 HF checkpoints

> Hi @zhangtemplar , you're changing the generic `convert_weights` function. Qwen2.5 already has a specific convert weights function [here](https://github.com/pytorch/torchtune/blob/main/torchtune/models/qwen2/_convert_weights.py?rgh-link-date=2025-01-08T03%3A15%3A36Z) which handles the biases of the linear projections. > > In...

fix convert_weights not working for Qwen2.5 HF checkpoints

No actual bug here - closing.

Improvement: add a "division by zero" check in chunked loss handling in kd_losses.py

Great catch! We'd definitely welcome a small PR for this if you want to do it, otherwise we can try to get to it soon.
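
For readers skimming this thread, here is a rough sketch of the kind of guard being discussed. It is not the code in kd_losses.py: the function name `chunked_mean_loss`, the `-100` ignore value, and the use of plain Python lists instead of tensors are all assumptions made for illustration. The idea is that a chunked loss sums per-token losses over the chunks and divides by the number of non-ignored tokens, so a batch in which every label is ignored would divide by zero unless that case is handled.

```python
from typing import Sequence

# Assumption: -100 is used as the "ignore this token" label value.
IGNORE_INDEX = -100


def chunked_mean_loss(
    per_token_loss_chunks: Sequence[Sequence[float]],
    label_chunks: Sequence[Sequence[int]],
    ignore_index: int = IGNORE_INDEX,
) -> float:
    """Average per-token losses over all chunks, skipping ignored labels.

    Hypothetical helper for illustration only. Returns 0.0 when every
    label is ignored instead of dividing by zero.
    """
    total_loss = 0.0
    num_valid = 0
    for losses, labels in zip(per_token_loss_chunks, label_chunks):
        for loss, label in zip(losses, labels):
            if label != ignore_index:
                total_loss += loss
                num_valid += 1

    # The guard: with no valid tokens there is nothing to average.
    if num_valid == 0:
        return 0.0
    return total_loss / num_valid


# Two chunks, one ignored token: (1.0 + 2.0) / 2 valid tokens.
mixed = chunked_mean_loss([[1.0, 3.0], [2.0]], [[5, IGNORE_INDEX], [7]])
# Every token ignored: the guard returns 0.0 rather than raising.
all_ignored = chunked_mean_loss([[1.0, 3.0]], [[IGNORE_INDEX, IGNORE_INDEX]])
```

Returning 0.0 for a fully masked batch is one possible choice; raising an error or skipping the batch would also be defensible, depending on how the training loop treats such batches.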