Albert Tseng
# 🐛 Possible Bug

I am getting different results using a multitask GP setup described here https://docs.gpytorch.ai/en/stable/examples/03_Multitask_Exact_GPs/Multitask_GP_Regression.html with 1 task vs. just using regular exact GP regression. I have also...
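A minimal sketch of the comparison I'm describing (my own setup, assuming the multitask model follows the linked tutorial with `num_tasks=1` and the single-task model is the standard exact GP):

```python
import torch
import gpytorch

train_x = torch.linspace(0, 1, 100)
train_y = torch.sin(train_x * (2 * torch.pi)) + 0.1 * torch.randn(100)

class SingleTaskGP(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )

class MultitaskGP(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood, num_tasks=1):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.MultitaskMean(
            gpytorch.means.ConstantMean(), num_tasks=num_tasks
        )
        self.covar_module = gpytorch.kernels.MultitaskKernel(
            gpytorch.kernels.RBFKernel(), num_tasks=num_tasks, rank=1
        )

    def forward(self, x):
        return gpytorch.distributions.MultitaskMultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )

# single-task model: targets of shape (N,)
single_lik = gpytorch.likelihoods.GaussianLikelihood()
single_model = SingleTaskGP(train_x, train_y, single_lik)

# multitask model with num_tasks=1: targets of shape (N, 1)
multi_lik = gpytorch.likelihoods.MultitaskGaussianLikelihood(num_tasks=1)
multi_model = MultitaskGP(train_x, train_y.unsqueeze(-1), multi_lik, num_tasks=1)

# both are then trained identically with ExactMarginalLogLikelihood and their
# posterior means compared on the same test points
```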
How does your updated fine-tuning method work vs. the one in your arXiv paper?
Hi, do you have any 2-bit numbers for AWQ? I'm interested in seeing how AWQ holds up at low bitrates, and it would be easy to adapt AWQ to...
Hi, what command should I run in your codebase to get perplexity numbers on WikiText and C4 for your HF Hub models (e.g., https://huggingface.co/ISTA-DASLab/Meta-Llama-3-70B-AQLM-2Bit-1x16/tree/main)?
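For context, this is the kind of standalone check I would otherwise write myself (not a command from your repo; `wikitext-2-raw-v1`, `seqlen = 2048`, and loading the checkpoint through `transformers` with the `aqlm` package installed are my assumptions):

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ISTA-DASLab/Meta-Llama-3-70B-AQLM-2Bit-1x16"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
model.eval()

# concatenate the wikitext-2 test split and tokenize it once
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tok(text, return_tensors="pt").input_ids

seqlen = 2048  # my assumption; quantization papers commonly evaluate at 2048 or 4096
nlls = []
for start in range(0, ids.shape[1] // seqlen * seqlen, seqlen):
    chunk = ids[:, start : start + seqlen].to(model.device)
    with torch.no_grad():
        # labels=chunk makes HF compute the shifted next-token cross-entropy
        # (a mean over seqlen - 1 predictions)
        loss = model(chunk, labels=chunk).loss
    nlls.append(loss.float() * (seqlen - 1))

ppl = torch.exp(torch.stack(nlls).sum() / (len(nlls) * (seqlen - 1)))
print(f"wikitext-2 perplexity: {ppl.item():.3f}")
```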
Are the models you report in your README supposed to be actual 2-bit models or just 2.x-bit models? For example, the two 7B models below are both larger...
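For reference, the back-of-envelope arithmetic behind the question (the 2.4 GB below is a placeholder, not one of your actual file sizes):

```python
# effective bits per parameter implied by a checkpoint's on-disk size
def bits_per_param(file_size_gb: float, n_params: float) -> float:
    return file_size_gb * 8 * 1e9 / n_params

# a strict 2-bit 7B model would be ~1.75 GB of weights
print(7e9 * 2 / 8 / 1e9)         # 1.75 (GB)

# a hypothetical 2.4 GB 7B checkpoint works out to ~2.7 bits/param
print(bits_per_param(2.4, 7e9))  # ~2.74
```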
Hi, I was reading the GaLore paper and noticed that the "ground truth" baseline seems to be pure BF16 training with nearest rounding. It is generally accepted that pure BF16...
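To make the rounding-mode distinction concrete, here is my own sketch (not anything from the GaLore code) of why round-to-nearest BF16 updates differ from stochastic rounding:

```python
import torch

# with round-to-nearest, an update smaller than half a BF16 ULP is dropped entirely
w = torch.ones((), dtype=torch.bfloat16)
update = torch.tensor(1e-4, dtype=torch.bfloat16)
print((w - update).item())  # 1.0 -- the step is lost

# stochastic rounding keeps the update in expectation (standard bit trick:
# add random low bits, then truncate to the top 16 bits of the FP32 pattern)
def stochastic_round_to_bf16(x: torch.Tensor) -> torch.Tensor:
    bits = x.float().contiguous().view(torch.int32)
    noise = torch.randint_like(bits, 0, 1 << 16)
    return ((bits + noise) & -65536).view(torch.float32).to(torch.bfloat16)

samples = stochastic_round_to_bf16(torch.full((100_000,), 1.0 - 1e-4))
print(samples.float().mean().item())  # ~0.9999, i.e. unbiased in expectation
```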
Section 3.3.2 of the Llama 3.1 paper (https://arxiv.org/pdf/2407.21783) says that Llama 3.1 was trained with FSDP on the parameters, gradients, and optimizer states. However, it also says that the parameters...
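For concreteness, my reading of that in PyTorch FSDP terms (a sketch, not code from the paper; it assumes a `torchrun` launch):

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = nn.TransformerEncoderLayer(d_model=512, nhead=8).cuda()  # stand-in model

# FULL_SHARD (ZeRO-3 style): shards parameters, gradients, and optimizer states
sharded = FSDP(model, sharding_strategy=ShardingStrategy.FULL_SHARD)

# SHARD_GRAD_OP (ZeRO-2 style) would shard only gradients and optimizer states:
# sharded = FSDP(model, sharding_strategy=ShardingStrategy.SHARD_GRAD_OP)
```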
I've noticed that QuaRot and other KV cache quantization papers report perplexity, but it is unclear to me how a quantized KV cache is used during the perplexity calculation. Do you have a...
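My guess at what such an evaluation has to involve (an illustrative sketch with a made-up `fake_quant_int4` helper, not the paper's code): either perplexity is computed token by token so every prediction actually reads the quantized cache, or K/V are fake-quantized inside attention during the usual single teacher-forced pass.

```python
import math
import torch

def fake_quant_int4(x: torch.Tensor, group: int = 64) -> torch.Tensor:
    # per-group absmax symmetric int4 fake quantization (an illustrative scheme)
    xg = x.reshape(-1, group)
    scale = xg.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 7
    return (torch.round(xg / scale).clamp(-8, 7) * scale).reshape(x.shape)

def attend(q, k_cache, v_cache):
    # toy single-head attention over a (quantized) cache
    att = torch.softmax(q @ k_cache.T / math.sqrt(q.shape[-1]), dim=-1)
    return att @ v_cache

d, t = 64, 16
torch.manual_seed(0)
k_cache = fake_quant_int4(torch.randn(t, d))  # keys quantized as they enter the cache
v_cache = fake_quant_int4(torch.randn(t, d))  # values likewise
out = attend(torch.randn(d), k_cache, v_cache)
```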
In the [example T5 training code](https://github.com/pytorch/examples/blob/cdef4d43fb1a2c6c4349daa5080e4e8731c34569/distributed/FSDP/T5_training.py#L77C24-L77C35), the main function creates a copy of the model and dataset on every worker, regardless of rank, before passing the model to FSDP. Does this mean...
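The pattern I'm asking about, reduced to a sketch (paraphrasing the linked example rather than quoting it; `t5-base` is a stand-in checkpoint name, and a `torchrun` launch is assumed):

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import T5ForConditionalGeneration

def main():
    dist.init_process_group("nccl")
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)

    # every rank executes this identically: a full, unsharded copy is built here
    model = T5ForConditionalGeneration.from_pretrained("t5-base")

    # sharding only happens at wrap time
    model = FSDP(model.to(local_rank))

if __name__ == "__main__":
    main()
```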
Where is the code that quantizes the fp32 mma output to nvfp4 for this kernel: https://github.com/NVIDIA/cutlass/blob/main/examples/72_blackwell_narrow_precision_gemm/72b_blackwell_nvfp4_nvfp4_gemm.cu ?
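To pin down what I mean by the quantization step, here is a PyTorch emulation of the conversion as I understand the NVFP4 format (E2M1 element values plus a per-16-element block scale; the real format also stores the scale in FP8 E4M3, which is omitted here). This is not the CUTLASS code I'm looking for, just the math:

```python
import torch

# representable non-negative magnitudes of FP4 E2M1
E2M1_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_nvfp4_block(x_fp32: torch.Tensor):
    """Fake-quantize one 16-element block; returns (dequantized values, block scale)."""
    assert x_fp32.numel() == 16
    scale = x_fp32.abs().amax().clamp(min=1e-12) / 6.0  # 6.0 = max |E2M1| value
    scaled = x_fp32 / scale
    # snap |scaled| to the nearest representable E2M1 magnitude
    idx = (scaled.abs().unsqueeze(-1) - E2M1_GRID).abs().argmin(dim=-1)
    return torch.sign(scaled) * E2M1_GRID[idx] * scale, scale

x = torch.randn(16)
deq, block_scale = quantize_nvfp4_block(x)
```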