Albert Tseng
# 🐛 Possible Bug

I am getting different results using a multitask GP setup described here https://docs.gpytorch.ai/en/stable/examples/03_Multitask_Exact_GPs/Multitask_GP_Regression.html with 1 task vs. just using regular exact GP regression. I have also...
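A minimal sketch of the comparison I'm describing (my own setup, assuming the multitask model follows the linked tutorial with `num_tasks=1` and the single-task model is the standard exact GP):

```python
import torch
import gpytorch

train_x = torch.linspace(0, 1, 100)
train_y = torch.sin(train_x * (2 * torch.pi)) + 0.1 * torch.randn(100)

class SingleTaskGP(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )

class MultitaskGP(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood, num_tasks=1):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.MultitaskMean(
            gpytorch.means.ConstantMean(), num_tasks=num_tasks
        )
        self.covar_module = gpytorch.kernels.MultitaskKernel(
            gpytorch.kernels.RBFKernel(), num_tasks=num_tasks, rank=1
        )

    def forward(self, x):
        return gpytorch.distributions.MultitaskMultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )

# single-task model: targets of shape (N,)
single_lik = gpytorch.likelihoods.GaussianLikelihood()
single_model = SingleTaskGP(train_x, train_y, single_lik)

# multitask model with num_tasks=1: targets of shape (N, 1)
multi_lik = gpytorch.likelihoods.MultitaskGaussianLikelihood(num_tasks=1)
multi_model = MultitaskGP(train_x, train_y.unsqueeze(-1), multi_lik, num_tasks=1)

# both are then trained identically with ExactMarginalLogLikelihood and their
# posterior means compared on the same test points
```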
How does your updated fine-tuning method work vs. the one in your arXiv paper?
Hi, do you have any 2-bit numbers for AWQ? I'm interested in seeing how AWQ holds up at low bitrates, and it would be easy to adapt AWQ to...
Hi, what command should I run in your codebase to get perplexity numbers on WikiText and C4 for your HF Hub models (e.g., https://huggingface.co/ISTA-DASLab/Meta-Llama-3-70B-AQLM-2Bit-1x16/tree/main)?
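For context, this is the kind of standalone check I would otherwise write myself (not a command from your repo; `wikitext-2-raw-v1`, `seqlen = 2048`, and loading the checkpoint through `transformers` with the `aqlm` package installed are my assumptions):

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ISTA-DASLab/Meta-Llama-3-70B-AQLM-2Bit-1x16"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
model.eval()

# concatenate the wikitext-2 test split and tokenize it once
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tok(text, return_tensors="pt").input_ids

seqlen = 2048  # my assumption; quantization papers commonly evaluate at 2048 or 4096
nlls = []
for start in range(0, ids.shape[1] // seqlen * seqlen, seqlen):
    chunk = ids[:, start : start + seqlen].to(model.device)
    with torch.no_grad():
        # labels=chunk makes HF compute the shifted next-token cross-entropy
        # (a mean over seqlen - 1 predictions)
        loss = model(chunk, labels=chunk).loss
    nlls.append(loss.float() * (seqlen - 1))

ppl = torch.exp(torch.stack(nlls).sum() / (len(nlls) * (seqlen - 1)))
print(f"wikitext-2 perplexity: {ppl.item():.3f}")
```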
Are the models you report in your README supposed to be actual 2-bit models or just 2.x-bit models? For example, the two 7B models below are both larger...
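For reference, the back-of-envelope arithmetic behind the question (the 2.4 GB below is a placeholder, not one of your actual file sizes):

```python
# effective bits per parameter implied by a checkpoint's on-disk size
def bits_per_param(file_size_gb: float, n_params: float) -> float:
    return file_size_gb * 8 * 1e9 / n_params

# a strict 2-bit 7B model would be ~1.75 GB of weights
print(7e9 * 2 / 8 / 1e9)         # 1.75 (GB)

# a hypothetical 2.4 GB 7B checkpoint works out to ~2.7 bits/param
print(bits_per_param(2.4, 7e9))  # ~2.74
```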
Hi, I was reading the GaLore paper and noticed that the "ground truth" baseline seems to be pure BF16 training with nearest rounding. It is generally accepted that pure BF16...
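To make the rounding-mode distinction concrete, here is my own sketch (not anything from the GaLore code) of why round-to-nearest BF16 updates differ from stochastic rounding:

```python
import torch

# with round-to-nearest, an update smaller than half a BF16 ULP is dropped entirely
w = torch.ones((), dtype=torch.bfloat16)
update = torch.tensor(1e-4, dtype=torch.bfloat16)
print((w - update).item())  # 1.0 -- the step is lost

# stochastic rounding keeps the update in expectation (standard bit trick:
# add random low bits, then truncate to the top 16 bits of the FP32 pattern)
def stochastic_round_to_bf16(x: torch.Tensor) -> torch.Tensor:
    bits = x.float().contiguous().view(torch.int32)
    noise = torch.randint_like(bits, 0, 1 << 16)
    return ((bits + noise) & -65536).view(torch.float32).to(torch.bfloat16)

samples = stochastic_round_to_bf16(torch.full((100_000,), 1.0 - 1e-4))
print(samples.float().mean().item())  # ~0.9999, i.e. unbiased in expectation
```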
Section 3.3.2 of the Llama 3.1 paper (https://arxiv.org/pdf/2407.21783) says that Llama 3.1 was trained with FSDP on the parameters, gradients, and optimizer states. However, it also says that the parameters...
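For concreteness, my reading of that in PyTorch FSDP terms (a sketch, not code from the paper; it assumes a `torchrun` launch):

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = nn.TransformerEncoderLayer(d_model=512, nhead=8).cuda()  # stand-in model

# FULL_SHARD (ZeRO-3 style): shards parameters, gradients, and optimizer states
sharded = FSDP(model, sharding_strategy=ShardingStrategy.FULL_SHARD)

# SHARD_GRAD_OP (ZeRO-2 style) would shard only gradients and optimizer states:
# sharded = FSDP(model, sharding_strategy=ShardingStrategy.SHARD_GRAD_OP)
```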
I've noticed that QuaRot and other KV cache quantization papers report perplexity, but it is unclear to me how a quantized KV cache is used during the perplexity calculation. Do you have a...
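My guess at what such an evaluation has to involve (an illustrative sketch with a made-up `fake_quant_int4` helper, not the paper's code): either perplexity is computed token by token so every prediction actually reads the quantized cache, or K/V are fake-quantized inside attention during the usual single teacher-forced pass.

```python
import math
import torch

def fake_quant_int4(x: torch.Tensor, group: int = 64) -> torch.Tensor:
    # per-group absmax symmetric int4 fake quantization (an illustrative scheme)
    xg = x.reshape(-1, group)
    scale = xg.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 7
    return (torch.round(xg / scale).clamp(-8, 7) * scale).reshape(x.shape)

def attend(q, k_cache, v_cache):
    # toy single-head attention over a (quantized) cache
    att = torch.softmax(q @ k_cache.T / math.sqrt(q.shape[-1]), dim=-1)
    return att @ v_cache

d, t = 64, 16
torch.manual_seed(0)
k_cache = fake_quant_int4(torch.randn(t, d))  # keys quantized as they enter the cache
v_cache = fake_quant_int4(torch.randn(t, d))  # values likewise
out = attend(torch.randn(d), k_cache, v_cache)
```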
In the [example T5 training code](https://github.com/pytorch/examples/blob/cdef4d43fb1a2c6c4349daa5080e4e8731c34569/distributed/FSDP/T5_training.py#L77C24-L77C35), the main function creates a copy of the model and dataset on every worker, regardless of rank, before passing the model to FSDP. Does this mean...
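The pattern I'm asking about, reduced to a sketch (paraphrasing the linked example rather than quoting it; `t5-base` is a stand-in checkpoint name, and a `torchrun` launch is assumed):

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import T5ForConditionalGeneration

def main():
    dist.init_process_group("nccl")
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)

    # every rank executes this identically: a full, unsharded copy is built here
    model = T5ForConditionalGeneration.from_pretrained("t5-base")

    # sharding only happens at wrap time
    model = FSDP(model.to(local_rank))

if __name__ == "__main__":
    main()
```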
Where is the code that quantizes the fp32 mma output to nvfp4 for this kernel: https://github.com/NVIDIA/cutlass/blob/main/examples/72_blackwell_narrow_precision_gemm/72b_blackwell_nvfp4_nvfp4_gemm.cu ?
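To pin down what I mean by the quantization step, here is a PyTorch emulation of the conversion as I understand the NVFP4 format (E2M1 element values plus a per-16-element block scale; the real format also stores the scale in FP8 E4M3, which is omitted here). This is not the CUTLASS code I'm looking for, just the math:

```python
import torch

# representable non-negative magnitudes of FP4 E2M1
E2M1_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_nvfp4_block(x_fp32: torch.Tensor):
    """Fake-quantize one 16-element block; returns (dequantized values, block scale)."""
    assert x_fp32.numel() == 16
    scale = x_fp32.abs().amax().clamp(min=1e-12) / 6.0  # 6.0 = max |E2M1| value
    scaled = x_fp32 / scale
    # snap |scaled| to the nearest representable E2M1 magnitude
    idx = (scaled.abs().unsqueeze(-1) - E2M1_GRID).abs().argmin(dim=-1)
    return torch.sign(scaled) * E2M1_GRID[idx] * scale, scale

x = torch.randn(16)
deq, block_scale = quantize_nvfp4_block(x)
```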