DSKD

Repo for the EMNLP'24 Paper "Dual-Space Knowledge Distillation for Large Language Models". A general white-box KD framework for both same-tokenizer and cross-tokenizer LLM distillation.

Results: 3 DSKD issues

Can you give me some advice for training a 7B student model with a 32B teacher model? Thank you.

Hello, when I run bash scripts/eval/run_eval.sh ${CKPT_PATH} ${EVAL_BATCH_SIZE}, I get the following error: awk: run time error: negative field index $-3 FILENAME="-" FNR=1 NR=1 awk: run time error: negative field index $-3 FILENAME="-" FNR=1 NR=1 awk: run time...

I found that dskd_cma_tinyllama.sh uses forward_kl instead of adaptive_kl, which gave the best results in the paper. Should I change KD_OBJ from "forward_kl" to "adaptive_kl"?