AnhLD2610
Results: 2 issues of AnhLD2610
My model has an 8192-token context length. How do I limit the agent's input to fewer than 8192 tokens?
KD approach
22
I found that dskd_cma_tinyllama.sh uses forward_kl rather than adaptive_kl, even though adaptive_kl gave the best results in the paper. Should I change KD_OBJ from "forward_kl" to "adaptive_kl"?