AnhLD2610

Results 2 issues of AnhLD2610

My model has a context length of 8192. How can I limit the agent's input to fewer than 8192 tokens?

I found that dskd_cma_tinyllama.sh uses forward_kl instead of adaptive_kl, which gives the best results in the paper. Should I change KD_OBJ from "forward_kl" to "adaptive_kl"?