llm.c
Input token length question
I am a little confused. In transformer-based LLM inference (or training), do we always set the input length to the maximum input length and pad with zeros, or do we dynamically use the actual input length and reduce the computation?
This issue is about that: https://github.com/karpathy/llm.c/issues/146 Right now we always forward B * T tokens in a single, fixed batch configuration that never changes. In principle you could dynamically lower the B, T dimensions to save computation, but it is tricky and requires thought and tests.
Thanks!