llm.c
Input token length question
I am a little confused. In transformer-based LLM inference (or training), do we always set the input length to the maximum input length and pad with zeros, or do we dynamically use the actual input length and reduce the computation?
This issue is about that: https://github.com/karpathy/llm.c/issues/146 Right now we always forward B * T tokens in a single, fixed batch configuration that never changes. In principle you could dynamically lower the B, T dimensions to save computation, but it is tricky and requires thought and tests.
Thanks!