`max_batch_size` argument in `ModelArgs`
I'm just curious what the max_batch_size argument does in ModelArgs: https://github.com/pytorch/torchtitan/blob/d2a4904f58accc683c17c66a360026cb3c8109af/torchtitan/models/llama/model.py#L32
A quick search suggests that it isn't actually used anywhere else in the codebase, so I'm wondering whether it might be superfluous.
I think this argument currently serves as a placeholder and may be used in the future. What do you think? @lessw2020 @tianyu-l
I think it was copied from the original reference Llama implementation, which was meant for inference (code), where `max_batch_size` was used to size the KV cache.
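For context, the pattern in question looks roughly like the sketch below (a simplified, hypothetical `KVCache` class, not the actual reference code): the inference implementation preallocates fixed-size key/value buffers per attention layer, so `max_batch_size` bounds how many sequences can be decoded at once. Since torchtitan is a training codebase with no KV cache, the argument has nothing to size.

```python
import torch

# Simplified sketch of why an inference codebase needs max_batch_size:
# KV-cache buffers are allocated up front with that dimension, and each
# decoding step writes new keys/values into a slice of them.
class KVCache:
    def __init__(self, max_batch_size: int, max_seq_len: int,
                 n_kv_heads: int, head_dim: int):
        # Buffers are sized once; a batch larger than max_batch_size won't fit.
        self.cache_k = torch.zeros(max_batch_size, max_seq_len, n_kv_heads, head_dim)
        self.cache_v = torch.zeros(max_batch_size, max_seq_len, n_kv_heads, head_dim)

    def update(self, start_pos: int, xk: torch.Tensor, xv: torch.Tensor):
        # Write this step's keys/values at start_pos, then return everything
        # cached so far for use in attention.
        bsz, seqlen = xk.shape[0], xk.shape[1]
        self.cache_k[:bsz, start_pos:start_pos + seqlen] = xk
        self.cache_v[:bsz, start_pos:start_pos + seqlen] = xv
        return (self.cache_k[:bsz, :start_pos + seqlen],
                self.cache_v[:bsz, :start_pos + seqlen])

cache = KVCache(max_batch_size=4, max_seq_len=16, n_kv_heads=2, head_dim=8)
keys, values = cache.update(start_pos=0, xk=torch.randn(2, 3, 2, 8),
                            xv=torch.randn(2, 3, 2, 8))
```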
We should probably remove it.
#585