Consider changing llama3 configs to left padding
Thanks for making this repo! Really helpful for a project I'm working on.
However, when generating in a batch, there are a couple of issues. The first is the missing parameter in the generation call, as mentioned here: https://github.com/LLaVA-VL/LLaVA-NeXT/pull/391
The above PR covers more of an ambiguity than a bug. The real issue is with the llama3 tokenizer configs: batched generation requires a left-padding scheme, but the configs use right padding. With right padding, uneven batches produce bad or incorrect outputs because the pad tokens are used to generate the next token (the HF code selects index -1, which is a pad token for the shorter sequences in the batch). The Hugging Face configs can be changed to fix this; for example, in this config, the only change needed is setting `tokenizer_padding_side` to `left` instead of `right`. Alternatively, one can set it manually at load time:
```python
from llava.model.builder import load_pretrained_model

# Force left padding regardless of what the checkpoint's config specifies.
overwrite_config = {"tokenizer_padding_side": "left"}
llava_tokenizer, llava_model, llava_image_processor, llava_max_length = load_pretrained_model(..., overwrite_config=overwrite_config)
```
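For anyone who wants to see the failure mode in isolation, here is a minimal sketch using a plain Hugging Face tokenizer. It is illustrative rather than LLaVA-specific, and `gpt2` is only a stand-in; the same reasoning applies to the llama3 tokenizer once `tokenizer_padding_side` is set to `left`:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 ships without a pad token

prompts = ["Hi", "A much longer prompt that forces padding of the other one"]

# Right padding: the shorter sequence ends in pad tokens, so the logits at
# index -1 (which HF generation uses to pick the next token) are conditioned
# on padding.
tokenizer.padding_side = "right"
batch = tokenizer(prompts, return_tensors="pt", padding=True)
print(batch.input_ids[0, -1].item() == tokenizer.pad_token_id)  # True -> broken

# Left padding: pads sit at the front, every sequence ends on a real token,
# and the attention mask lets the model ignore the leading pads.
tokenizer.padding_side = "left"
batch = tokenizer(prompts, return_tensors="pt", padding=True)
print(batch.input_ids[0, -1].item() == tokenizer.pad_token_id)  # False -> fine
```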