sglang RuntimeError in llava image encoding

When running LLaVA 1.6 Mistral 7B, I get this error:

RuntimeError in llava image encoding: The expanded size of the tensor (0) must match the existing size (2438) at non-singleton dimension 0.  Target sizes: [0, 4096].  Tensor sizes: [2438, 4096]
torch.Size([2758, 4096])
0 -1

Note that the sizes (2438 and 2758 here) change. The error happens randomly and is not specific to the data. Removing the image input also removes the error.

Mar 10 '24 16:03 aliencaocao

The same bug is raised by using gen(choices=[...]) with image input, as mentioned in #222.

Mar 11 '24 05:03 TideDra

This bug is not related to whether gen choices is used; it happens universally as long as I have an image as input.

Mar 11 '24 06:03 aliencaocao

The same bug is raised by using gen(choices=[...]) with image input, as mentioned in #222.

That's because, when computing the "logprob_start_len" variable used to locate the start position of the choices, sglang does not add the position shift needed to account for the image tokens inserted into the input sequence.

https://github.com/sgl-project/sglang/blob/main/python/sglang/backend/runtime_endpoint.py#L217

For now, I have added a temporary fix to make it work, as follows:

        # Compute logprob
        data = {
            "text": [s.text_ + c for c in choices],
            "sampling_params": {"max_new_tokens": 0},
            "return_logprob": True,
            # I think this should be prompt_len - 1, otherwise normed_logp will be wrong
            "logprob_start_len": max(prompt_len - 1, 0),
            "return_text_in_logprobs": True,
        }
        self._add_images(s, data)

        if s.images_:  # only one image is supported
            # TODO: This is a very naive way to shift logprob_start_len.
            # In the future we should probably modify the `prompt_tokens` variable
            # directly so that it takes the added image tokens into account.
            # One placeholder token is replaced by 576 image tokens, hence 576 - 1.
            data["logprob_start_len"] += 576 - 1
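
To make the shift explicit, here is a minimal sketch of the same arithmetic as a standalone helper. The name shifted_logprob_start_len and its parameters are hypothetical and not part of sglang. It assumes each image placeholder token in the prompt is replaced by image_feature_len embeddings (576 in the temp fix), so every image pushes later positions back by image_feature_len - 1. For LLaVA 1.6 the feature count seems to vary per image (the log in the first post shows 2438 and 2758 rows), so a constant 576 is probably not right there.

```python
def shifted_logprob_start_len(
    prompt_len: int,
    num_images: int,
    image_feature_len: int = 576,
) -> int:
    """Return a logprob start position adjusted for inserted image tokens.

    Hypothetical helper, not an sglang API. Assumes every image occupies
    exactly one placeholder token in the text prompt and is expanded to
    `image_feature_len` embeddings inside the model.
    """
    # Start one token before the end of the prompt, never below zero.
    start = max(prompt_len - 1, 0)
    # Each image replaces 1 placeholder with `image_feature_len` tokens,
    # so the net shift per image is `image_feature_len - 1`.
    return start + num_images * (image_feature_len - 1)


# Text-only prompt of 10 tokens: no shift is applied.
text_only = shifted_logprob_start_len(prompt_len=10, num_images=0)

# Same prompt with one image at the assumed 576 tokens per image.
one_image = shifted_logprob_start_len(prompt_len=10, num_images=1)
```

With the feature length passed in rather than hard-coded, the caller could supply the actual number of image embeddings for the request if the server exposes it, which I have not checked.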

Apr 11 '24 07:04 lockon-n

I'm also running into the problem. It seems to be only a problem with the newer 1.6 LLaVA models.

Apr 30 '24 14:04 lukashelff

Have you found out yet what exactly causes it and how to solve it? I do not use regex, so gen choices should not be called.

May 04 '24 18:05 lukashelff
