
ValueError: Number of images does not match number of special image tokens in the input text. Got 256 image tokens in the text but 256 tokens from image embeddings.

Open trinh-hoang-hiep opened this issue 8 months ago • 2 comments

Gemma has a problem when input embeddings are passed to model.generate(): I am forced to pass input_ids as well, otherwise it raises this error.

Maybe it is an integer vs. float comparison issue:

When special_image_mask is computed from input_ids, the comparison is between integer values, which gives exact results. When it is computed from inputs_embeds, float vectors from the embedding layer are compared element by element with the embedding vector of the special image token. Float comparisons can suffer from precision issues, so some elements may be misidentified (very small deviations can make the comparison return True or False inconsistently).

        if input_ids is None:
            special_image_mask = inputs_embeds == self.get_input_embeddings()(
                torch.tensor(self.config.image_token_index, dtype=torch.long, device=inputs_embeds.device)
            )
        else:
            special_image_mask = (input_ids == self.config.image_token_index).unsqueeze(-1)
            special_image_mask = special_image_mask.expand_as(inputs_embeds).to(inputs_embeds.device)

When using input embeddings, special_image_mask.sum() is tensor(655363, device='cuda:0'); when using input_ids, it is tensor(655360, device='cuda:0').
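For what it's worth, 655360 = 256 × 2560, which would fit 256 image tokens times a hidden size of 2560 (the hidden size is an assumption, it is not stated above). The 3 extra elements could then be single components of non-image token embeddings that happen to equal the image token's embedding at the same position, since the inputs_embeds branch compares element by element rather than whole vectors. Below is a minimal, framework-free sketch of that effect, using plain Python lists in place of tensors. The embedding values, IMAGE_TOKEN_ID and all function names are made up for illustration and are not the library's API.

```python
# Toy embedding table: 4 token ids, hidden size 3. All values are invented.
IMAGE_TOKEN_ID = 3  # hypothetical stand-in for config.image_token_index
HIDDEN_SIZE = 3

embedding_table = [
    [0.10, 0.20, 0.30],  # token 0
    [0.40, 0.75, 0.60],  # token 1: component 1 coincides with the image token's
    [0.70, 0.80, 0.90],  # token 2
    [0.50, 0.75, 0.25],  # token 3: the image token
]

input_ids = [0, 3, 1, 3, 2]  # two image tokens in the sequence
inputs_embeds = [embedding_table[tok] for tok in input_ids]


def mask_from_ids(ids):
    """Integer comparison: one exact decision per token, repeated per dimension."""
    return [[tok == IMAGE_TOKEN_ID] * HIDDEN_SIZE for tok in ids]


def mask_from_embeds_elementwise(embeds):
    """Element-wise float comparison, analogous to the `input_ids is None` branch."""
    image_vec = embedding_table[IMAGE_TOKEN_ID]
    return [[x == y for x, y in zip(row, image_vec)] for row in embeds]


def mask_from_embeds_rowwise(embeds):
    """Possible alternative: a position counts only if the whole vector matches."""
    image_vec = embedding_table[IMAGE_TOKEN_ID]
    return [[row == image_vec] * len(row) for row in embeds]


def count_true(mask):
    return sum(sum(row) for row in mask)


ids_count = count_true(mask_from_ids(input_ids))
elementwise_count = count_true(mask_from_embeds_elementwise(inputs_embeds))
rowwise_count = count_true(mask_from_embeds_rowwise(inputs_embeds))

print(ids_count, elementwise_count, rowwise_count)
```

In this toy setup the element-wise mask has one more True entry than the id-based mask, because token 1 shares a single component with the image token, while the row-wise variant agrees with the id-based count. Whether the same thing explains the 3 extra entries in the real model is only a guess.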

trinh-hoang-hiep avatar Mar 19 '25 11:03 trinh-hoang-hiep

I solved it here: https://github.com/trinh-hoang-hiep/prompt-tuning-gemma3

trinh-hoang-hiep avatar Mar 21 '25 04:03 trinh-hoang-hiep

@trinh-hoang-hiep can you tell a little bit more? I am facing an identical error, and your link returns a 404.

pySilver avatar Apr 06 '25 07:04 pySilver