JJJYmmm
I tried to fix test.py, which is mentioned in #7
Hi Shariatnia, thanks for your tutorial! I have a question about the variable `max_len`. I first see `max_len` in the `Tokenizer` class; I think its role is to limit the maximum...
There seems to be a problem in train.py:

```python
total_steps = (len(trainloader) // args.batch_size + 1) * args.epoches
```

`len(train_loader)` is already the number of batches, i.e. it is already divided by the batch size. Change it to `total_steps =...
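A minimal arithmetic sketch of the bug (with assumed numbers, not the repository's actual dataset): a PyTorch `DataLoader`'s length is the number of batches, so dividing by the batch size a second time undercounts the total optimizer steps.

```python
import math

# Assumed example values for illustration only
num_samples = 100
batch_size = 8
epochs = 3

# What len(train_loader) returns: the number of batches, not samples
num_batches = math.ceil(num_samples / batch_size)  # 13

# Buggy version from train.py: divides by batch_size a second time
buggy_total_steps = (num_batches // batch_size + 1) * epochs  # 6

# Intended count: one optimizer step per batch per epoch
total_steps = num_batches * epochs  # 39
```

With these numbers, the buggy formula would configure a scheduler for 6 steps when training actually runs 39.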
Fix variable naming errors, embedding_dim and n_embed
Since the transformer takes in the quantized image tokens generated by VQGAN, whose codebook has indices (0~n_embed-1), and the transformer's sos token is also set to zero by default. Could you tell...
In the original paper, the TF is defined as below. But in the code, the denominator seems to be ignored. https://github.com/tylin/coco-caption/blob/3a9afb2682141a03e1cdc02b0df6770d2c884f6f/pycocoevalcap/cider/cider_scorer.py#L124
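For context, the TF-IDF weighting from the CIDEr paper is, as I recall it (reproduced here from memory, so please check against the paper):

```latex
g_k(s_{ij}) \;=\;
\underbrace{\frac{h_k(s_{ij})}{\sum_{w_l \in \Omega} h_l(s_{ij})}}_{\text{TF}}
\;\log\!\left(
\frac{|I|}{\sum_{I_p \in I} \min\!\bigl(1, \sum_{q} h_k(s_{pq})\bigr)}
\right)
```

where $h_k(s_{ij})$ counts n-gram $w_k$ in sentence $s_{ij}$ and $\Omega$ is the n-gram vocabulary. The TF denominator $\sum_{w_l \in \Omega} h_l(s_{ij})$ is the normalization that the linked line in `cider_scorer.py` appears to drop.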
## Code snippet

https://github.com/huggingface/transformers/blob/11afab19c0e4b652855f9ed7f82aa010c4f14754/src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py#L1792-L1800

## Related issue

https://github.com/hiyouga/LLaMA-Factory/issues/6910

## Solution

Modify line 1792 to `self.rope_deltas = rope_deltas.to(cache_position.device)`:

```python
position_ids, rope_deltas = self.get_rope_index(
    input_ids,
    image_grid_thw,
    video_grid_thw,
    second_per_grid_ts,
    attention_mask,
)
self.rope_deltas =...
```
### Reminder

- [x] I have read the above rules and searched the existing issues.

### Description

As mentioned in https://github.com/hiyouga/LLaMA-Factory/issues/6844#issuecomment-2644439667, the cutoff of multimodal sequences should not remove the...