Sequence Parallel logic -- why are you padding with '#'?

Open orrzohar opened this issue 7 months ago • 0 comments

Hi,

When you do Sequence Parallel, you are padding with token id 2 = '#':

https://github.com/NVlabs/VILA/blob/2b43308f25e63161a172fe9a38e3a04e2fcd12ef/llava/data/dataset.py#L1372-L1389

Could you let me know why you are padding with this instead of the self.tokenizer.pad_token_id?

Why was '#' selected?

Does it just not matter because you are adding the padding on the right and then masking it out in the labels?
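
To make the question concrete, here is a minimal sketch of my understanding. It is not the VILA code. The helper names, the `IGNORE_INDEX = -100` convention, the multiple of 4, and the stand-in pad id 0 are all my own assumptions for illustration. If the padding is on the right and the padded positions get an ignore label and a zero attention mask, then the pad id only changes positions that are never supervised:

```python
IGNORE_INDEX = -100  # assumed label value that the loss skips


def pad_to_multiple(input_ids, labels, multiple, pad_id):
    """Hypothetical helper: right-pad so the length is divisible by `multiple`."""
    pad_len = (-len(input_ids)) % multiple
    padded_ids = input_ids + [pad_id] * pad_len
    padded_labels = labels + [IGNORE_INDEX] * pad_len      # padding is never supervised
    attention_mask = [1] * len(input_ids) + [0] * pad_len  # padding is never attended to
    return padded_ids, padded_labels, attention_mask


def supervised_positions(labels):
    """Indices that would contribute to the loss."""
    return [i for i, y in enumerate(labels) if y != IGNORE_INDEX]


ids = [5, 6, 7, 8, 9]
labels = [IGNORE_INDEX, IGNORE_INDEX, 7, 8, 9]

# Same sequence padded with id 2 ('#') and with a stand-in pad id (0 here).
with_hash = pad_to_multiple(ids, labels, multiple=4, pad_id=2)
with_pad = pad_to_multiple(ids, labels, multiple=4, pad_id=0)

# Labels and attention masks are identical; only the masked tail of input_ids differs.
print(with_hash[1] == with_pad[1], with_hash[2] == with_pad[2])
print(supervised_positions(with_hash[1]))
```

If that is the reasoning, the choice of '#' would be harmless with causal attention, but I would still like to confirm that nothing else reads those positions.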

Best, Orr

orrzohar • Mar 10 '25 23:03