Jia Ning
I think there are two main problems in the current `clean_copyright_comments` function https://github.com/togethercomputer/RedPajama-Data/blob/567ac9a0927c6dd3a2bf7e880de191239acfc308/data_prep/github/github_clean_dedup_local.py#L27. First, it cannot remove the copyright notice in the following C-style code because of the early return in...
Great work! Have you implemented the nucleus sampling mentioned in the paper? Thank you.
https://github.com/ParadoxZW/LLaVA-UHD-Better/blob/main/llava_uhd/adapt_llava.py#L136-L138 Here, since the first token is for CLS, shouldn't ```python m[:w * h] = True ``` be changed to ```python m[:w * h + 1] = True ```?
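To illustrate the off-by-one I am asking about, here is a minimal sketch, not the repository's code. It assumes the sequence is laid out as one CLS token followed by `w * h` patch tokens and then padding; `build_valid_mask`, `max_len`, and `has_cls` are hypothetical names I made up for the example.

```python
def build_valid_mask(w, h, max_len, has_cls=True):
    """Return a list of booleans marking which token positions are valid.

    Hypothetical helper: assumes position 0 holds the CLS token (when
    has_cls is True), followed by w * h patch tokens, then padding.
    """
    n_valid = w * h + (1 if has_cls else 0)
    if n_valid > max_len:
        raise ValueError("sequence does not fit in max_len")
    # Equivalent to m[:n_valid] = True on a boolean tensor of length max_len.
    return [i < n_valid for i in range(max_len)]


mask = build_valid_mask(w=2, h=3, max_len=10)
# With a CLS token, 2 * 3 + 1 = 7 positions are valid, so index 6 is True.
```

If the mask only covered `w * h` positions under this layout, the last patch token would be treated as padding.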