Yucheng Li

Results 14 issues of Yucheng Li

This token-dropping method, as indicated by the citation, is based on the V-MoE method. How is this different from the recent MoD? They look like very similar techniques.

The word sense clustering shown in the paper is very nice. Is the code provided in this repo?

Great job! I found two problems when trying to reproduce the paper's results. 1. The same positional embedding was used for all context memory units as explained in the paper....

I'd like to ask about the base LLM of the following LongVILA checkpoints: - `Efficient-Large-Model/Llama-3-LongVILA-8B-128Frames` - `Efficient-Large-Model/Llama-3-LongVILA-8B-256Frames` - `Efficient-Large-Model/Llama-3-LongVILA-8B-512Frames` These are named with `Llama-3`; however, as quoted from the paper: It...