freq
How to evaluate on llama3-8b-instruct? Please add the function, thanks!
Add long context evaluation benchmarks such as LongBench and LEval.
Any adjustments to the hyperparameters in pred.py?
When calculating FVD, FID, and IS scores, how many fake videos (sample.mp4) need to be generated? Do you use all real video frames when calculating these scores? CUDA_VISIBLE_DEVICES=gpu_id python...
### Feature request How can I use a local LLM to evaluate prediction quality? For example, Llama-3-70B-Instruct? ### Motivation How can I use a local LLM to evaluate...