freq

Results: 7 issues labeled freq

How can I run evaluation on llama3-8b-instruct? Please add support for it, thanks!

enhancement

Add long-context evaluation benchmarks such as LongBench and LEval.

help wanted
feature request

When calculating FVD, FID, and IS scores, how many fake videos (sample.mp4) need to be generated? Do you use all real video frames when calculating these scores? CUDA_VISIBLE_DEVICES=gpu_id python...

Feature request: How can I use a local LLM, for example Llama-3-70B-Instruct, to evaluate prediction quality? Motivation: How can I use a local LLM to evaluate...