freq issues

Results: 7 issues of freq

During training, P and R stay at 0. What is going wrong? Data loading works normally.

Could you please release your inference code?

How to evaluate on llama3-8b-instruct?

How to evaluate on llama3-8b-instruct? Please add the function, thanks!

enhancement

Add long context evaluation benchmarks such as LongBench and LEval.

help wanted

feature request

How do you evaluate reasoning models like QwQ-32B, given that the response time and token length are very long?

Any adjustments to the hyperparameters in pred.py?

Questions on calculating FVD, FID and IS scores

When calculating FVD, FID and IS scores, how many fake videos (sample.mp4) need to be generated? Do you use all real video frames when calculating these scores? CUDA_VISIBLE_DEVICES=gpu_id python...

How to use a local LLM to evaluate prediction quality? For example, Llama-3-70B-Instruct?

### Feature request How to use a local LLM to evaluate prediction quality? For example, Llama-3-70B-Instruct? ### Motivation How to use a local LLM to evaluate...