Potential Issue Regarding Profiling

Open cspr2333 opened this issue 2 months ago • 0 comments

Dear maintainers,

I recently tried the profiling workflow in the vidur project to collect profiling data on AWS EC2 instances. I experimented with a P5 48X instance, which has 8 H100 GPUs connected using DGX, profiling CodeLlama-34b-Instruct-hf. The code I used is the vidur main branch and the sarathi-serve vidur branch. However, the profiling results I got differ significantly from those in the provided data folder.

I have attached my collected data. I noticed several differences and potential issues.

The newly profiled data uses flashinfer, while the reference data uses flash_attention.
The newly profiled data has additional columns for kv_cache_save.
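
The column difference can be checked with a minimal sketch like the one below, which compares the header rows of two profiling CSVs. The column names in the sample data are hypothetical placeholders, not the actual vidur column names, and the helper functions are illustrative rather than part of vidur.

```python
import csv
import io


def read_header(csv_file):
    """Return the column names from the first row of an open CSV file."""
    return next(csv.reader(csv_file), [])


def diff_columns(new_file, ref_file):
    """Return (only_in_new, only_in_ref) as sorted lists of column names."""
    new_cols = set(read_header(new_file))
    ref_cols = set(read_header(ref_file))
    return sorted(new_cols - ref_cols), sorted(ref_cols - new_cols)


# Hypothetical headers for illustration only; real profiling CSVs will differ.
new_csv = io.StringIO("batch_size,attn_time,kv_cache_save_time\n1,0.10,0.02\n")
ref_csv = io.StringIO("batch_size,attn_time\n1,0.12\n")

only_in_new, only_in_ref = diff_columns(new_csv, ref_csv)
print("Columns only in new data:", only_in_new)
print("Columns only in reference data:", only_in_ref)
```

With real files, the `io.StringIO` objects would be replaced by `open(path, newline="")` handles for the new and reference attention CSVs.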

With my profiling data, vidur's predictions differ significantly from those obtained with the reference data. Could you please help me understand the correct profiling workflow?

New_H100_codellama_CodeLlama-34b-Instruct-hf_attention.csv

Thank you for your help.

Oct 20 '25 00:10 cspr2333