Simulation Speed Benchmark likely has significant issues leading to overstated numbers

Open StoneT2000 opened this issue 1 year ago • 2 comments

Just some important context: I have gone through the Genesis code base and benchmarks and found some issues that likely lead to inflated benchmark speeds.

I've written a full report on it here: https://x.com/stone_tao/status/1870243004730225009?s=46&t=LBFTca4dqDdDCjhzaM56tA

The main question is why the original speed benchmark:

  1. uses the fastest physics setting when other tutorial/example code doesn't
  2. takes one action followed by 999 steps of no actions (which likely means the rigid body solver exits early most of the time, leading to faster speeds?)
  3. disables robot self collisions
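To make the second point concrete, here is a minimal sketch of the two stepping protocols and of how an FPS figure is usually derived (number of environments times number of steps, divided by wall-clock time). Everything in it is an assumption for illustration: `measure_fps`, `step`, `apply_action`, and `clock` are hypothetical names, not the Genesis API, and the callables stand in for whatever the real simulator exposes.

```python
import time
from typing import Callable


def measure_fps(step: Callable[[], None],
                apply_action: Callable[[int], None],
                num_envs: int,
                num_steps: int,
                action_every_step: bool,
                clock: Callable[[], float] = time.perf_counter) -> float:
    """Return num_envs * num_steps / elapsed seconds.

    step and apply_action are hypothetical stand-ins for a simulator's
    stepping and control calls; they are not real Genesis functions.
    """
    start = clock()
    for i in range(num_steps):
        # Protocol A (action_every_step=False): one action, then idle steps.
        # Protocol B (action_every_step=True): a fresh action before every step.
        if i == 0 or action_every_step:
            apply_action(i)
        step()
    elapsed = clock() - start
    return num_envs * num_steps / elapsed
```

With `num_steps=1000` and `action_every_step=False`, this issues one action and then 999 steps with no new action, which is the pattern described in point 2. Running the same simulator under both protocols is one way to check how much of the reported speed depends on the robot being mostly idle.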

image

Sharing it here so people can see and verify the numbers themselves, at least on the Genesis-only benchmark. I look forward to improvements that address these issues! Genesis still stands as an impressive effort to integrate many physics solvers into one package.

StoneT2000, Dec 21 '24 00:12

EDIT: The post by @ziyanx02 that my comment here is responding to was removed, sorry if this doesn't make sense anymore.

Interesting that it runs faster on the 6000 Ada. But I don't know enough about Taichi to understand how it leverages GPUs compared to JAX-based systems or PhysX. I will just discuss the 4090 results, since this is what is being stated and claimed on the project website and in online posts.

Whether the sim2real transfer of the double backflip policy is reported correctly I leave up to the community to decide, and I agree that part, at least, is too early to draw conclusions about. If it works, that's a great contribution, although I'm sure the same setup would work in MJX/Isaac. I'm not an expert in locomotion, but I have been told that without released benchmark code for MJX/Isaac and details about training, it's hard to say whether Genesis's originally reported numbers are relatively accurate.

Also, just to clarify terms: what exactly do decimation and substeps refer to here? I do not see decimation in this codebase, only substeps. Looking at the code, it seems that substeps is the standard decimation term: the scene.step function calls sim.step, which loops substeps times over rigid_solver.substep(). Or is decimation the number of calls to scene.step before passing observations to a neural network and generating actions to send to the simulator? Understanding this will help me better understand how one should correctly benchmark FPS during RL for Genesis.
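The two readings of "decimation" can be sketched with a toy counter. This is only an illustration of the question, assuming the call structure I described (step loops `substeps` times over a substep call); `ToySim` and `run_policy_steps` are hypothetical names and not the Genesis API.

```python
class ToySim:
    """Hypothetical stand-in for a simulator; it only counts calls."""

    def __init__(self, substeps: int) -> None:
        self.substeps = substeps
        self.step_calls = 0
        self.substep_calls = 0

    def substep(self) -> None:
        # One physics update of the solver.
        self.substep_calls += 1

    def step(self) -> None:
        # Reading 1: substeps = physics updates per call to step().
        self.step_calls += 1
        for _ in range(self.substeps):
            self.substep()


def run_policy_steps(sim: ToySim, num_actions: int, decimation: int) -> None:
    """Reading 2: decimation = calls to step() per policy action."""
    for _ in range(num_actions):
        # A policy would read observations and send one action here.
        for _ in range(decimation):
            sim.step()


sim = ToySim(substeps=2)
run_policy_steps(sim, num_actions=10, decimation=4)
print(sim.step_calls, sim.substep_calls)
```

Under this sketch, each policy action costs `decimation * substeps` physics updates, so an FPS number means different things depending on whether a "frame" is a substep, a step, or a policy action.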

Moreover, while it is possible that the proportional decrease in performance varies from GPU to GPU, there is still a decrease, from 16.4M to 7.8M (also, why are there two rows and a missing entry? I didn't quite follow). Not taking random actions and disabling self collisions also do not make sense to me (and many others) and appear to be inaccurate ways to benchmark simulation speed. So the conclusion that the benchmark is misleading can still be drawn.

Finally, your explanations do not address the figure I shared in my first comment, which shows the issues with the Franka arm simulation speeds: they go from 43M down to < 1M on one of the two benchmarking tasks (the one I'm most concerned about). Those numbers correspond directly to the claim made in Zhou Xian's original tweet and on Genesis's website that Genesis runs at 43 million FPS, which I find hard to believe is realistic. Regarding the substeps argument for the benchmark, I am already being conservative by running the Franka arm move benchmark with substeps=2 instead of substeps=4, which is what the Genesis documentation recommends for more stable manipulation/grasps.
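For a sense of scale, the decreases quoted in this comment can be turned into slowdown factors. This is just arithmetic on the numbers above, assuming they are all in millions of FPS; `slowdown` is a hypothetical helper name, and since "< 1M" is only an upper bound, the second result is a lower bound on the slowdown.

```python
from typing import Tuple


def slowdown(before: float, after: float) -> Tuple[float, float]:
    """Return (slowdown factor, fractional decrease) from before to after."""
    return before / after, 1.0 - after / before


# Figures quoted in the thread, assumed to be in millions of FPS.
factor, drop = slowdown(16.4, 7.8)
print(f"16.4M -> 7.8M: {factor:.2f}x slower, {drop:.1%} lower")

# "< 1M" is an upper bound on the new speed, so these are lower bounds.
factor_min, drop_min = slowdown(43.0, 1.0)
print(f"43M -> <1M: at least {factor_min:.0f}x slower, "
      f"at least {drop_min:.1%} lower")
```

The first case is roughly a halving, while the Franka case is a drop of more than an order of magnitude, which is why that one concerns me most.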

StoneT2000, Dec 21 '24 09:12

For those who didn't see my previous comment, it was about:

  1. Numerical results showing different proportional decreases under identical settings on a different system.
  2. An argument about the "+ more sim accuracy" setting (substeps=2) in the proposed benchmark.

To avoid further time-consuming debates, we will respond to @StoneT2000 after conducting more comprehensive tests.

ziyanx02, Dec 21 '24 09:12

Thanks to the authors for the comprehensive report. I don't have time to verify things, but the numbers in the report look more accurate when given the right context: https://placid-walkover-0cc.notion.site/genesis-performance-benchmarking?pvs=4

StoneT2000, Jan 08 '25 22:01