
[Regressions] ThunderFX is slower than 2 weeks ago for 2 models

Open wprazuch opened this issue 1 year ago • 2 comments

🐛 Bug

Recently found regressions (attached image: "Screenshot 2024-11-27 at 10 03 14").

To Reproduce

All parameters to benchmark_litgpt.py are visible in the attached image.

Environment

Tested on pjnl-20241122 (the "Latest image date" shown in the screenshot).

system.device_product_name DGXH100
system.gpu_driver_version 535.129.03
libraries.cuda 12.6.3.001
libraries.pip.lightning 2.4.0.dev20240728
libraries.pip.lightning-thunder 0.2.0.dev0
libraries.pip.lightning-utilities 0.11.9
libraries.pip.litgpt 0.4.11
libraries.pip.nvfuser 0.2.23+gitb5e5182
libraries.pip.pytorch-lightning 2.4.0
libraries.pip.torch 2.6.0a0+gitecf3bae
libraries.pip.torchao 0.6.1
libraries.pip.torchmetrics 1.6.0
libraries.pip.torchvision 0.19.0a0+d23a6e1

wprazuch, Nov 27 '24 09:11
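For anyone comparing their own setup against the environment listed in the report, a minimal sketch for printing installed package versions in a similar `libraries.pip.<name> <version>` layout might look like the following. How the original listing was produced is not stated in the issue; the `collect_versions` helper and the choice of packages are assumptions made for illustration only.

```python
from importlib import metadata


def collect_versions(packages):
    """Return {package name: installed version, or None if not installed}.

    Hypothetical helper, not part of lightning-thunder or litgpt.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package names taken from the environment listing in the report.
    wanted = ["torch", "lightning-thunder", "nvfuser", "litgpt"]
    for name, version in collect_versions(wanted).items():
        print(f"libraries.pip.{name} {version}")
```

Diffing this output between the older and newer container images is one way to narrow down which dependency changed.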

To my mind, if the batch size needed to be lowered, this is fundamentally a "memory use" regression and not a "compute perf" one.

t-vi, Dec 02 '24 19:12
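The distinction drawn in the comment above can be sketched as a small triage rule: if throughput dropped and the run also had to use a smaller batch size, memory use is the more likely culprit; if the batch size is unchanged, compute performance is. This is only an illustrative sketch. The `BenchmarkRun` fields, the `classify_regression` function, and the 5% tolerance are hypothetical and are not taken from `benchmark_litgpt.py` or its output format.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkRun:
    """One benchmark result; field names are hypothetical."""
    micro_batch_size: int
    tokens_per_sec: float


def classify_regression(baseline, latest, tolerance=0.05):
    """Roughly label a throughput drop as memory- or compute-driven."""
    # Relative throughput loss of the latest run versus the baseline.
    slowdown = 1.0 - latest.tokens_per_sec / baseline.tokens_per_sec
    if slowdown <= tolerance:
        return "no regression"
    # A smaller batch size suggests the old one no longer fit in memory.
    if latest.micro_batch_size < baseline.micro_batch_size:
        return "likely memory use"
    return "likely compute perf"


if __name__ == "__main__":
    before = BenchmarkRun(micro_batch_size=4, tokens_per_sec=1000.0)
    after = BenchmarkRun(micro_batch_size=2, tokens_per_sec=800.0)
    print(classify_regression(before, after))
```

A memory-driven result would then call for comparing peak memory at the original batch size rather than profiling kernel speed.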

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot], Apr 16 '25 05:04