[WIP] Remove functional calls if possible

Open vmoens opened this issue 1 year ago • 2 comments

May 02 '24 16:05 vmoens

:link: Helpful Links

:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2153

:page_facing_up: Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

:x: 11 New Failures, 1 Unrelated Failure

As of commit ab1deb9467f6fa9dcf9b109fe352328c8306abbb with merge base 5b9cb440f7f507948b0431077c36706a99100b78:

NEW FAILURES - The following jobs have failed:

Continuous Benchmark (PR) / GPU Pytest benchmark (gh) Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
Habitat Tests on Linux / tests (3.9, 12.1) / linux-job (gh) RuntimeError: Command docker exec -t c95c7659563513b9b300e03fe187a71132585eb29babed8bc51bc76f5db59984 /exec failed with exit code 139
Lint / python-source-and-configs / linux-job (gh)
Unit-tests on Linux / tests-cpu (3.10) / linux-job (gh) test/test_cost.py::TestPPO::test_ppo_shared_seq[True-device0-td_lambda-KLPENPPOLoss]
Unit-tests on Linux / tests-cpu (3.11) / linux-job (gh) test/test_cost.py::TestPPO::test_ppo_shared_seq[True-device0-td_lambda-KLPENPPOLoss]
Unit-tests on Linux / tests-cpu (3.8) / linux-job (gh) test/test_cost.py::TestPPO::test_ppo_shared_seq[True-device0-td_lambda-KLPENPPOLoss]
Unit-tests on Linux / tests-cpu (3.9) / linux-job (gh) test/test_cost.py::TestPPO::test_ppo_shared_seq[True-device0-td_lambda-KLPENPPOLoss]
Unit-tests on Linux / tests-gpu (3.10, 12.1) / linux-job (gh) test/test_cost.py::TestPPO::test_ppo_shared_seq[True-device0-td_lambda-KLPENPPOLoss]
Unit-tests on Linux / tests-optdeps (3.10, 12.1) / linux-job (gh) test/test_cost.py::TestPPO::test_ppo_shared_seq[True-device0-td_lambda-KLPENPPOLoss]
Unit-tests on Linux / tests-stable-gpu (3.10, 11.8) / linux-job (gh) test/test_cost.py::TestPPO::test_ppo_shared_seq[True-device0-td_lambda-KLPENPPOLoss]
Unit-tests on Windows / unittests-cpu / windows-job (gh) test/test_cost.py::TestPPO::test_ppo_shared_seq[True-device0-td_lambda-KLPENPPOLoss]

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

Unit-tests on Linux / tests-olddeps (3.8, 11.6) / linux-job (gh) (trunk failure) test/test_rb.py::TestRBMultidim::test_rb_multidim_collector[env_device0-transform1-sampler_cls4-rbtype1-LazyTensorStorage-TensorDictRoundRobinWriter]

This comment was automatically generated by Dr. CI and updates every 15 minutes.

May 02 '24 16:05 pytorch-bot[bot]

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 91. Improved: $\large\color{#35bf28}2$. Worsened: $\large\color{#d91a1a}17$.


Name	Max	Mean	Ops	Ops on Repo `HEAD`	Change
test_single	55.4556ms	54.7651ms	18.2598 Ops/s	17.9963 Ops/s	$\color{#35bf28}+1.46\%$
test_sync	35.7547ms	29.9739ms	33.3623 Ops/s	31.8852 Ops/s	$\color{#35bf28}+4.63\%$
test_async	46.0437ms	27.8692ms	35.8820 Ops/s	34.9305 Ops/s	$\color{#35bf28}+2.72\%$
test_simple	0.4154s	0.3556s	2.8123 Ops/s	2.9572 Ops/s	$\color{#d91a1a}-4.90\%$
test_transformed	0.5023s	0.4958s	2.0170 Ops/s	2.0375 Ops/s	$\color{#d91a1a}-1.01\%$
test_serial	1.2841s	1.2309s	0.8124 Ops/s	0.8192 Ops/s	$\color{#d91a1a}-0.83\%$
test_parallel	1.0955s	1.0407s	0.9609 Ops/s	0.9748 Ops/s	$\color{#d91a1a}-1.43\%$
test_step_mdp_speed[True-True-True-True-True]	0.2509ms	22.5147μs	44.4155 KOps/s	46.0395 KOps/s	$\color{#d91a1a}-3.53\%$
test_step_mdp_speed[True-True-True-True-False]	57.3670μs	13.4934μs	74.1105 KOps/s	74.8964 KOps/s	$\color{#d91a1a}-1.05\%$
test_step_mdp_speed[True-True-True-False-True]	56.4020μs	12.8028μs	78.1076 KOps/s	79.7115 KOps/s	$\color{#d91a1a}-2.01\%$
test_step_mdp_speed[True-True-True-False-False]	40.3650μs	7.9268μs	126.1539 KOps/s	130.1249 KOps/s	$\color{#d91a1a}-3.05\%$
test_step_mdp_speed[True-True-False-True-True]	66.4040μs	23.2470μs	43.0163 KOps/s	43.6159 KOps/s	$\color{#d91a1a}-1.37\%$
test_step_mdp_speed[True-True-False-True-False]	48.9620μs	14.6311μs	68.3476 KOps/s	68.2905 KOps/s	$\color{#35bf28}+0.08\%$
test_step_mdp_speed[True-True-False-False-True]	44.4430μs	14.0615μs	71.1164 KOps/s	72.2339 KOps/s	$\color{#d91a1a}-1.55\%$
test_step_mdp_speed[True-True-False-False-False]	40.3460μs	9.0937μs	109.9657 KOps/s	111.5252 KOps/s	$\color{#d91a1a}-1.40\%$
test_step_mdp_speed[True-False-True-True-True]	67.3050μs	24.3803μs	41.0167 KOps/s	41.2166 KOps/s	$\color{#d91a1a}-0.48\%$
test_step_mdp_speed[True-False-True-True-False]	39.0620μs	16.1107μs	62.0705 KOps/s	62.5322 KOps/s	$\color{#d91a1a}-0.74\%$
test_step_mdp_speed[True-False-True-False-True]	35.6260μs	14.0153μs	71.3506 KOps/s	72.0746 KOps/s	$\color{#d91a1a}-1.00\%$
test_step_mdp_speed[True-False-True-False-False]	39.3040μs	9.0971μs	109.9255 KOps/s	111.5148 KOps/s	$\color{#d91a1a}-1.43\%$
test_step_mdp_speed[True-False-False-True-True]	56.4850μs	25.5796μs	39.0937 KOps/s	39.4753 KOps/s	$\color{#d91a1a}-0.97\%$
test_step_mdp_speed[True-False-False-True-False]	85.5190μs	17.1769μs	58.2179 KOps/s	58.2784 KOps/s	$\color{#d91a1a}-0.10\%$
test_step_mdp_speed[True-False-False-False-True]	47.8190μs	15.0709μs	66.3532 KOps/s	66.7796 KOps/s	$\color{#d91a1a}-0.64\%$
test_step_mdp_speed[True-False-False-False-False]	36.5180μs	10.2049μs	97.9917 KOps/s	98.8998 KOps/s	$\color{#d91a1a}-0.92\%$
test_step_mdp_speed[False-True-True-True-True]	50.1440μs	24.2888μs	41.1713 KOps/s	40.9650 KOps/s	$\color{#35bf28}+0.50\%$
test_step_mdp_speed[False-True-True-True-False]	35.0350μs	15.9323μs	62.7657 KOps/s	62.8194 KOps/s	$\color{#d91a1a}-0.09\%$
test_step_mdp_speed[False-True-True-False-True]	53.8300μs	16.3012μs	61.3451 KOps/s	62.0281 KOps/s	$\color{#d91a1a}-1.10\%$
test_step_mdp_speed[False-True-True-False-False]	50.5940μs	10.2311μs	97.7417 KOps/s	98.7680 KOps/s	$\color{#d91a1a}-1.04\%$
test_step_mdp_speed[False-True-False-True-True]	49.8230μs	25.5709μs	39.1069 KOps/s	39.0100 KOps/s	$\color{#35bf28}+0.25\%$
test_step_mdp_speed[False-True-False-True-False]	56.3560μs	17.1847μs	58.1913 KOps/s	58.7477 KOps/s	$\color{#d91a1a}-0.95\%$
test_step_mdp_speed[False-True-False-False-True]	48.0690μs	17.4980μs	57.1494 KOps/s	57.5131 KOps/s	$\color{#d91a1a}-0.63\%$
test_step_mdp_speed[False-True-False-False-False]	36.4590μs	11.4604μs	87.2570 KOps/s	87.8830 KOps/s	$\color{#d91a1a}-0.71\%$
test_step_mdp_speed[False-False-True-True-True]	55.4140μs	26.7733μs	37.3507 KOps/s	37.4490 KOps/s	$\color{#d91a1a}-0.26\%$
test_step_mdp_speed[False-False-True-True-False]	51.9670μs	18.3875μs	54.3849 KOps/s	54.4259 KOps/s	$\color{#d91a1a}-0.08\%$
test_step_mdp_speed[False-False-True-False-True]	41.3980μs	17.5625μs	56.9395 KOps/s	57.7186 KOps/s	$\color{#d91a1a}-1.35\%$
test_step_mdp_speed[False-False-True-False-False]	34.7450μs	11.5790μs	86.3632 KOps/s	88.3754 KOps/s	$\color{#d91a1a}-2.28\%$
test_step_mdp_speed[False-False-False-True-True]	40.3460μs	28.1651μs	35.5049 KOps/s	35.5828 KOps/s	$\color{#d91a1a}-0.22\%$
test_step_mdp_speed[False-False-False-True-False]	51.1960μs	19.3728μs	51.6188 KOps/s	51.7266 KOps/s	$\color{#d91a1a}-0.21\%$
test_step_mdp_speed[False-False-False-False-True]	57.2070μs	19.9244μs	50.1898 KOps/s	54.9614 KOps/s	$\textbf{\color{#d91a1a}-8.68\%}$
test_step_mdp_speed[False-False-False-False-False]	34.1540μs	12.4597μs	80.2588 KOps/s	80.6955 KOps/s	$\color{#d91a1a}-0.54\%$
test_values[generalized_advantage_estimate-True-True]	10.9546ms	9.5960ms	104.2100 Ops/s	103.8816 Ops/s	$\color{#35bf28}+0.32\%$
test_values[vec_generalized_advantage_estimate-True-True]	37.4478ms	33.6935ms	29.6793 Ops/s	29.7473 Ops/s	$\color{#d91a1a}-0.23\%$
test_values[td0_return_estimate-False-False]	0.2659ms	0.1908ms	5.2402 KOps/s	5.9954 KOps/s	$\textbf{\color{#d91a1a}-12.60\%}$
test_values[td1_return_estimate-False-False]	24.4119ms	23.7777ms	42.0561 Ops/s	42.2581 Ops/s	$\color{#d91a1a}-0.48\%$
test_values[vec_td1_return_estimate-False-False]	36.1104ms	33.8428ms	29.5484 Ops/s	29.6359 Ops/s	$\color{#d91a1a}-0.30\%$
test_values[td_lambda_return_estimate-True-False]	38.0013ms	34.4436ms	29.0330 Ops/s	28.8471 Ops/s	$\color{#35bf28}+0.64\%$
test_values[vec_td_lambda_return_estimate-True-False]	35.3810ms	33.7693ms	29.6127 Ops/s	29.6596 Ops/s	$\color{#d91a1a}-0.16\%$
test_gae_speed[generalized_advantage_estimate-False-1-512]	9.2586ms	8.4114ms	118.8856 Ops/s	118.6391 Ops/s	$\color{#35bf28}+0.21\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512]	2.5975ms	1.9827ms	504.3711 Ops/s	502.3031 Ops/s	$\color{#35bf28}+0.41\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512]	0.4262ms	0.3552ms	2.8154 KOps/s	2.8048 KOps/s	$\color{#35bf28}+0.38\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512]	48.4456ms	47.1224ms	21.2213 Ops/s	21.5130 Ops/s	$\color{#d91a1a}-1.36\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512]	3.7798ms	3.0832ms	324.3426 Ops/s	318.8762 Ops/s	$\color{#35bf28}+1.71\%$
test_dqn_speed	1.7018ms	1.2003ms	833.1130 Ops/s	724.8521 Ops/s	$\textbf{\color{#35bf28}+14.94\%}$
test_ddpg_speed	8.8916ms	2.7003ms	370.3276 Ops/s	338.4648 Ops/s	$\textbf{\color{#35bf28}+9.41\%}$
test_sac_speed	9.4292ms	8.6606ms	115.4653 Ops/s	110.7739 Ops/s	$\color{#35bf28}+4.24\%$
test_redq_speed	19.4893ms	13.8296ms	72.3086 Ops/s	72.4120 Ops/s	$\color{#d91a1a}-0.14\%$
test_redq_deprec_speed	91.9748ms	15.0586ms	66.4074 Ops/s	67.2561 Ops/s	$\color{#d91a1a}-1.26\%$
test_td3_speed	9.3635ms	8.7362ms	114.4664 Ops/s	116.1373 Ops/s	$\color{#d91a1a}-1.44\%$
test_cql_speed	39.0048ms	37.0609ms	26.9826 Ops/s	27.1633 Ops/s	$\color{#d91a1a}-0.67\%$
test_a2c_speed	9.5095ms	7.6315ms	131.0352 Ops/s	132.9864 Ops/s	$\color{#d91a1a}-1.47\%$
test_ppo_speed	8.5548ms	7.5554ms	132.3556 Ops/s	129.6294 Ops/s	$\color{#35bf28}+2.10\%$
test_reinforce_speed	7.3069ms	6.7506ms	148.1347 Ops/s	148.0190 Ops/s	$\color{#35bf28}+0.08\%$
test_iql_speed	34.4872ms	33.0340ms	30.2718 Ops/s	30.1238 Ops/s	$\color{#35bf28}+0.49\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000]	2.7567ms	2.2646ms	441.5808 Ops/s	450.9811 Ops/s	$\color{#d91a1a}-2.08\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000]	91.2490ms	0.5674ms	1.7623 KOps/s	1.9681 KOps/s	$\textbf{\color{#d91a1a}-10.46\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000]	0.6881ms	0.4765ms	2.0988 KOps/s	2.0714 KOps/s	$\color{#35bf28}+1.32\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000]	3.7598ms	2.3923ms	417.9997 Ops/s	457.5981 Ops/s	$\textbf{\color{#d91a1a}-8.65\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000]	0.7993ms	0.5090ms	1.9648 KOps/s	1.9769 KOps/s	$\color{#d91a1a}-0.62\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000]	3.5250ms	0.4731ms	2.1137 KOps/s	2.0677 KOps/s	$\color{#35bf28}+2.23\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000]	2.0362ms	1.2554ms	796.5335 Ops/s	821.5449 Ops/s	$\color{#d91a1a}-3.04\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000]	4.3219ms	1.1905ms	839.9755 Ops/s	865.6157 Ops/s	$\color{#d91a1a}-2.96\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000]	2.9147ms	2.3802ms	420.1406 Ops/s	437.4498 Ops/s	$\color{#d91a1a}-3.96\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000]	1.2738ms	0.6201ms	1.6125 KOps/s	1.5851 KOps/s	$\color{#35bf28}+1.73\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000]	0.8049ms	0.6076ms	1.6458 KOps/s	1.6916 KOps/s	$\color{#d91a1a}-2.70\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000]	3.8974ms	2.4891ms	401.7559 Ops/s	455.4957 Ops/s	$\textbf{\color{#d91a1a}-11.80\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000]	1.1812ms	0.5307ms	1.8844 KOps/s	1.9929 KOps/s	$\textbf{\color{#d91a1a}-5.45\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000]	0.6743ms	0.5008ms	1.9967 KOps/s	2.0735 KOps/s	$\color{#d91a1a}-3.70\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000]	3.8382ms	2.5319ms	394.9597 Ops/s	457.7441 Ops/s	$\textbf{\color{#d91a1a}-13.72\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000]	0.6034ms	0.5189ms	1.9271 KOps/s	2.0314 KOps/s	$\textbf{\color{#d91a1a}-5.13\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000]	4.1133ms	0.5011ms	1.9955 KOps/s	2.0327 KOps/s	$\color{#d91a1a}-1.83\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000]	3.3074ms	2.7888ms	358.5832 Ops/s	409.8194 Ops/s	$\textbf{\color{#d91a1a}-12.50\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000]	0.8438ms	0.6527ms	1.5320 KOps/s	1.6124 KOps/s	$\color{#d91a1a}-4.99\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000]	0.7849ms	0.6219ms	1.6081 KOps/s	1.6881 KOps/s	$\color{#d91a1a}-4.74\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400]	0.1243s	8.4209ms	118.7525 Ops/s	125.2835 Ops/s	$\textbf{\color{#d91a1a}-5.21\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400]	25.4844ms	13.6649ms	73.1804 Ops/s	82.5338 Ops/s	$\textbf{\color{#d91a1a}-11.33\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400]	1.9851ms	1.1933ms	837.9920 Ops/s	901.2549 Ops/s	$\textbf{\color{#d91a1a}-7.02\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400]	0.1172s	6.0956ms	164.0523 Ops/s	177.0910 Ops/s	$\textbf{\color{#d91a1a}-7.36\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400]	0.1282s	15.3088ms	65.3219 Ops/s	70.9078 Ops/s	$\textbf{\color{#d91a1a}-7.88\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400]	4.7169ms	1.2187ms	820.5693 Ops/s	826.7259 Ops/s	$\color{#d91a1a}-0.74\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400]	0.1140s	6.4062ms	156.0988 Ops/s	166.8237 Ops/s	$\textbf{\color{#d91a1a}-6.43\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400]	16.6213ms	13.5989ms	73.5355 Ops/s	81.1079 Ops/s	$\textbf{\color{#d91a1a}-9.34\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400]	5.6329ms	1.5905ms	628.7460 Ops/s	662.7629 Ops/s	$\textbf{\color{#d91a1a}-5.13\%}$
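
For readers checking the table, the `Change` column appears to be the relative difference between `Ops` (this PR) and `Ops on Repo HEAD`. The sketch below reproduces three rows under that reading. The 5% cutoff for counting a benchmark as improved or worsened is an assumption inferred from which rows are bolded in the table, not a documented setting of the benchmark action, and the helper names are hypothetical.

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change in throughput of the PR relative to HEAD."""
    return (ops_pr / ops_head - 1.0) * 100.0


# Assumed cutoff: rows at or beyond +/-5% look bolded in the table above.
THRESHOLD_PCT = 5.0


def classify(change_pct: float, threshold: float = THRESHOLD_PCT) -> str:
    """Label a benchmark by comparing its change against the assumed cutoff."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "within noise"


# (Ops on this PR, Ops on HEAD), copied from the table; units match per row.
rows = {
    "test_single": (18.2598, 17.9963),
    "test_dqn_speed": (833.1130, 724.8521),
    "test_values[td0_return_estimate-False-False]": (5.2402, 5.9954),
}

for name, (ops_pr, ops_head) in rows.items():
    change = relative_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Under this reading, the small regressions on `test_step_mdp_speed` fall below the cutoff, while most of the replay buffer `populate` and `iterate` rows exceed it.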

May 02 '24 16:05 github-actions[bot]
