stable-baselines3 Fix tests for mps support

Description

closes https://github.com/DLR-RM/stable-baselines3/issues/914

When i started on the base branch feat/mps-support there were 45 failing tests that i now consider fixed, a few things to note:

in most cases i added a check (if mps device is available then i have to apply various casting to make sure tensors are float32 and remain float32) not sure if this approach is correct but happy to change it to something else that also works
i decided to skip test_float64_action_space tests entirely since float64 is not supported
this test test_save_load[True-SAC] only fails when running the full-suite or running all test_save_load tests (make pytest or python3 -m pytest -v -k 'test_save_load') if instead i run the the single breaking test (python3 -m pytest -v -k 'test_save_load[True-SAC]') then it passes 🤷‍♂️ i also run the test file in pycharm and it passes there too so i'm not sure what the issue is, i can add the stacktace of the failing test in a comment if needed
i'm not sure about a few things regarding this template, i think these are not breaking changes but for example i force a cast in vec_normalize:normalize_reward that maybe is considered breaking?
i also looked into the changelog but i couldnt figure out how to edit it

Here the full list of fixed tests

[x] FAILED tests/test_dict_env.py::test_dict_vec_framestack[False-PPO] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
[x] FAILED tests/test_dict_env.py::test_dict_vec_framestack[False-A2C] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
[x] FAILED tests/test_dict_env.py::test_dict_vec_framestack[False-DQN] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
[x] FAILED tests/test_dict_env.py::test_dict_vec_framestack[False-DDPG] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
[x] FAILED tests/test_dict_env.py::test_dict_vec_framestack[False-SAC] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
[x] FAILED tests/test_dict_env.py::test_dict_vec_framestack[False-TD3] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
[x] FAILED tests/test_dict_env.py::test_dict_vec_framestack[True-PPO] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
[x] FAILED tests/test_dict_env.py::test_dict_vec_framestack[True-A2C] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
[x] FAILED tests/test_dict_env.py::test_dict_vec_framestack[True-DQN] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
[x] FAILED tests/test_dict_env.py::test_dict_vec_framestack[True-DDPG] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
[x] FAILED tests/test_dict_env.py::test_dict_vec_framestack[True-SAC] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
[x] FAILED tests/test_dict_env.py::test_dict_vec_framestack[True-TD3] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
[x] FAILED tests/test_envs.py::test_bit_flipping[kwargs1] - OverflowError: Python integer 128 out of bounds for int8
[x] FAILED tests/test_envs.py::test_bit_flipping[kwargs2] - OverflowError: Python integer 255 out of bounds for int8
[x] FAILED tests/test_envs.py::test_bit_flipping[kwargs3] - OverflowError: Python integer 255 out of bounds for int8
[x] FAILED tests/test_her.py::test_her[True-SAC] - OverflowError: Python integer 255 out of bounds for int8
[x] FAILED tests/test_her.py::test_her[True-TD3] - OverflowError: Python integer 255 out of bounds for int8
[x] FAILED tests/test_her.py::test_her[True-DDPG] - OverflowError: Python integer 255 out of bounds for int8
[x] FAILED tests/test_her.py::test_her[True-DQN] - OverflowError: Python integer 255 out of bounds for int8
[x] FAILED tests/test_her.py::test_multiprocessing[True-TD3] - EOFError
[x] FAILED tests/test_her.py::test_multiprocessing[True-DQN] - EOFError
[x] FAILED tests/test_train_eval_mode.py::test_td3_train_with_batch_norm - AssertionError: assert ~tensor(True, device='mps:0')
[x] FAILED tests/test_vec_normalize.py::test_get_original - AssertionError: assert dtype('float32') == dtype('float64')
[x] FAILED tests/test_vec_normalize.py::test_get_original_dict - AssertionError: assert dtype('float32') == dtype('float64')
[ ] FAILED tests/test_her.py::test_save_load[True-SAC] - ValueError: Expected parameter scale (Tensor of shape (64, 4)) of distribution Normal(loc: torch.Size([64, 4]), scale: torch.Size([64, 4])) to satisfy the constraint GreaterThan(lower_bound=0.0), but found invalid values:

Unsupported tests fixed by skipping

[x] FAILED tests/test_spaces.py::test_float64_action_space[action_space0-obs_space1-SAC] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
[x] FAILED tests/test_spaces.py::test_float64_action_space[action_space0-obs_space1-TD3] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
[x] FAILED tests/test_spaces.py::test_float64_action_space[action_space0-obs_space1-PPO] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
[x] FAILED tests/test_spaces.py::test_float64_action_space[action_space0-obs_space1-DDPG] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
[x] FAILED tests/test_spaces.py::test_float64_action_space[action_space0-obs_space1-A2C] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
[x] FAILED tests/test_spaces.py::test_float64_action_space[action_space0-obs_space3-SAC] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
[x] FAILED tests/test_spaces.py::test_float64_action_space[action_space0-obs_space3-TD3] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
[x] FAILED tests/test_spaces.py::test_float64_action_space[action_space0-obs_space3-PPO] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
[x] FAILED tests/test_spaces.py::test_float64_action_space[action_space0-obs_space3-DDPG] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
[x] FAILED tests/test_spaces.py::test_float64_action_space[action_space0-obs_space3-A2C] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
[x] FAILED tests/test_spaces.py::test_float64_action_space[action_space1-obs_space1-SAC] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
[x] FAILED tests/test_spaces.py::test_float64_action_space[action_space1-obs_space1-TD3] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
[x] FAILED tests/test_spaces.py::test_float64_action_space[action_space1-obs_space1-PPO] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
[x] FAILED tests/test_spaces.py::test_float64_action_space[action_space1-obs_space1-DDPG] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
[x] FAILED tests/test_spaces.py::test_float64_action_space[action_space1-obs_space1-A2C] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
[x] FAILED tests/test_spaces.py::test_float64_action_space[action_space1-obs_space3-SAC] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
[x] FAILED tests/test_spaces.py::test_float64_action_space[action_space1-obs_space3-TD3] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
[x] FAILED tests/test_spaces.py::test_float64_action_space[action_space1-obs_space3-PPO] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
[x] FAILED tests/test_spaces.py::test_float64_action_space[action_space1-obs_space3-DDPG] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
[x] FAILED tests/test_spaces.py::test_float64_action_space[action_space1-obs_space3-A2C] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.

Motivation and Context

[ ] I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

[x] Bug fix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to change)
[ ] Documentation (update in the documentation)

Checklist

[x] I've read the CONTRIBUTION guide (required)
[ ] I have updated the changelog accordingly (required).
[ ] My change requires a change to the documentation.
[x] I have updated the tests accordingly (required for a bug fix or a new feature).
[ ] I have updated the documentation accordingly.
[ ] I have opened an associated PR on the SB3-Contrib repository (if necessary)
[ ] I have opened an associated PR on the RL-Zoo3 repository (if necessary)
[x] I have reformatted the code using make format (required)
[x] I have checked the codestyle using make check-codestyle and make lint (required)
[ ] I have ensured make pytest and make type both pass. (required)
[x] I have checked that the documentation builds using make doc (required)

Note: You can run most of the checks using make commit-checks.

Note: we are using a maximum length of 127 characters per line

Sep 14 '24 16:09 deathcoder

Hello, thanks for having a look at that. Apart from some tests failing, does the algorithms work in normal conditions? (for instance PPO("MlpPolicy", "Pendulum-v1", device="mps").learn(10_000))

(In theory, if pytorch supports MPS properly, you would only need to specify the device)

Sep 18 '24 12:09 araffin

hey 👋 yes that works, i have also tested A2C both on this branch, i'm still a beginner in this so i cant really say if all advanced use cases also work, but i think having the tests passing is a good indicator

Sep 18 '24 15:09 deathcoder

I think most issues are related to numpy v2, and should be fixed in https://github.com/DLR-RM/stable-baselines3/pull/2041 too.

Nov 18 '24 14:11 araffin