Failing tests on MPS
Several of our tests fail on MPS because precision differences between MPS and CUDA/CPU fall outside the numerical tolerances our tests use:
$ pytest tests
FAILED tests/torchtune/generation/test_generation.py::TestGenerate::test_stop_tokens_batched[prompt_tokens_batched-generation_model_no_kv_cache] - assert False
FAILED tests/torchtune/generation/test_generation.py::TestGenerate::test_stop_tokens_batched[prompt_tokens_batched-generation_model_kv_cache_batched] - assert False
FAILED tests/torchtune/generation/test_generation.py::TestGenerate::test_stop_tokens_batched[prompt_tokens_batched_left_padded-generation_model_no_kv_cache] - assert False
FAILED tests/torchtune/generation/test_generation.py::TestGenerate::test_stop_tokens_batched[prompt_tokens_batched_left_padded-generation_model_kv_cache_batched] - assert False
FAILED tests/torchtune/generation/test_generation.py::TestGenerate::test_stop_tokens[generation_model_no_kv_cache] - assert False
FAILED tests/torchtune/generation/test_generation.py::TestGenerate::test_stop_tokens[generation_model_kv_cache] - assert False
FAILED tests/torchtune/generation/test_generation.py::TestGenerate::test_stop_tokens_batched_uneven_stopping[generation_model_no_kv_cache] - assert False
FAILED tests/torchtune/generation/test_generation.py::TestGenerate::test_stop_tokens_batched_uneven_stopping[generation_model_kv_cache_batched] - assert False
FAILED tests/torchtune/generation/test_generation.py::TestGenerate::test_stop_tokens_batched_uneven_stopping_left_padded[generation_model_no_kv_cache] - assert False
FAILED tests/torchtune/generation/test_generation.py::TestGenerate::test_stop_tokens_batched_uneven_stopping_left_padded[generation_model_kv_cache_batched] - assert False
FAILED tests/torchtune/models/llama3_1/test_position_embeddings.py::TestLlama3ScaledRoPE::test_forward - AssertionError: actual: -83.30372619628906, expected: -83.15229797363281
FAILED tests/torchtune/models/llama3_1/test_position_embeddings.py::TestLlama3ScaledRoPE::test_forward_with_curr_pos - AssertionError: actual: -83.30372619628906, expected: -83.15229797363281
FAILED tests/torchtune/models/llama3_1/test_position_embeddings.py::TestLlama3ScaledRoPE::test_forward_with_2d_pos_ids - AssertionError: actual: -83.30372619628906, expected: -83.15229797363281
FAILED tests/torchtune/modules/test_position_embeddings.py::TestRotaryPositionEmbedding::test_forward - AssertionError: actual: 2165.59619140625, expected: 2165.705322265625
FAILED tests/torchtune/modules/test_position_embeddings.py::TestRotaryPositionEmbedding::test_forward_with_curr_pos - AssertionError: actual: 2165.59619140625, expected: 2165.705322265625
FAILED tests/torchtune/modules/test_position_embeddings.py::TestRotaryPositionEmbedding::test_forward_with_packed_pos - AssertionError: actual: 2165.59619140625, expected: 2165.705322265625
FAILED tests/torchtune/modules/test_position_embeddings.py::TestPhi3RotaryPositionalEmbeddings::test_forward - AssertionError: actual: -381.06915283203125, expected: -381.06201171875
FAILED tests/torchtune/modules/test_transformer_decoder.py::TestTransformerCrossAttentionLayer::test_forward - AssertionError: actual: 1.7740381956100464, expected: 1.7762000560760498
FAILED tests/torchtune/modules/test_transformer_decoder.py::TestTransformerDecoder::test_forward - AssertionError: actual: 20.48008918762207, expected: 20.479999542236328
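The position-embedding and decoder failures above are small relative errors (e.g. 2165.596 vs. the expected 2165.705, about 5e-5 relative), so one possible fix is device-dependent tolerances. A minimal sketch, using `math.isclose` to stand in for the comparison the tests perform (the tolerance values and the `tolerances_for` helper are assumptions, not torchtune's actual settings):

```python
import math

def within_tolerance(actual: float, expected: float, device: str) -> bool:
    # Hypothetical policy: loosen the relative tolerance on MPS, where
    # float32 accumulation order differs from the CPU/CUDA reference.
    rtol = 1e-3 if device == "mps" else 1e-6
    return math.isclose(actual, expected, rel_tol=rtol)

# Values taken from the RoPE failure above: the MPS result is within
# ~5e-5 relative error of the CPU/CUDA reference, so it passes the
# loosened MPS tolerance but fails the strict default.
assert within_tolerance(2165.59619140625, 2165.705322265625, "mps")
assert not within_tolerance(2165.59619140625, 2165.705322265625, "cpu")
```

In the real test suite this would map onto the `atol`/`rtol` arguments of `torch.testing.assert_close`, selected based on the device under test.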
$ pytest tests -m integration_test
FAILED tests/recipes/test_ppo_full_finetune_single_device.py::TestPPOFullFinetuneSingleDeviceRecipe::test_loss - AssertionError: Scalars are not close!
FAILED tests/recipes/dev/test_generate_v2.py::TestGenerateV2::test_llama2_generate_results - AssertionError: assert 'Country maior Connection Kohćutsójcustomulas Sometimes Security' in 'INFO torchtune.utils._logging:_logging.py:101 Running InferenceRecipe with resolv...
The PPO integration test likely fails because its recipe relies on a generation step, which is affected by the same precision differences.
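Note that the generation and PPO failures are hard `assert False` / "Scalars are not close" errors rather than small numeric drifts. One plausible mechanism (an illustration, not a confirmed diagnosis): under greedy decoding, even a sub-tolerance difference in two logits can flip an argmax, after which the generated token sequences, and therefore the stop-token checks, diverge completely. A toy sketch:

```python
# Two near-tied logits: a tiny rounding difference between backends
# flips which index wins the argmax, so the decoded tokens differ
# entirely even though the logits agree to ~1e-7.
logits_ref = [1.0000001, 1.0000000]  # e.g. CPU/CUDA
logits_dev = [1.0000000, 1.0000001]  # hypothetical MPS rounding

next_token_ref = max(range(len(logits_ref)), key=lambda i: logits_ref[i])
next_token_dev = max(range(len(logits_dev)), key=lambda i: logits_dev[i])

assert next_token_ref == 0
assert next_token_dev == 1  # a different token: generation diverges from here on
```

If this is the cause, widening tolerances won't help these tests; they would instead need MPS-specific expected values or a skip marker (e.g. `pytest.mark.skipif`) on MPS.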