
[RFC][fix test] missing .item() in frozen nf4 test

Open skcoirz opened this issue 10 months ago • 5 comments

Context

  • As titled: the test is missing a call to .item().
  • One other observation: the error from the frozen NF4 linear is larger than the error from the bnb linear (see the screenshot below).

[Screenshot: error comparison, 2024-03-28 2:28 PM]

Changelog

  • Updated the nf4 test to call .item() (a sketch of the change follows the test plan below).

Test plan

  • pytest tests/torchtune/modules/low_precision/test_nf4_linear.py
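
For reference, here is a minimal sketch of what the updated assertion amounts to. The variable names (err, err_bnb) follow the discussion in this thread and are assumptions; the exact code in tests/torchtune/modules/low_precision/test_nf4_linear.py may differ.

```python
import torch

# Hypothetical per-layer reconstruction errors, stored as 0-dim tensors
# (e.g. mean absolute difference vs. a bf16 reference linear).
err = torch.tensor(0.011)      # frozen NF4 linear
err_bnb = torch.tensor(0.012)  # bnb Linear4bit

# .item() converts a 0-dim tensor into a plain Python float, so the
# comparison below is an ordinary scalar comparison rather than a tensor op.
assert err.item() < err_bnb.item()
```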

skcoirz avatar Mar 28 '24 21:03 skcoirz

:link: Helpful Links

:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/609

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot[bot] avatar Mar 28 '24 21:03 pytorch-bot[bot]

Thanks for your contribution @skcoirz! This was definitely an oversight on my part. That said, do we understand why the current test (err.item() < err_bnb) passes in CI?

rohan-varma avatar Mar 28 '24 21:03 rohan-varma

Good question. Let me dig into it.

skcoirz avatar Mar 28 '24 21:03 skcoirz

@rohan-varma, I found the cause.

  • The unit test does catch this issue (confirmed on my local server).
  • But in CI, the two accuracy tests are skipped because no CUDA device is available (confirmed in the log linked below; a sketch of this kind of gating follows).

It looks like integration tests run on AWS, where I would assume GPUs are available. Should we mark these as integration_tests?

[Screenshot of the CI log, 2024-03-28 2:54 PM] https://github.com/pytorch/torchtune/actions/runs/8472914261/job/23216077574
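
For context, this is the kind of CUDA gate that would cause the assertions to be reported as skipped on a CPU-only runner. Treat it as a sketch: the actual decorator and test name in test_nf4_linear.py may differ.

```python
import pytest
import torch

@pytest.mark.skipif(not torch.cuda.is_available(), reason="requires CUDA")
def test_nf4_reconstruction_vs_bnb():
    # On a CPU-only GitHub-hosted runner this test is collected but skipped,
    # so a broken assertion inside it is never actually executed.
    ...
```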

skcoirz avatar Mar 28 '24 21:03 skcoirz

@skcoirz thanks for debugging this. Yeah, this is a weird edge case. We can run this GPU test in CI, but right now our GPU CI jobs are meant more for E2E runs of fine-tuning scripts. So we can mark this test as an integration test via @pytest.mark.integration_test like you suggested, but it is a bit unintuitive (e.g. it'll be our only integration test outside of tests/recipes). That said, we do need a way to catch this. Out of curiosity, @rohan-varma, why is this a CUDA test anyway? Is NF4 only supported on CUDA? Or is this more of a bnb-side thing?

Edit: alternatively, we could just modify our multi-GPU CI job to run all unit tests and recipe tests. We do have some other distributed-type unit tests, so while it might be a bit of overkill, it's probably not a terrible idea to do this anyway.
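
A minimal sketch of the marker approach, assuming the marker is registered the standard pytest way; torchtune's actual conftest/CI wiring may differ.

```python
# conftest.py -- register the marker so pytest doesn't warn about an
# unknown mark (the repo may already do this elsewhere).
def pytest_configure(config):
    config.addinivalue_line(
        "markers",
        "integration_test: GPU/E2E tests excluded from the default unit-test run",
    )


# test_nf4_linear.py -- hypothetical test, gated both ways.
import pytest
import torch

@pytest.mark.integration_test
@pytest.mark.skipif(not torch.cuda.is_available(), reason="requires CUDA")
def test_nf4_reconstruction_vs_bnb():
    ...
```

The GPU job would then select these tests with pytest -m integration_test, while the CPU unit-test job could run pytest -m "not integration_test" (or whatever marker expression the repo's CI already uses).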

ebsmothers avatar Mar 28 '24 23:03 ebsmothers

We don't have NF4Linear in torchtune anymore, so this PR is no longer relevant.

ebsmothers avatar May 21 '24 14:05 ebsmothers