
Update Azure workflow to use the latest stable PyTorch version

Open rasbt opened this issue 1 year ago • 1 comment

This updates the GPU workflow so that we can test on the latest stable PyTorch version. As discussed with @Andrei-Aksionov in #1585, I think the nightly dev build can be nice as a heads-up for future changes, but we need to test the version that is actually being used, which is the latest stable version installed via the pyproject.toml. Let's update the Azure workflow to use that. We can always test on the dev version locally every few weeks or so. Changes in the dev version should not affect users until the stable version launches anyway.

rasbt, Jul 16 '24 19:07

I wonder if it is possible to skip tests more elegantly than via the pytest CLI. For example, I see that the tests already use a @RunIf selector like this:

@RunIf(min_cuda_gpus=1, thunder=True)
def test_setup_already_traced():
    import thunder

    device = torch.device("cuda")
    x = torch.randn(1, 1, device=device)
    model = torch.nn.Linear(1, 2, bias=False, device=device)

    strategy = ThunderDDPStrategy()

    tmodel = thunder.jit(model)
    tmodel(x)
    with pytest.raises(RuntimeError, match="already called"):
        strategy.setup_module(tmodel)

But I don't know where thunder=True is set tbh.
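For illustration, here is a minimal sketch of how a selector like this could work in general. This is not the actual RunIf from the repo: the names run_if, skip_reasons, module_available, and cuda_gpu_count are hypothetical, and the sketch assumes that thunder=True simply means "the thunder package is importable". Under that assumption nothing is "set" anywhere; the check runs when the decorator is evaluated at collection time.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def cuda_gpu_count() -> int:
    """Number of visible CUDA GPUs; 0 when torch is not installed."""
    if not module_available("torch"):
        return 0
    import torch  # imported lazily so the helper also works without torch

    return torch.cuda.device_count()


def skip_reasons(min_cuda_gpus: int = 0, thunder: bool = False) -> list:
    """Collect the unmet requirements (hypothetical helper, not the repo's API)."""
    reasons = []
    if min_cuda_gpus > 0 and cuda_gpu_count() < min_cuda_gpus:
        reasons.append(f"GPUs>={min_cuda_gpus}")
    # Assumption: thunder=True only requires that the package is installed.
    if thunder and not module_available("thunder"):
        reasons.append("thunder")
    return reasons


def run_if(**requirements):
    """Return a pytest skipif marker that skips when any requirement is unmet."""
    import pytest

    reasons = skip_reasons(**requirements)
    return pytest.mark.skipif(bool(reasons), reason="Requires: " + ", ".join(reasons))
```

With something like this, a test decorated with @run_if(min_cuda_gpus=1, thunder=True) would be skipped automatically on machines without a GPU or without thunder installed, so no extra pytest CLI filtering would be needed for those cases.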

rasbt, Jul 17 '24 20:07