Revert to `pytorch` installation with `uv` in CI

Open DanielYang59 opened this issue 1 year ago • 6 comments

Summary

Revert to installing torch with uv

Also linked to #4063: pip install torch takes up 70% of the total dependency install time

Revert to pytorch installation with uv in CI:

  • [x] It appears https://github.com/astral-sh/uv/issues/1921 has been resolved.
  • [ ] The following intermittent error causes spurious CI failures, reported in https://github.com/astral-sh/uv/issues/2586:
error: Failed to download distributions
  Caused by: Failed to fetch wheel: torch==2.2.1
  Caused by: Failed to extract archive
  Caused by: error decoding response body
  Caused by: request or response body error
  Caused by: error reading a body from connection
  Caused by: end of file before message length reached
  • [ ] Also https://github.com/astral-sh/uv/issues/4402.
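
One possible way to soften an intermittent download failure like the one above is to retry the install step a few times before giving up. The following is only a sketch under stated assumptions: the helper name `run_with_retries`, the attempt count, and the linear backoff are all hypothetical and are not part of the pymatgen CI or of uv.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence


def run_with_retries(
    cmd: Sequence[str],
    attempts: int = 3,
    delay: float = 5.0,
    runner: Optional[Callable[[Sequence[str]], int]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `cmd` until it exits with code 0 or `attempts` is exhausted.

    Returns the number of attempts used. Raises RuntimeError if every
    attempt fails. `runner` and `sleep` are injectable so the retry logic
    can be exercised without touching the network (hypothetical design).
    """
    if runner is None:
        # Default runner: execute the command and report its exit code.
        def runner(c: Sequence[str]) -> int:
            return subprocess.run(list(c), check=False).returncode

    for attempt in range(1, attempts + 1):
        if runner(cmd) == 0:
            return attempt
        if attempt < attempts:
            # Linear backoff: wait a little longer after each failure.
            sleep(delay * attempt)
    raise RuntimeError(f"command failed after {attempts} attempts: {list(cmd)}")
```

A call such as `run_with_retries(["uv", "pip", "install", "torch"])` would then rerun the install up to three times. This only hides the flakiness; it does not address the underlying uv issue.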

DanielYang59 avatar May 13 '24 08:05 DanielYang59

Looks like we still have the issue where uv randomly fails to install torch. I will report this to uv soon.

To fix this, we might need to install torch with pip and explicitly have uv skip torch.

Before this PR, even after torch had already been successfully installed by pip, calling uv pip install torch again would still fail (hence the random errors we have seen).
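
A minimal sketch of the "skip torch for uv" idea: check whether the module is already importable and only build a uv install command when it is not. The helper names are hypothetical, and the sketch assumes the distribution name matches the import name (true for torch). It does not stop uv from resolving torch as a transitive dependency of another package.

```python
import importlib.util
from typing import List


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


def install_command_if_missing(package: str) -> List[str]:
    """Return a uv install command for `package`, or [] if already importable.

    Hypothetical helper: assumes the package's import name equals its
    distribution name, which holds for torch but not for every package.
    """
    if is_importable(package):
        return []
    return ["uv", "pip", "install", package]
```

In a CI step, an empty result would mean pip has already provided the package and the uv call can be skipped.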

DanielYang59 avatar May 13 '24 11:05 DanielYang59

we're getting

pytest: command not found

which is not a torch related error message?

strange that it only happens on the 1st runner. maybe a race condition? we could try python -m pytest ...
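
For illustration, the python -m pytest suggestion can be sketched as below. Running pytest as a module of the current interpreter avoids relying on a `pytest` script being on PATH. The function names are hypothetical. Note that this only helps if pytest is actually installed: if the install step itself failed, the module form would fail too, with a "No module named pytest" error.

```python
import shutil
import sys
from typing import List, Optional


def pytest_command(extra_args: Optional[List[str]] = None) -> List[str]:
    """Build a pytest invocation that does not depend on a script on PATH."""
    # sys.executable is the interpreter running this code, so pytest is
    # looked up in that interpreter's site-packages.
    return [sys.executable, "-m", "pytest", *(extra_args or [])]


def pytest_script_on_path() -> bool:
    """Return True if a `pytest` executable can be found on PATH."""
    return shutil.which("pytest") is not None
```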

janosh avatar May 13 '24 11:05 janosh

> we're getting
>
> pytest: command not found
>
> which is not a torch related error message?

If you look more closely into the log, the failure actually comes from the "Install pymatgen and dependencies" stage (for some reason it still passed).

> strange that it only happens on the 1st runner. maybe a race condition? we could try python -m pytest ...

Not sure. I thought it just fails randomly... No idea why. I don't think it has anything to do with the split itself, though.

Failed at split 1

Failed at split 5

Failed at split 8

DanielYang59 avatar May 13 '24 12:05 DanielYang59

at least it's no longer failing due to timeouts. looks like now we're hitting this issue https://github.com/astral-sh/uv/issues/2586

janosh avatar May 13 '24 12:05 janosh

> at least it's no longer failing due to timeouts. looks like now we're hitting this issue astral-sh/uv#2586

Looks like this one. I guess we would still need to install torch with pip, and somehow prevent uv from trying to install torch later (not sure how to achieve this though, as torch is not a direct dependency of pymatgen, but of chgnet).

Maybe we could try to install everything but [optional] with uv, and then use uv to install [optional] only? I think that's really bad, as the package itself would be installed twice. There should be a better way that I'm not aware of.

Or, just fall back to slow but reliable pip altogether?

DanielYang59 avatar May 13 '24 12:05 DanielYang59

Also linked to #4063: pip install torch takes up 70% of the total dependency install time.

DanielYang59 avatar Sep 19 '24 03:09 DanielYang59

Closing this one in #4100. Reason: chgnet and matgl currently don't support Python 3.13, so torch is not required in CI for now.

DanielYang59 avatar Nov 21 '24 03:11 DanielYang59