easybuild-easyblocks
create $XDG_CACHE_HOME for PyTorch tests
The path must exist or PyTorch will show errors/warnings like:
UserWarning: Specified kernel cache directory could not be created! This disables kernel caching.
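For illustration, the idea can be sketched roughly as follows. This is a minimal sketch and not the actual easyblock change: the helper name `ensure_xdg_cache_home`, its `base_dir` parameter, and the `.cache` subdirectory name are assumptions made for the example. It creates the cache directory up front and then points `$XDG_CACHE_HOME` at it, so the path already exists when the tests start.

```python
import os
import tempfile


def ensure_xdg_cache_home(base_dir):
    """Create <base_dir>/.cache if needed and point $XDG_CACHE_HOME at it.

    Hypothetical helper for illustration only; not part of EasyBuild.
    """
    cache_dir = os.path.join(base_dir, ".cache")
    # exist_ok=True makes repeated calls harmless
    os.makedirs(cache_dir, exist_ok=True)
    # Child processes (e.g. the test runner) inherit this variable
    os.environ["XDG_CACHE_HOME"] = cache_dir
    return cache_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(ensure_xdg_cache_home(tmp))
```

Creating the directory before exporting the variable means any process that reads `$XDG_CACHE_HOME` finds an existing path.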
Test report by @Flamefire
Overview of tested easyconfigs (in order)
- SUCCESS PyTorch-1.10.0-fosscuda-2020b.eb
Build succeeded for 1 out of 1 (1 easyconfigs in total)
taurusml22 - Linux RHEL 7.6, POWER, 8335-GTX (power9le), 6 x NVIDIA Tesla V100-SXM2-32GB, 440.64.00, Python 2.7.5
See https://gist.github.com/c761c3d11bf2a0140a56aa2e933ccefd for a full test report.
Test report by @boegel
Overview of tested easyconfigs (in order)
- FAIL (build issue) PyTorch-1.10.0-foss-2021a.eb (partial log available at https://gist.github.com/39d079c7228013daf8a29747912e9e26)
Build succeeded for 0 out of 1 (1 easyconfigs in total)
node3907.accelgor.os - Linux RHEL 8.4, x86_64, AMD EPYC 7413 24-Core Processor (zen3), 1 x NVIDIA NVIDIA A100-SXM4-80GB, 520.61.05, Python 3.6.8
See https://gist.github.com/db3002446c8f4cdc98e99d5ba0d5a7e8 for a full test report.
Test report by @boegel
Overview of tested easyconfigs (in order)
- FAIL (build issue) PyTorch-1.9.0-foss-2020b.eb (partial log available at https://gist.github.com/495345009c7c03992e6dd6d100a96e37)
Build succeeded for 0 out of 1 (1 easyconfigs in total)
node3539.doduo.os - Linux RHEL 8.4, x86_64, AMD EPYC 7552 48-Core Processor (zen2), Python 3.6.8
See https://gist.github.com/9c5ce58f5e0f91440e4e211a565f5234 for a full test report.
I'm not sure why the 2 ECs failed for you, but I'm quite certain it's not due to the change here, which should be correct by inspection (and I guess some documentation may tell us that $XDG_CACHE_HOME must exist, so this fixes a bug).
Especially as the test build on PPC passed, I'd assume this is ok. ;-)
@Flamefire I agree with you, but I'm being cautious here: we're very close to the next EasyBuild release, and I don't want to merge a PR last-minute which breaks the installation of PyTorch.
I wouldn't expect that making sure that $XDG_CACHE_HOME exists causes trouble, but it does seem like the behavior is slightly different when $XDG_CACHE_HOME does exist (kernel caching is not disabled), so it doesn't seem impossible to me that this affects a handful of tests...
My test build of PyTorch-1.10.0-fosscuda-2020b.eb hangs on "python -s -c from multiprocessing.resource_tracker import main;main(26)" (for 11h, then I killed it...)
(This was with this easyblock, but I do not believe it is related.) Will try again...
Same problem again without this change.
Could this now be merged?
Test report by @branfosj
Overview of tested easyconfigs (in order)
- SUCCESS PyTorch-1.10.0-foss-2021a.eb
Build succeeded for 1 out of 1 (1 easyconfigs in total)
bear-pg0105u36b.bear.cluster - Linux RHEL 8.5, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz (icelake), Python 3.6.8
See https://gist.github.com/f31aeda337f3cec583eed3ac1525dd8d for a full test report.
Test report by @branfosj
Overview of tested easyconfigs (in order)
- SUCCESS PyTorch-1.9.0-fosscuda-2020b.eb
Build succeeded for 1 out of 1 (1 easyconfigs in total)
bear-pg0212u17a.bear.cluster - Linux RHEL 8.5, x86_64, Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz (broadwell), 1 x NVIDIA Tesla P100-PCIE-16GB, 470.57.02, Python 3.6.8
See https://gist.github.com/ed8b85d63e0b5a7b44b1075285fcf52b for a full test report.
Going in, thanks @Flamefire!