feat: pytorch benchmarks support mps device

Open donaldkuck opened this issue 4 weeks ago • 3 comments

Add MPS support and centralize device selection

- Add a get_device() function in pytorch_utils.py that supports CUDA, MPS, and CPU.
- Refactor 26 PyTorch model files to use the centralized device selection.
- Enable automatic MPS device selection on Apple Silicon devices.

The device selection priority is CUDA > MPS > CPU.
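For illustration only, here is a minimal sketch of what such a helper could look like. The actual get_device() in the PR may differ: the signature is assumed, and pick_device_name is a hypothetical helper introduced here so the priority logic can be checked without PyTorch installed.

```python
def pick_device_name(cuda_available: bool, mps_available: bool) -> str:
    """Hypothetical helper: apply the CUDA > MPS > CPU priority."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def get_device():
    """Assumed signature; returns a torch.device chosen by the priority above."""
    # Imported lazily so the priority logic above stays importable without PyTorch.
    import torch

    # Older PyTorch builds may not ship the MPS backend at all.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_available = mps_backend is not None and mps_backend.is_available()
    return torch.device(pick_device_name(torch.cuda.is_available(), mps_available))
```

With a helper like this, a model file would call get_device() once instead of repeating its own CUDA check.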

donaldkuck avatar Nov 26 '25 08:11 donaldkuck

FAILED model/test_general_nn.py::TestNN::test_both_dataset - RuntimeError: MPS backend out of memory (MPS allocated: 8.00 MiB, other allocations: 16.00 KiB, max allowed: 7.93 GiB). Tried to allocate 256 bytes on shared pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
FAILED test_contrib_model.py::TestAllFlow::test_0_initialize - RuntimeError: MPS backend out of memory (MPS allocated: 8.00 MiB, other allocations: 0 bytes, max allowed: 7.93 GiB). Tried to allocate 256 bytes on shared pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
= 2 failed, 51 passed, 1 skipped, 10 deselected, 23 warnings in 288.46s (0:04:48) =

I don't think the failure is caused by the code changes.

donaldkuck avatar Nov 27 '25 10:11 donaldkuck

Hi @donaldkuck, I don't think so. pytest passes in the current CI on the main branch but reports errors in this PR, which indicates that the changes affected the test results. The error message also points to the problem: RuntimeError: MPS backend out of memory, caused by MPS's very fragile memory management.

SunsetWolf avatar Dec 01 '25 06:12 SunsetWolf

Could it be that the Mac machine in CI has only 8 GB of memory in total, which is not enough for MPS when running pytest?
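One way to check this on the runner would be to probe MPS with a tiny allocation before selecting it. The sketch below is an assumption, not part of the PR: mps_is_usable is a hypothetical name, and the torch_module parameter exists only so the logic can be exercised without a real MPS device.

```python
def mps_is_usable(torch_module=None) -> bool:
    """Hypothetical probe: True only if a tiny tensor can be allocated on MPS."""
    if torch_module is None:
        import torch as torch_module

    mps_backend = getattr(torch_module.backends, "mps", None)
    if mps_backend is None or not mps_backend.is_available():
        return False
    try:
        # The failing tests could not allocate even 256 bytes, so a
        # one-element tensor is enough to reproduce that condition.
        torch_module.zeros(1, device="mps")
    except RuntimeError:
        return False
    return True
```

If the probe returns False on the CI machine while MPS reports itself as available, the failure would point at the runner's MPS environment rather than at the refactored model code.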

donaldkuck avatar Dec 01 '25 06:12 donaldkuck