
Failed to get cpuinfo on AWS Lambda arm64

Open kartheekgottipati opened this issue 2 years ago • 8 comments

AWS Lambda arm64, PyTorch 2.0.0

When running PyTorch 2.0.0 on AWS Lambda (arm64), I get the following error:

[WARNING] 2023-04-10T23:55:34.026Z RUNNING WITH 1 threads
Error in cpuinfo: failed to parse the list of possible processors in /sys/devices/system/cpu/possible
Error in cpuinfo: failed to parse the list of present processors in /sys/devices/system/cpu/present
Error in cpuinfo: failed to parse both lists of possible and present processors
terminate called after throwing an instance of 'c10::Error'
what(): [enforce fail at ThreadPool.cc:44] cpuinfo_initialize(). cpuinfo initialization failed
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x50 (0xffff70e7ca90 in /var/task/torch/lib/libc10.so)
frame #1: c10::ThrowEnforceNotMet(char const*, int, char const*, char const*, void const*) + 0x50 (0xffff70e7cc30 in /var/task/torch/lib/libc10.so)
frame #2: + 0x2c8cc78 (0xffff73b6ac78 in /var/task/torch/lib/libtorch_cpu.so)
frame #3: + 0x2c8fb64 (0xffff73b6db64 in /var/task/torch/lib/libtorch_cpu.so)
frame #4: at::set_num_threads(int) + 0x2c (0xffff71bc12bc in /var/task/torch/lib/libtorch_cpu.so)
frame #5: + 0x58d698 (0xffff7980f698 in /var/task/torch/lib/libtorch_python.so)
frame #63: __libc_start_main + 0xe8 (0xffff84323e18 in /lib/aarch64-linux-gnu/libc.so.6)
START RequestId: 6b21fcf4-19b2-45cc-83e4-74a2cefe6bad Version: $LATEST
RequestId: 6b21fcf4-19b2-45cc-83e4-74a2cefe6bad Error: Runtime exited with error: signal: aborted
Runtime.ExitError

On AWS Lambda, neither x86_64 nor arm64 has access to these files, but on x86_64 the problem is ignored and execution proceeds, while on arm64 it fails with the error above.

Is there a reason this is treated as a fatal error on arm64, while on the other architectures it is only a warning?
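For context, Linux exposes `/sys/devices/system/cpu/possible` and `/sys/devices/system/cpu/present` as CPU lists such as `0-3` or `0,2-5`. Below is a minimal Python sketch (not cpuinfo's actual C implementation) of parsing that format and returning `None` instead of failing hard when the file cannot be read, which is what the thread reports happening on Lambda. `parse_cpu_list` and `read_cpu_list` are hypothetical helper names.

```python
from pathlib import Path
from typing import List, Optional


def parse_cpu_list(text: str) -> List[int]:
    """Parse a Linux CPU list such as "0-3,5" into [0, 1, 2, 3, 5]."""
    cpus: List[int] = []
    for part in text.strip().split(","):
        if not part:
            # Empty input (or a stray comma) contributes no CPUs.
            continue
        if "-" in part:
            # A range "a-b" is inclusive on both ends.
            start, end = part.split("-", 1)
            cpus.extend(range(int(start), int(end) + 1))
        else:
            cpus.append(int(part))
    return cpus


def read_cpu_list(path: str) -> Optional[List[int]]:
    """Return the parsed CPU list, or None if the file is missing, unreadable, or malformed."""
    try:
        return parse_cpu_list(Path(path).read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    for name in ("possible", "present"):
        print(name, read_cpu_list(f"/sys/devices/system/cpu/{name}"))
```

Returning `None` lets the caller decide on a fallback rather than aborting the whole process.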

kartheekgottipati, Apr 11 '23 00:04

I'm having the same issue.

subhankar-trisetra, Aug 22 '23 15:08

I am having the same issue with torch 2.1.0.

jc-hdez, Oct 23 '23 20:10

Any update?

thecasual, Jan 07 '24 04:01

I believe the issue is with onnxruntime itself and is still not resolved. I'm going to try x86 for now.

stephenswetonic, Apr 06 '24 20:04

In some sense this is expected behavior: the Lambda runtime doesn't want to leak hardware details to hosted processes, so cpuinfo fails to initialize. The PyTorch crash, however, should be fixed.
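A minimal sketch of that kind of graceful degradation, assuming the only goal is to choose a thread count without aborting: the hypothetical helper `usable_cpu_count` tries `os.sched_getaffinity` (available on Linux), then `os.cpu_count()` (which returns `None` when the count cannot be determined), and finally a default of 1. This is a Python illustration of the idea, not the fix PyTorch or cpuinfo actually applied.

```python
import os


def usable_cpu_count(default: int = 1) -> int:
    """Best-effort CPU count that never raises (hypothetical helper)."""
    # os.sched_getaffinity exists only on some platforms (e.g. Linux).
    getaffinity = getattr(os, "sched_getaffinity", None)
    if getaffinity is not None:
        try:
            cpus = getaffinity(0)  # CPUs this process may run on
            if cpus:
                return len(cpus)
        except OSError:
            pass  # fall through to the next strategy
    # os.cpu_count() returns None when the count is unknown.
    count = os.cpu_count()
    return count if count else default


if __name__ == "__main__":
    print("threads:", usable_cpu_count())
```

The design choice is that missing hardware information lowers performance (fewer threads) rather than terminating the process.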

malfet, Apr 06 '24 22:04

Is there any SLA for fixing this bug? The issue was opened more than a year ago. @soumith, @apaszke, @suo, could you please help?

StanislavMakhrov, Apr 23 '24 00:04

This is also problematic in restricted build environments (like Nix) that don't expose /sys/devices/system/cpu/{possible,present}, to prevent packages from relying on the specific hardware configuration of the build system.

pluiedev, Jun 14 '24 10:06

Any update?

nywhere, Sep 02 '24 20:09

I am also having this issue.

Claire-E-prog, Jan 27 '25 19:01

Same problem here. Setting a CPU_COUNT environment variable didn't work either.

ahmsay, Mar 17 '25 19:03

Any updates on this? I'm facing the same problem. It runs fine on x86_64 but throws the "Error in cpuinfo: failed to parse the list of present processors in /sys/devices/system/cpu/present" error on the ARM64 architecture.

ShiwenXu, Mar 19 '25 04:03

Fixed here - https://github.com/pytorch/pytorch/commit/310e3060b7e4d0c76149aadad4519c7abed8c2a7?

digantdesai, Mar 21 '25 07:03