Allow disabling CPU flags

Open chizkiyahu opened this issue 10 months ago • 10 comments

Tell us about your request

Add a docker run --cpu-flags or docker run --cpu-disable-flags option that allows disabling CPU flags inside the container.

Which service(s) is this request for?

docker run

Why is it needed?

MacBook Pro M1 Max - inside Docker

lscpu | grep Flags
Flags:                                fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp flagm2 frint

MacBook Pro M4 - inside Docker

lscpu |  grep Flags
Flags:                                fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 flagm2 frint svei8mm svebf16 bf16 afp sme

Comparison table

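Flag      M4   M1
fp        yes  yes
asimd     yes  yes
evtstrm   yes  yes
aes       yes  yes
pmull     yes  yes
sha1      yes  yes
sha2      yes  yes
crc32     yes  yes
atomics   yes  yes
fphp      yes  yes
asimdhp   yes  yes
cpuid     yes  yes
asimdrdm  yes  yes
jscvt     yes  yes
fcma      yes  yes
lrcpc     yes  yes
dcpop     yes  yes
sha3      yes  yes
asimddp   yes  yes
sha512    yes  yes
asimdfhm  yes  yes
dit       yes  yes
uscat     yes  yes
ilrcpc    yes  yes
flagm     yes  yes
sb        yes  yes
paca      yes  yes
pacg      yes  yes
dcpodp    yes  yes
sve2      yes  no
flagm2    yes  yes
frint     yes  yes
svei8mm   yes  no
svebf16   yes  no
bf16      yes  no
afp       yes  no
sme       yes  no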
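
The difference between the two flag lists above can be computed rather than read off by eye. A minimal sketch in Python, assuming the two Flags: lines from lscpu are pasted in as strings; the function and variable names are illustrative only and not part of any Docker tooling:

```python
def parse_flags(lscpu_line: str) -> list[str]:
    """Return the flag names from an lscpu "Flags:" line, in their original order."""
    return lscpu_line.split(":", 1)[-1].split()


def extra_flags(baseline: list[str], candidate: list[str]) -> list[str]:
    """Return flags present in candidate but missing from baseline."""
    known = set(baseline)
    return [flag for flag in candidate if flag not in known]


# The two lines reported in this issue (M1 container, then M4 container).
M1_LINE = ("Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp "
           "cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit "
           "uscat ilrcpc flagm sb paca pacg dcpodp flagm2 frint")
M4_LINE = ("Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp "
           "cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit "
           "uscat ilrcpc flagm sb paca pacg dcpodp sve2 flagm2 frint svei8mm svebf16 "
           "bf16 afp sme")

extra = extra_flags(parse_flags(M1_LINE), parse_flags(M4_LINE))
print(extra)  # ['sve2', 'svei8mm', 'svebf16', 'bf16', 'afp', 'sme']
```

These six flags are the ones the requested option would need to be able to hide.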

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

  • TensorFlow hits illegal instruction errors in Docker on a MacBook M4 because of the extra flags
  • This would fix https://github.com/docker/for-mac/issues/7539

Are you currently working around the issue?

I am trying to solve this at the OS level or the TensorFlow level, but without success so far.

chizkiyahu avatar Jan 30 '25 18:01 chizkiyahu

Update: I solved my problem at the TensorFlow level by setting the environment variable XLA_FLAGS to --xla_backend_optimization_level=0, but it would probably still be valuable to allow this at the Docker level.
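
For anyone applying the same workaround from Python, here is a minimal sketch. It assumes XLA_FLAGS is a space-separated list of flags and that it should be set before TensorFlow is imported so XLA sees it at start-up; the helper name with_workaround is hypothetical.

```python
import os

# Flag value taken from the workaround described in this thread.
WORKAROUND_FLAG = "--xla_backend_optimization_level=0"


def with_workaround(existing: str) -> str:
    """Append the workaround flag to an XLA_FLAGS value unless a level is already set."""
    flags = existing.split()
    if not any(f.startswith("--xla_backend_optimization_level=") for f in flags):
        flags.append(WORKAROUND_FLAG)
    return " ".join(flags)


# Assumption: the variable must be in place before TensorFlow/XLA initializes.
os.environ["XLA_FLAGS"] = with_workaround(os.environ.get("XLA_FLAGS", ""))
# import tensorflow as tf  # import only after the variable is set
```

The same value can also be supplied from outside the container with the -e option of docker run, which avoids changing application code.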

chizkiyahu avatar Feb 02 '25 21:02 chizkiyahu

I hit this issue as well, where I was running an XLA model using TFServe in Docker. It got killed due to generating an SVE operation not supported by the processor, due to Docker wrongly claiming the sve2 capability.

kasper0406 avatar Feb 05 '25 10:02 kasper0406

> I hit this issue as well, where I was running an XLA model using TFServe in Docker. It got killed due to generating an SVE operation not supported by the processor, due to Docker wrongly claiming the sve2 capability.

Did setting XLA_FLAGS solve the problem?

chizkiyahu avatar Feb 05 '25 13:02 chizkiyahu

Yes, it did fix it. There should be a proper fix for this though, which I believe is for Docker to report the correct capabilities. Otherwise it complicates building Docker images for multiple environments: in production, for example, disabling XLA optimizations seems like a bad idea.

kasper0406 avatar Feb 05 '25 14:02 kasper0406

I am not sure whether this is a Docker bug, a TensorFlow bug, or even a macOS bug. If you think it is a Docker bug, we can open an issue at https://github.com/docker/for-mac

chizkiyahu avatar Feb 05 '25 15:02 chizkiyahu

I am not an expert here, but I think it is a Docker bug - it advertises a CPU feature set that is not actually supported.

When XLA then jits the model, it will think it’s fine to use the SVE instructions - but it is not!
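
As a stopgap, a container entrypoint could at least report which SVE-family features are being advertised. A rough sketch, assuming an arm64 Linux guest where /proc/cpuinfo contains a "Features" line; the set of flags treated as suspicious is my own assumption based on this thread, and the check only shows what is advertised, not what the hardware can actually execute:

```python
# Assumption: these are the flags worth warning about, based on this thread.
SVE_FAMILY = {"sve", "sve2", "svei8mm", "svebf16", "sme"}


def advertised_features(cpuinfo_text: str) -> set[str]:
    """Return the feature names from the first "Features" (or "Flags") line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("features", "flags"):
            return set(value.split())
    return set()


def suspicious_sve_flags(cpuinfo_text: str) -> list[str]:
    """Return the advertised SVE-family flags, sorted by name."""
    return sorted(advertised_features(cpuinfo_text) & SVE_FAMILY)


# Shortened sample input; inside a container, read /proc/cpuinfo instead.
sample = "processor\t: 0\nFeatures\t: fp asimd aes sve2 svei8mm sme\n"
print(suspicious_sve_flags(sample))  # ['sme', 'sve2', 'svei8mm']
```
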

FWIW I created a question in the XLA discord in case they have thoughts: https://discord.com/channels/999073994483433573/1004823487124357182/1336481646647709847

kasper0406 avatar Feb 05 '25 15:02 kasper0406

This looks like the same problem: https://github.com/rancher-sandbox/rancher-desktop/issues/8057#issuecomment-2586147638

chizkiyahu avatar Feb 05 '25 16:02 chizkiyahu

This also looks like the same problem: https://github.com/elastic/elasticsearch/issues/118583

chizkiyahu avatar Feb 05 '25 16:02 chizkiyahu

This looks related: https://github.com/ClickHouse/ClickHouse/issues/74743

chizkiyahu avatar Feb 05 '25 16:02 chizkiyahu

https://github.com/moby/buildkit/issues/5433

chizkiyahu avatar Feb 05 '25 16:02 chizkiyahu

Docker Desktop 4.39.0 (Mar 6, 2025) disables SME, SVE and SSVE, so you may want to upgrade and try again.

laverdet avatar Apr 14 '25 15:04 laverdet