Do we need to maintain support for old x86_64 CPUs (pre-2010)? i.e. Should we require AVX?

Open spwoodcock opened this issue 1 month ago • 12 comments

The current state of things

  • We currently use bundled gcc and g++ wrappers.
  • The wrapper replaces the system gcc/g++ with a script that:
    • Strips out any -march=* flags coming from the build system.
    • Injects its own -march=<ARCH> where:
      • nehalem is forced for x86_64 (very old, 2008).
      • armv8-a is forced for aarch64.
    • Delegates compilation to gcc_real.
  • This makes ODM run on very old CPUs without AVX / AVX2.
  • The result is a portable, reproducible binary that runs on essentially any CPU.
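
The wrapper behaviour described above can be sketched roughly as follows. This is an illustration in Python, not the actual ODM wrapper script: the names `FORCED_MARCH`, `rewrite_args`, and `build_command` are hypothetical, and only the `nehalem` / `armv8-a` targets and the `gcc_real` name come from the description above.

```python
# Forced targets as described above; the table name itself is hypothetical.
FORCED_MARCH = {"x86_64": "nehalem", "aarch64": "armv8-a"}


def rewrite_args(args, machine):
    """Drop any -march=* flag and append the forced one for this machine."""
    kept = [a for a in args if not a.startswith("-march=")]
    forced = FORCED_MARCH.get(machine)
    if forced is not None:
        kept.append(f"-march={forced}")
    return kept


def build_command(args, machine):
    """Return the argv that would be handed to the real compiler."""
    return ["gcc_real"] + rewrite_args(args, machine)
```

A real wrapper would finish by exec'ing the returned command (for example via `os.execvp`), passing `platform.machine()` as `machine`.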

The issues this causes

  • After upgrading to Ubuntu 24.04 and updating many libraries, the standard Dockerfile builds fine (no wrappers).
  • But portable.Dockerfile and gpu.Dockerfile both use the wrappers - these are the production images.
  • These now fail to build, likely because of breakage in OpenCV, OpenMVS, or OpenSfM.
  • The wrapper overrides what CMake detects:
    • CMake sees AVX/AVX2 support on the build machine
    • But the wrapper forces Nehalem
    • Result: build errors, intrinsic mismatches, inline asm failures - making things harder to maintain

The question

  • Do we still need the latest ODM versions to run on pre-2010 CPUs?
  • Can we EOL 3.5.6 as the last "legacy CPU" version, and release 4.0.0 with a modern CPU baseline?
  • Notes:
    • Nearly all cloud servers, CI runners, and modern machines support at least AVX, often AVX2.
    • Many major projects have already dropped support for older CPUs or are discussing it (NumPy, TensorFlow, PyTorch, and others!).

Benefits

  • Improved performance
  • Lower maintenance burden
  • GPU builds are simpler

Downsides

  • Very old machines will no longer run the newest ODM builds
  • Pre-2011 CPUs without AVX would need to stay on ODM 3.x

Related issues:

  • https://github.com/OpenDroneMap/ODM/issues/1944
  • https://github.com/OpenDroneMap/ODM/issues/1957

Continues from the saga of https://github.com/OpenDroneMap/ODM/pull/1958

spwoodcock avatar Nov 26 '25 11:11 spwoodcock

I would say yes, as we commonly see users outside of NA/EU using older machines, particularly older server-grade machines.

My inclination is that any and all x86_64 CPUs should be supported for now.

For instance, my Xeon X3380 (Core 2 Quad-era) fails on current builds due to a missing instruction set.

Major distros are still targeting baseline x86_64, not even v2 or v3, so I don't think there should be a mismatch between what OS a user can expect to run and our stack.
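
For context on the "v2 or v3" terminology: the x86-64 psABI defines microarchitecture levels as cumulative feature sets. AVX belongs to v3, so a Nehalem-class target sits at v2. A rough sketch of classifying a CPU from its feature flags is below; the flag spellings are assumed to match Linux `/proc/cpuinfo`, and the names `LEVEL_FLAGS` and `x86_64_level` are hypothetical.

```python
# Flags added at each level, spelled as in Linux /proc/cpuinfo (assumed).
LEVEL_FLAGS = [
    (2, {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}),
    (3, {"abm", "avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "movbe", "xsave"}),
    (4, {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]


def x86_64_level(flags):
    """Return the highest level (1-4) whose cumulative requirements are met."""
    level = 1  # baseline x86_64
    for candidate, required in LEVEL_FLAGS:
        if not required <= flags:
            break  # levels are cumulative, so stop at the first gap
        level = candidate
    return level
```

Note that a CPU with AVX but without AVX2 still classifies as v2 here, which is why "require AVX" and "require v3" are not the same cutoff.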

Saijin-Naib avatar Nov 26 '25 12:11 Saijin-Naib

I have mixed feelings on this. On one hand, legacy support does broaden our impact outside NA/EU. On the other hand, while I would love to support whatever the distros support, their profile for support, indeed their team for supporting, is quite different to ours. And making our upgrade process less brittle is an important step toward keeping the stack sustainable.

smathermather avatar Nov 26 '25 13:11 smathermather

Is it possible to have graceful fallback to less performant/optimized functions without different images?

Saijin-Naib avatar Nov 26 '25 13:11 Saijin-Naib

As we have a containerised v3.5.6 that works fine on older architecture, the key question for me is how long will this continue to work? My hunch is that it should in theory work for a long time, especially considering we aren't developing a tool that needs to be concerned with security vulnerabilities etc.

Of course, any users of v3.5.6 wouldn't be getting any updates / support, but we already have a pretty stable and functional tool that can be used for their needs, for a good few years to come.

Over time, as all boats get lifted and CPUs are upgraded, people can gradually move over to v4.0.0.

Would this not be an acceptable compromise? We continue to develop and make improvements, but make the development processes easier for ourselves (especially considering the maintainer void currently - perhaps we could reconsider if we have capacity and a really compelling reason in the future).

spwoodcock avatar Nov 26 '25 14:11 spwoodcock

Is it possible to have graceful fallback to less performant/optimized functions without different images?

This could possibly be a solution! I think some other projects go this route. I personally don't have much knowledge of how to best approach it, but I'm willing to learn and it could be achievable 😄
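
One common shape for such a fallback is runtime dispatch: a single build ships several implementations and picks one at startup based on what the CPU reports. The sketch below only shows the selection logic, in Python for brevity; the function names and the `CANDIDATES` table are hypothetical, and in ODM's compiled dependencies this would have to be supported library by library.

```python
def sum_squares_generic(values):
    """Portable fallback that makes no assumption about CPU features."""
    return sum(v * v for v in values)


def sum_squares_avx(values):
    """Stand-in for an AVX-optimised routine; same result as the fallback."""
    return sum(v * v for v in values)


# Ordered from most to least demanding: (required CPU flags, implementation).
CANDIDATES = [
    ({"avx"}, sum_squares_avx),
    (set(), sum_squares_generic),
]


def select_impl(cpu_flags):
    """Pick the first implementation whose required flags are all present."""
    for required, impl in CANDIDATES:
        if required <= cpu_flags:
            return impl
    raise RuntimeError("no usable implementation")
```

The key property is that both paths live in the same image, so no separate legacy tag is needed, at the cost of every performance-critical library having to support this.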

spwoodcock avatar Nov 26 '25 14:11 spwoodcock

We could just improve the docs and image tagging to cater for different users:

  • 3.5.6, legacy -> has the broad CPU support baked in to the latest stable version.
  • 4.0.0, latest -> has support for 2010+ CPUs only.
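
To help users pick between the two tags, the docs could include a small check along these lines. It is a sketch only: it assumes an x86_64 Linux host exposing `/proc/cpuinfo`, it uses the tag names proposed above, and the helper names `cpu_flags` and `recommended_tag` are hypothetical.

```python
def cpu_flags(cpuinfo_text):
    """Extract the CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()  # no "flags" line found


def recommended_tag(cpuinfo_text):
    """Return 'latest' when AVX is reported, otherwise 'legacy'."""
    return "latest" if "avx" in cpu_flags(cpuinfo_text) else "legacy"
```

On a real machine this would be called as `recommended_tag(open("/proc/cpuinfo").read())`.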

Much bigger developer teams on large, well-funded projects, e.g. the ones listed above, are already going this route.

spwoodcock avatar Nov 26 '25 14:11 spwoodcock

The portable build seems to work when trying 3.6.0 locally, but fails in the CI due to a GDAL issue. Not sure if it's caching related or something.

Perhaps the GPU build issues aren't due to the portable build config, but are actually GPU config specific.

More testing needed for the builds / CI to confirm if the issue is actually related, and if we should pursue this route.

spwoodcock avatar Nov 26 '25 16:11 spwoodcock

We could just improve the docs and image tagging to cater for different users:

I agree with this approach. There aren't substantive changes / differences yet between 3.5 and 3.6, so feature parity is broadly here at time of separation, and a good product continues to be available for those running older hardware, as long as folks know where to find it.

We'll have to make sure the decision has no downstream implications for NodeODM / WebODM, perhaps with a parallel issue / discussion there.

smathermather avatar Nov 26 '25 16:11 smathermather

Leaving aside the whole discussion of planned obsolescence, which I disagree with, I think everyone misunderstands the purpose of the portable build. I left a note 6 years ago when I went to great lengths to implement such a system-wide portability system here https://github.com/OpenDroneMap/ODM/blob/master/docker/README

Without the -march=nehalem flag, a docker image will contain binaries that are optimized for the machine that built the image, and will not run on older machines.

So your cutoff hardware is not pre-2010, it's pre-whatever year your build machine is. If your machine is a 2020 machine, most pre-2020 machines will crash with Illegal Instruction codes.
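
When diagnosing such crashes, it can help to confirm that a process really died from an illegal instruction rather than from some other failure. A minimal sketch, assuming a POSIX host: Python's `subprocess` reports death by signal as a negative return code, while shells (and typically `docker run`) report 128 plus the signal number. The helper name is hypothetical.

```python
import signal


def died_from_illegal_instruction(returncode):
    """True if a return code indicates the process was killed by SIGILL."""
    sigill = int(signal.SIGILL)
    # subprocess convention: -N; shell convention: 128 + N.
    return returncode in (-sigill, 128 + sigill)
```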

Discussions about development should happen at https://community.opendronemap.org/c/developers-chat/21 where the community can also chime in.

pierotofy avatar Nov 28 '25 06:11 pierotofy

Thank you for clarifying @pierotofy - appreciate it a lot 🤗

Also very good point about the forums! If we consider taking this further, I should definitely make a thread.

So your cutoff hardware is not pre-2010, it's pre-whatever year your build machine is. If your machine is a 2020 machine, most pre-2020 machines will crash with Illegal Instruction codes.

This is an interesting issue that I assume is unique to ODM, as I have not encountered such restrictions in other project image builds.

Typically, building on a newer machine doesn't automatically make the resulting binaries incompatible with older CPUs. I'm assuming this must be coming from one (or many) of the underlying libraries.

I guess some dependencies are doing fairly aggressive CPU-feature detection during CMake configuration.

Would love to identify which libraries are the culprits, so we can look into options (such as whether an upgrade might help).

Thanks for your patience with me - I'm pretty new to the ODM ecosystem in general! 🙏

spwoodcock avatar Nov 28 '25 09:11 spwoodcock

If support is reserved for AVX, this will also cause a crash in a virtual machine under Proxmox if the user forgot to check the correct CPU options when configuring the virtual machine... To date, without AVX, the workflow runs without error.

kikislater avatar Nov 28 '25 16:11 kikislater

If support is reserved for AVX, this will also cause a crash in a virtual machine under Proxmox if the user forgot to check the correct CPU options when configuring the virtual machine... To date, without AVX, the workflow runs without error.

Thanks for the input 🙏

I agree this could cause confusion in Proxmox. For most default setups I wouldn't think it's a problem. But if users explicitly select older architectures it definitely could be. Probably the easiest way to avoid this is to use the host CPU type.

I think this issue ties strongly to #1903.

Two key things to consider:

  • If it's easy to maintain the wrappers, then this is moot and we can continue to do so. But if they cause issues, it's good to find community sentiment in advance so we can consider dropping them.

  • Related to all this is the time commitment. I'm assuming maintaining this repo isn't a trivial task. Nobody has stepped up with a full-time commitment yet. Everyone involved has other jobs and very limited time for research, coding, testing. If removing the wrapper makes this easier with limited capacity, it may be the way to go for now.

spwoodcock avatar Dec 01 '25 10:12 spwoodcock