cesium-native icon indicating copy to clipboard operation
cesium-native copied to clipboard

Build issue with new url dependency

Open Grinslad opened this issue 9 months ago • 14 comments

Hi there Cesium folks.

I am trying to build the latest edition of cesium-native in a kind of odd environment. We run macs and build natively for arm silicon and also build for amd64 on our macs through docker.

This has worked up until recently, where the ada dependency was added.

However building a minimal docker image with cesium we get an error from ada.

97.25 c++: error: unrecognized command-line option ‘-mno-avx256-split-unaligned-load’
97.25 c++: error: unrecognized command-line option ‘-mno-avx256-split-unaligned-store’

which comes from the ada CMakeLists.txt:

# workaround for GNU GCC poor AVX load/store code generation
if ((CMAKE_CXX_COMPILER_ID STREQUAL "GNU") AND (CMAKE_SYSTEM_PROCESSOR MATCHES "^(i.86|x86(_64)?)$"))
  target_compile_options(ada PRIVATE -mno-avx256-split-unaligned-load -mno-avx256-split-unaligned-store)
endif()

I think what happens is that ada thinks we are on amd64 and have avx instructions. We are in fact on ARM and running virtualization.

This error is reproducible with the following dockerfile (likely need to run on arm silicon for gcc to give the error?)

FROM python:3.12-slim

USER root

ARG DEBIAN_FRONTEND=noninteractive
RUN echo "deb http://deb.debian.org/debian trixie main" >> /etc/apt/sources.list
RUN groupadd -r -g 999 mrnobody && \
    useradd -r -g mrnobody -u 999 mrnobody && \
    apt-get update && \
    apt-get install -y -t trixie \
        bison build-essential cmake cmake-data curl flex git \
        jq perl pkg-config ninja-build tar unzip zip && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

WORKDIR /app

RUN bash -c "\
    git clone -b v0.45.0 https://github.com/CesiumGS/cesium-native.git /app/extern/cesium-native; \
    pushd /app/extern/cesium-native; \
    git submodule update --init --recursive; \
    popd; \
    "

ENV VCPKG_FORCE_SYSTEM_BINARIES=1

COPY ./build_cesium_native.sh /app/extern/build_cesium_native.sh
RUN bash -c "\
    /app/extern/build_cesium_native.sh && \
    rm -rf /app/extern/cesium-native \
    "

and a build_cesium_native.sh

#!/bin/bash
set -e
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )

VS_ROOT=$(dirname $SCRIPT_DIR)
EXTERN_DIR=$VS_ROOT/extern
BUILD_DIR=$VS_ROOT/build
SRC_DIR=$EXTERN_DIR/cesium-native
PKG_DIR=$EXTERN_DIR/cesium-native-pkg

if [ -d "$PKG_DIR" ]; then
  rm -r $PKG_DIR
fi
mkdir -p $PKG_DIR

pushd $SRC_DIR
cmake -B build -S . -G "Ninja" \
  -DCMAKE_INSTALL_PREFIX=$PKG_DIR \
  -DCESIUM_TESTS_ENABLED=OFF \
  -DCMAKE_CXX_FLAGS=-fPIC \
  -DCMAKE_BUILD_TYPE="Release"
cmake --build build --target install --config Release

popd

Do you have any suggestions for how we should move forward? can we use boost::url here? Since this is all through VCPKG we don't really have control over ada or the version ada uses, we could hack around the issue and patch out the flags when we build cesium but i'd like some sort of configuration setting or something automagic. I dont even think checking the arch is suitable enough for ada to determine if avx is available or not... Do we just log a bug with ada?

Grinslad avatar Mar 17 '25 21:03 Grinslad

Hi @Grinslad, I'm not quite sure what the root of the issue is here. For what it's worth, we build cesium-native for both ARM64 and AMD64 macOS in the context of Cesium for Unreal, and don't have any trouble. However, we're not building for Linux or with GCC on a Mac, which is maybe what's different in your setup. If you think this is a bug in the Ada build process, I suggest reporting a bug there. It's possible to use a patched version of Ada in your own build by setting up an overlay port: https://learn.microsoft.com/en-us/vcpkg/consume/install-locally-modified-package

kring avatar Mar 19 '25 06:03 kring

Hi @kring

Thank you for your suggestion of creating an overlay port, we may go that route to get moving.

I did raise an issue with ada and the ada developer mentioned that is it on us to ensure CMAKE_SYSTEM_PROCESSORS is set up right. I think that this is in cesium's control.

For instance I can build ada successfully when I build it directly:

FROM python:3.12-slim

USER root

ARG DEBIAN_FRONTEND=noninteractive
RUN echo "deb http://deb.debian.org/debian trixie main" >> /etc/apt/sources.list
RUN groupadd -r -g 999 mrnobody && \
    useradd -r -g mrnobody -u 999 mrnobody && \
    apt-get update && \
    apt-get install -y -t trixie \
        bison build-essential cmake cmake-data curl flex git \
        jq perl pkg-config ninja-build tar unzip zip && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
    
# Set the working directory
WORKDIR /usr/src/app

# Clone the ada-url repository using SSH
RUN git clone https://github.com/ada-url/ada.git

# Change directory to the cloned repository
WORKDIR /usr/src/app/ada

# Build the project
RUN mkdir build && cd build && cmake .. && make

# Set the entrypoint to bash
ENTRYPOINT ["/bin/bash"]

However when building cesium with the provided Dockerfile and build script in the OP the error happens. So although I can create a port file I think this may be something that can just be handled here.

Since you mentioned that you use Macs, could you perhaps try out the Dockerfile & build script and see if it's reproducible? I really think this should just work™

Grinslad avatar Mar 19 '25 14:03 Grinslad

So hang on, you're trying to build inside Docker on an ARM64 Mac, right? Docker on Mac runs Linux in a VM, from what I understand. What processor architecture is the VM? Is it ARM64, or AMD64?

If it's AMD64, then I would argue that CMAKE_SYSTEM_PROCESSOR is set correctly to AMD64. And it's unclear to me why your GCC doesn't support the options Ada is trying to set. Perhaps it's an old version of GCC or something?

If the VM is ARM64, then ok, it's definitely incorrect that CMAKE_SYSTEM_PROCESSOR is set to something triggers CMAKE_SYSTEM_PROCESSOR MATCHES "^(i.86|x86(_64)?)$", but it's not clear to my why that would happen.

Can you link to your conversation with the Ada developers?

kring avatar Mar 20 '25 04:03 kring

Is this relevant? https://github.com/CesiumGS/cesium-native/blob/99060d502b848dcb690e481cdf526216a81526b0/cmake/detect-vcpkg-triplet.cmake#L41-L44

lilleyse avatar Mar 20 '25 13:03 lilleyse

Can you link to your conversation with the Ada developers?

Definitely @kring https://github.com/ada-url/ada/issues/920

anonrig avatar Mar 20 '25 14:03 anonrig

Hi @Kring

Yes you are correct we are building inside docker on an arm64 mac. I had the same idea as you that maybe it was the compiler version. So I actually built GCC 14.2 in the container prior to raising this issue to verify it was not something silly (maybe i messed up here? the gcc build takes a long time so i just assumed it all worked lol). I think you are correct that it runs in a VM for x86_64 images. There are aarch64 images too though, which I found out after making this report!

I think what happens is that for instance the python base image may actually be an aarch64 image, so its not running as a VM with virtualization but native docker. So you get 'LINUX' and 'aarch64' instead of 'APPLE' and 'aarch64', and somehow this resolves down to matching x86 system processors when building in ada (maybe because of 'LINUX' we end up configuring 'x86_64' when building ada). However gcc goes this arch isn't x86 so these flags are not supported. Maybe this is what is going on? (10000 foot view, im not super familiar with cesiums build system)

In order to build cesium on arm in docker we have to set ENV VCPKG_FORCE_SYSTEM_BINARIES=1 - but i think this may have downstream affects of somehow getting x86 compiler flags when building ada.

Apologies for any confusion. It's very technical and reliant on a lot of moving parts (like even what arch the container is) and i've kinda been figuring it out as we go myself.

edit: so to be clear I think what may be happening is (@lilleyse pointed out! I think you are right)

 elseif(LINUX) 
     # Assuming x64 here isn't necessarily correct, but it's the only platform we officially support. 
     set(DETECTED_VCPKG_TRIPLET "x64-linux") 
 else() 

Maybe could be improved. because it can be linux && aarch64? The 'officially' support means you dont technically support this case, and thats fine, but it did work prior 🤷‍♂ .

And again apologies for saying building for amd64 on mac. After investigating I think the base image may be an aarch64 linux image as we pull from python and would likely get the native one! I too would have thought it would be virtualized amd64.

Grinslad avatar Mar 20 '25 14:03 Grinslad

@Grinslad I think @lilleyse is exactly right. The problem is the line he referenced above. Cesium Native assumes that Linux == AMD64, which is not true in your case. You should be able to replace x64-linux on that line with arm64-linux.

kring avatar Mar 24 '25 03:03 kring

Hi @kring

Id like you to recognize that it's not only just my case. You are correct that I could just change this line locally, or through a patch as suggesting. But I would like to push back because I feel that building in a docker container on apple silicon should be supported. As said, cesium already supports native mac builds - why not builds in a container? I would wager that most development teams following continuous integration principles would utilize tools like docker to build this open source software.

For instance i am a developer at seequent who uses cesium-native to do our tiling, we all use macs and we have a CI process that builds in docker to ensure clean and reproducible builds. I could put what I would consider a bit of a hack to get around this issue but I strongly feel it should just work. Is this something we can fix here?

Grinslad avatar Apr 01 '25 17:04 Grinslad

@Grinslad I wasn't suggesting that as the final solution. The first step is to confirm that it solves the problem for you. Then we can work on incorporating that change into Cesium Native itself. I think in this case it should be simple: add some logic in the LINUX section similar to the other platforms to discriminate between the two processor architectures and select the correct triplet. If you want to make that change and open a pull request, that would be amazing! We'll review and merge it quickly.

Thanks for letting me know you're from Seequent, I didn't realize that. I would love to hear more (probably offline) about how you're using Cesium Native.

kring avatar Apr 02 '25 00:04 kring

Hi @kring

thank you for the clarification. Poor assumption on my part. Yes that solves the issue with building ada. I have however run into other issues with uninitialized variables in SharedAsset.h (the reference count specifically) on aarch64 GCC 14.2 (and older versions of GCC, had to build 14.2 to test)

We were a few months behind (these other changes came in 5 months ago) so could not continue the build

/src/GltfReader.cpp.o.d -o CesiumGltfReader/CMakeFiles/CesiumGltfReader.dir/src/GltfReader.cpp.o -c /app/extern/cesium-native/CesiumGltfReader/src/GltfReader.cpp
In file included from /usr/local/include/c++/14.2.0/bits/shared_ptr_atomic.h:33,
                 from /usr/local/include/c++/14.2.0/memory:81,
                 from /app/extern/cesium-native/CesiumJsonReader/include/CesiumJsonReader/JsonReaderOptions.h:8,
                 from /app/extern/cesium-native/CesiumJsonReader/include/CesiumJsonReader/ExtensionsJsonHandler.h:4,
                 from /app/extern/cesium-native/CesiumJsonReader/include/CesiumJsonReader/ExtensibleObjectJsonHandler.h:4,
                 from /app/extern/cesium-native/CesiumGltfReader/generated/src/AccessorSparseIndicesJsonHandler.h:6,
                 from /app/extern/cesium-native/CesiumGltfReader/generated/src/AccessorSparseJsonHandler.h:5,
                 from /app/extern/cesium-native/CesiumGltfReader/generated/src/AccessorJsonHandler.h:5,
                 from /app/extern/cesium-native/CesiumGltfReader/generated/src/ModelJsonHandler.h:5,
                 from /app/extern/cesium-native/CesiumGltfReader/src/GltfReader.cpp:1:
In member function ‘std::__atomic_base<_IntTp>::__int_type std::__atomic_base<_IntTp>::operator--() [with _ITp = int]’,
    inlined from ‘void CesiumUtility::SharedAsset<T>::releaseReference(bool) const [with T = CesiumGltf::Schema]’ at /app/extern/cesium-native/CesiumUtility/include/CesiumUtility/SharedAsset.h:131:32,
    inlined from ‘CesiumUtility::ResultPointer<TAssetType> CesiumAsync::SharedAssetDepot<TAssetType, TAssetKey>::AssetEntry::toResultUnderLock() const [with TAssetType = CesiumGltf::Schema; TAssetKey = CesiumGltfReader::NetworkSchemaAssetDescriptor]’ at /app/extern/cesium-native/CesiumAsync/include/CesiumAsync/SharedAssetDepot.h:550:29:
/usr/local/include/c++/14.2.0/bits/atomic_base.h:406:34: error: ‘unsigned int __atomic_sub_fetch_4(volatile void*, unsigned int, int)’ writing 4 bytes into a region of size 0 overflows the destination [-Werror=stringop-overflow=]
  406 |       { return __atomic_sub_fetch(&_M_i, 1, int(memory_order_seq_cst)); }
      |                ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In member function ‘CesiumUtility::ResultPointer<TAssetType> CesiumAsync::SharedAssetDepot<TAssetType, TAssetKey>::AssetEntry::toResultUnderLock() const [with TAssetType = CesiumGltf::Schema; TAssetKey = CesiumGltfReader::NetworkSchemaAssetDescriptor]’:
cc1plus: note: destination object is likely at address zero
cc1plus: all warnings being treated as errors
[138/212] Building CXX object CesiumGltfCont...esiumGltfContent.dir/src/GltfUtilities.cpp.o
ninja: build stopped: subcommand failed.
root@7b26f4dda5de:/app# gcc --version
gcc (GCC) 14.2.0
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Grinslad avatar Apr 02 '25 21:04 Grinslad

@Grinslad that looks the same as this, which was closed without merging because Sean couldn't reproduce it anymore: #1124 Can you see any possible way that GCC is right that this code is writing 4 bytes into a location of size zero? If not, it's a GCC bug, and disabling the warning-as-error, as done in that PR, is probably the only "solution".

kring avatar Apr 03 '25 00:04 kring

@kring From the code no, i'd have to dump the assembly to see and i'm not quite in the mood to learn aarch64 assembly. We will run with that for now. Thanks for the assistance! I am content that it has been seen before so at least its not something uniquely jank about this sort of set up.

I may go and log a gcc bug when I have time. @lilleyse did you update gcc and it went away or how did it, well, stop?

Grinslad avatar Apr 03 '25 15:04 Grinslad

Got it all building with the suggestions:

#!/bin/bash
set -e
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )

VS_ROOT=$(dirname $SCRIPT_DIR)
EXTERN_DIR=$VS_ROOT/extern
BUILD_DIR=$VS_ROOT/build
SRC_DIR=$EXTERN_DIR/cesium-native
PKG_DIR=$EXTERN_DIR/cesium-native-pkg

if [ -d "$PKG_DIR" ]; then
  rm -r $PKG_DIR
fi
mkdir -p $PKG_DIR

# Determine the architecture
ARCH=$(uname -m)

# If the architecture is aarch64, modify the triplet and suppress the warning
if [ "$ARCH" == "aarch64" ]; then
  sed -i 's/set(DETECTED_VCPKG_TRIPLET "x64-linux")/set(DETECTED_VCPKG_TRIPLET "arm64-linux")/' $SRC_DIR/cmake/detect-vcpkg-triplet.cmake
  CMAKE_CXX_FLAGS="-fPIC -Wno-error=stringop-overflow="
else
  CMAKE_CXX_FLAGS="-fPIC"
fi

pushd $SRC_DIR
cmake -B build -S . -G "Ninja" \
  -DCMAKE_INSTALL_PREFIX=$PKG_DIR \
  -DCESIUM_TESTS_ENABLED=OFF \
  -DCMAKE_CXX_FLAGS="$CMAKE_CXX_FLAGS" \
  -DCMAKE_BUILD_TYPE="Release"
cmake --build build --target install --config Release

popd

I would like to see that both of these changes make their way upstream here so that we can remove our workarounds. cheers yall.

Grinslad avatar Apr 03 '25 23:04 Grinslad

Glad to hear you were able to get it working @Grinslad!

The quickest and best way to get it into the upstream is to open a pull request with your changes.

kring avatar Apr 04 '25 05:04 kring

I think everything here is addressed in https://github.com/CesiumGS/cesium-native/pull/1124?

lilleyse avatar Aug 08 '25 18:08 lilleyse

I think everything here is addressed in https://github.com/CesiumGS/cesium-native/pull/1124?

Now that that PR is merged, should be ok to close this issue.

lilleyse avatar Aug 26 '25 15:08 lilleyse