[BUG] Build failed on gtest!

Open SeekPoint opened this issue 1 year ago • 3 comments

Describe the bug

root@83e8355fd506:/share/yk_repo/HugeCTR/HugeCTR# git branch
* (HEAD detached at v23.08.00)
root@83e8355fd506:/share/yk_repo/HugeCTR/HugeCTR# cmake -DCMAKE_BUILD_TYPE=Release -DSM="80;90" -DENABLE_MULTINODES=ON

====ok

root@83e8355fd506:/share/yk_repo/HugeCTR/HugeCTR# make -j

...

[ 61%] Built target rdkafka++
[ 61%] Linking CXX static library ../../../lib/libgtest.a
[ 61%] Built target gtest
[ 61%] Building CXX object third_party/googletest/googletest/CMakeFiles/gtest_main.dir/src/gtest_main.cc.o
[ 61%] Building CXX object third_party/googletest/googlemock/CMakeFiles/gmock.dir/src/gmock-all.cc.o
In file included from /share/yk_repo/HugeCTR/HugeCTR/third_party/googletest/googlemock/include/gmock/gmock.h:59,
                 from /share/yk_repo/HugeCTR/HugeCTR/third_party/googletest/googlemock/src/gmock-all.cc:39:
/share/yk_repo/HugeCTR/HugeCTR/third_party/googletest/googlemock/include/gmock/gmock-actions.h:342:5: error: ISO C++ forbids declaration of 'GTEST_DISALLOW_COPY_AND_ASSIGN_' with no type [-fpermissive]
  342 |     GTEST_DISALLOW_COPY_AND_ASSIGN_(FixedValueProducer);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/share/yk_repo/HugeCTR/HugeCTR/third_party/googletest/googlemock/include/gmock/gmock-actions.h:353:5: error: ISO C++ forbids declaration of 'GTEST_DISALLOW_COPY_AND_ASSIGN_' with no type [-fpermissive]
  353 |     GTEST_DISALLOW_COPY_AND_ASSIGN_(FactoryValueProducer);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/share/yk_repo/HugeCTR/HugeCTR/third_party/googletest/googlemock/include/gmock/gmock-actions.h:427:3: error: ISO C++ forbids declaration of 'GTEST_DISALLOW_COPY_AND_ASSIGN_' with no type [-fpermissive]
  427 |   GTEST_DISALLOW_COPY_AND_ASSIGN_(ActionInterface);
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/share/yk_repo/HugeCTR/HugeCTR/third_party/googletest/googlemock/include/gmock/gmock-actions.h:686:27: error: expected identifier before '!' token

...
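Editorial note: this error shape usually means the bundled gmock headers were compiled without the macro GTEST_DISALLOW_COPY_AND_ASSIGN_ being defined, which can happen when a different googletest's headers are picked up first. The thread does not confirm that cause. A minimal sketch for checking which header trees define the macro follows; the system include paths in the example call are assumptions about the container, and find_macro_definitions is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# List header files under the given directories that define the macro
# GTEST_DISALLOW_COPY_AND_ASSIGN_. Directories that do not exist are skipped.
find_macro_definitions() {
  local dir
  for dir in "$@"; do
    [ -d "$dir" ] || continue
    # --include is a GNU grep option; "|| true" keeps a no-match from
    # being treated as a failure.
    grep -rl --include='*.h' 'define GTEST_DISALLOW_COPY_AND_ASSIGN_' "$dir" || true
  done
}

# Example call, run from the HugeCTR source root.
# The two system paths are assumptions and may not exist in your image.
find_macro_definitions \
  third_party/googletest/googletest/include \
  /usr/include/gtest \
  /usr/local/include/gtest
```

If the bundled tree is listed but a system tree containing gtest headers is not, that would be consistent with mismatched googletest versions on the include path.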

To Reproduce Steps to reproduce the behavior:

sudo docker build --build-arg BASE_IMAGE=merlinbase -f dockerfile.ctr .

sudo docker run -it --entrypoint=/bin/bash -v /home/amd00:/share -v /data:/data --name hugectr_dev_c --shm-size="50G" hugectr_dev

SeekPoint avatar Feb 12 '24 07:02 SeekPoint

Thank you for your feedback! @minseokl and @shijieliu will check when they come back from the Lunar New Year holiday.

zehuanw avatar Feb 12 '24 23:02 zehuanw

Hi @SeekPoint, thanks for reporting this. Could you give more information? How did you build the 'merlinbase' image? Which Merlin branch was used for 'dockerfile.merlin' and 'dockerfile.ctr'? I can't reproduce it when using merlin-base:23.08 and 'v23.08.00' of HugeCTR.

EmmaQiaoCh avatar Feb 19 '24 04:02 EmmaQiaoCh

amd00@MZ32-00:~/yk_repo/HugeCTR/Merlin/docker$ git branch
* (HEAD detached at v23.08.00)
  main

Since I have network issues in China, I made some changes:

diff --git a/docker/dockerfile.merlin b/docker/dockerfile.merlin
index 8f9aa3df..59b3abb6 100644
--- a/docker/dockerfile.merlin
+++ b/docker/dockerfile.merlin
@@ -102,10 +102,10 @@ RUN pip install --no-cache-dir --upgrade pip; pip install --no-cache-dir "cmake<
             xgboost==1.6.2 lightgbm \
             lightfm implicit \
             numba "cuda-python>=11.5,<12.0" fsspec==2022.5.0 llvmlite \
-            pynvml==11.4.1
-RUN pip install --no-cache-dir treelite==2.4.0 treelite_runtime==2.4.0
-RUN pip install --no-cache-dir numpy==1.22.4 protobuf==3.20.3 onnx onnxruntime pycuda
-RUN pip install --no-cache-dir dask==${DASK_VER} distributed==${DASK_VER} dask[dataframe]==${DASK_VER}
+            pynvml==11.4.1 -i https://pypi.tuna.tsinghua.edu.cn/simple
+RUN pip install --no-cache-dir treelite==2.4.0 treelite_runtime==2.4.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
+RUN pip install --no-cache-dir numpy==1.22.4 protobuf==3.20.3 onnx onnxruntime pycuda -i https://pypi.tuna.tsinghua.edu.cn/simple
+RUN pip install --no-cache-dir dask==${DASK_VER} distributed==${DASK_VER} dask[dataframe]==${DASK_VER} -i https://pypi.tuna.tsinghua.edu.cn/simple
 RUN pip install --no-cache-dir onnx_graphsurgeon --index-url https://pypi.ngc.nvidia.com

 # Triton Server
@@ -299,7 +299,7 @@ COPY --chown=1000:1000 --from=dlfw /usr/local/lib/python${PYTHON_VERSION}/dist-p
 COPY --chown=1000:1000 --from=dlfw /usr/local/lib/python${PYTHON_VERSION}/dist-packages/numba-*.dist-info /usr/local/lib/python${PYTHON_VERSION}/dist-packages/numba.dist-info/
 COPY --chown=1000:1000 --from=dlfw /usr/local/lib/python${PYTHON_VERSION}/dist-packages/cubinlinker-*.dist-info /usr/local/lib/python${PYTHON_VERSION}/dist-packages/cubinlinker.dist-info/

-RUN pip install --no-cache-dir jupyterlab notebook pydot testbook numpy==1.22.4
+RUN pip install --no-cache-dir jupyterlab notebook pydot testbook numpy==1.22.4 -i https://pypi.tuna.tsinghua.edu.cn/simple

 ENV JUPYTER_CONFIG_DIR=/tmp/.jupyter
 ENV JUPYTER_DATA_DIR=/tmp/.jupyter
amd00@MZ32-00:~/yk_repo/HugeCTR/Merlin/docker$

In other words, I added -i https://pypi.tuna.tsinghua.edu.cn/simple to the 'pip install' commands.

amd00@MZ32-00:~/yk_repo/HugeCTR/Merlin/docker$ sudo docker build --pull -t merlinbase -f dockerfile.merlin .

Then: sudo docker build --build-arg BASE_IMAGE=merlinbase -f dockerfile.ctr .

SeekPoint avatar Feb 22 '24 14:02 SeekPoint

Hi @SeekPoint, sorry, I still can't reproduce it, even though I checked out Merlin v23.08.00 and built with the commands you provided.
Could you check or provide the following:

  1. Were these lines (https://github.com/NVIDIA-Merlin/Merlin/blob/release-23.08/docker/dockerfile.ctr#L57-L58) executed when you ran docker build with dockerfile.ctr?
  2. Could you try passing these args to docker build: --build-arg 'HUGECTR_VER=v23.08.00' --build-arg 'HUGECTR_BACKEND_VER=v23.08.00'
  3. Could you attach dockerfile.merlin, dockerfile.ctr, and the build output for the command 'sudo docker build --build-arg BASE_IMAGE=merlinbase -f dockerfile.ctr .'?

Thanks a lot!
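Editorial note: items 2 and 3 can be combined into one invocation. The sketch below only prints the command from this thread with the two suggested args added, so it can be reviewed before running; build_cmd is a hypothetical helper name, and build.log in the comment is a hypothetical file name.

```shell
#!/usr/bin/env bash
# Print the docker build command from this thread with the two suggested
# version args added. The command is printed only, not executed.
build_cmd() {
  local ver="$1"
  printf '%s\n' "sudo docker build --build-arg BASE_IMAGE=merlinbase --build-arg HUGECTR_VER=${ver} --build-arg HUGECTR_BACKEND_VER=${ver} -f dockerfile.ctr ."
}

build_cmd v23.08.00

# To keep the build output for attaching to the issue, run the printed
# command with its output duplicated into a file, for example by appending:
#   2>&1 | tee build.log
```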

EmmaQiaoCh avatar Feb 28 '24 10:02 EmmaQiaoCh

@EmmaQiaoCh

I tried again and the gtest build passed, but the build then failed with another error:

[ 28%] Building CXX object third_party/rocksdb/CMakeFiles/rocksdb-shared.dir/env/fs_remap.cc.o
/usr/include/rmm/logger.hpp(116): error: namespace "fmt" has no member class "ostream_formatter"
  struct fmt::formatter<rmm::detail::bytes> : fmt::ostream_formatter {

I was able to fix it by building and installing fmt and spdlog from source:

git clone https://github.com/fmtlib/fmt
mkdir build
cd fmt/
mkdir build
cd build/
cmake ..
make -j32
make install

git clone https://github.com/gabime/spdlog.git
cd spdlog && mkdir build && cd build
cmake ..
make -j32
make install
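Editorial note: fmt::ostream_formatter was added in fmt 9.0.0, so the error above is what an older installed fmt would produce, and building a current fmt from source replaces it. A minimal sketch for checking the installed version before rebuilding follows. It reads the FMT_VERSION macro, which fmt encodes as major * 10000 + minor * 100 + patch. The header path /usr/include/fmt/core.h is an assumption about the container (newer fmt releases define the macro in base.h instead), and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Print the integer value of FMT_VERSION defined in a fmt header file.
# Prints nothing if the header does not define the macro.
fmt_version_from_header() {
  awk '$1 == "#define" && $2 == "FMT_VERSION" { print $3; exit }' "$1"
}

# Succeed if the header reports fmt 9.0.0 (90000) or newer.
has_ostream_formatter() {
  local v
  v=$(fmt_version_from_header "$1")
  [ -n "$v" ] && [ "$v" -ge 90000 ]
}

# Example; the header location is an assumption and may differ in your image.
header=/usr/include/fmt/core.h
if [ -f "$header" ]; then
  if has_ostream_formatter "$header"; then
    echo "fmt is 9.0.0 or newer"
  else
    echo "fmt is older than 9.0.0 or FMT_VERSION was not found"
  fi
fi
```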

Thank you ;)

SeekPoint avatar Mar 05 '24 03:03 SeekPoint