HugeCTR
[BUG] Build failed on gtest!
Describe the bug
root@83e8355fd506:/share/yk_repo/HugeCTR/HugeCTR# git branch
* (HEAD detached at v23.08.00)
root@83e8355fd506:/share/yk_repo/HugeCTR/HugeCTR# cmake -DCMAKE_BUILD_TYPE=Release -DSM="80;90" -DENABLE_MULTINODES=ON
====ok
root@83e8355fd506:/share/yk_repo/HugeCTR/HugeCTR# make -j
...
[ 61%] Built target rdkafka++
[ 61%] Linking CXX static library ../../../lib/libgtest.a
[ 61%] Built target gtest
[ 61%] Building CXX object third_party/googletest/googletest/CMakeFiles/gtest_main.dir/src/gtest_main.cc.o
[ 61%] Building CXX object third_party/googletest/googlemock/CMakeFiles/gmock.dir/src/gmock-all.cc.o
In file included from /share/yk_repo/HugeCTR/HugeCTR/third_party/googletest/googlemock/include/gmock/gmock.h:59,
                 from /share/yk_repo/HugeCTR/HugeCTR/third_party/googletest/googlemock/src/gmock-all.cc:39:
/share/yk_repo/HugeCTR/HugeCTR/third_party/googletest/googlemock/include/gmock/gmock-actions.h:342:5: error: ISO C++ forbids declaration of 'GTEST_DISALLOW_COPY_AND_ASSIGN_' with no type [-fpermissive]
  342 |     GTEST_DISALLOW_COPY_AND_ASSIGN_(FixedValueProducer);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/share/yk_repo/HugeCTR/HugeCTR/third_party/googletest/googlemock/include/gmock/gmock-actions.h:353:5: error: ISO C++ forbids declaration of 'GTEST_DISALLOW_COPY_AND_ASSIGN_' with no type [-fpermissive]
  353 |     GTEST_DISALLOW_COPY_AND_ASSIGN_(FactoryValueProducer);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/share/yk_repo/HugeCTR/HugeCTR/third_party/googletest/googlemock/include/gmock/gmock-actions.h:427:3: error: ISO C++ forbids declaration of 'GTEST_DISALLOW_COPY_AND_ASSIGN_' with no type [-fpermissive]
  427 |   GTEST_DISALLOW_COPY_AND_ASSIGN_(ActionInterface);
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/share/yk_repo/HugeCTR/HugeCTR/third_party/googletest/googlemock/include/gmock/gmock-actions.h:686:27: error: expected identifier before '!' token
...
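The errors above show that GTEST_DISALLOW_COPY_AND_ASSIGN_ was not defined as a macro when gmock-actions.h was compiled, so the compiler tried to parse it as a declaration. One possible cause, which is an assumption and not something confirmed in this thread, is that the bundled gmock headers are being combined with gtest headers from another location that do not define the macro. A minimal sketch for listing which header trees define it; the function name find_macro_definitions is hypothetical, and the directories in the usage comment are only examples:

```shell
# List header files under the given directories that #define a macro.
# Usage: find_macro_definitions MACRO DIR [DIR...]
# Example (directories are illustrative, adjust to your system):
#   find_macro_definitions GTEST_DISALLOW_COPY_AND_ASSIGN_ \
#     third_party/googletest /usr/include/gtest /usr/local/include/gtest
find_macro_definitions() {
  fmd_macro="$1"
  shift
  for fmd_dir in "$@"; do
    # Skip directories that do not exist on this machine.
    [ -d "$fmd_dir" ] || continue
    # -r: recurse, -l: print only names of matching files, -E: extended regex.
    # The pattern matches "#define MACRO" followed by a non-identifier
    # character or end of line, so longer macro names are not matched.
    grep -rlE "^[[:space:]]*#[[:space:]]*define[[:space:]]+${fmd_macro}([^A-Za-z0-9_]|\$)" "$fmd_dir" || true
  done
}
```

If the only tree that defines the macro is not the one the compiler picks up first, comparing the include paths in the failing compile command would be a reasonable next step.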
To Reproduce Steps to reproduce the behavior:
sudo docker build --build-arg BASE_IMAGE=merlinbase -f dockerfile.ctr .
sudo docker run -it --entrypoint=/bin/bash -v /home/amd00:/share -v /data:/data --name hugectr_dev_c --shm-size="50G" hugectr_dev
Thank you for your feedback! @minseokl and @shijieliu will check this when they come back from the Lunar New Year holiday.
Hi @SeekPoint, thanks for the finding. Could you give more information: How did you build image 'merlinbase'? Which merlin branch was used for 'dockerfile.merlin' and 'dockerfile.ctr'? I can't reproduce it when using merlin-base:23.08 and 'v23.08.00' of HugeCTR.
amd00@MZ32-00:~/yk_repo/HugeCTR/Merlin/docker$ git branch
* (HEAD detached at v23.08.00)
  main
Because of network issues in China, I made the following changes:
diff --git a/docker/dockerfile.merlin b/docker/dockerfile.merlin
index 8f9aa3df..59b3abb6 100644
--- a/docker/dockerfile.merlin
+++ b/docker/dockerfile.merlin
@@ -102,10 +102,10 @@ RUN pip install --no-cache-dir --upgrade pip; pip install --no-cache-dir "cmake<
         xgboost==1.6.2 lightgbm \
         lightfm implicit \
         numba "cuda-python>=11.5,<12.0" fsspec==2022.5.0 llvmlite \
-        pynvml==11.4.1
-RUN pip install --no-cache-dir treelite==2.4.0 treelite_runtime==2.4.0
-RUN pip install --no-cache-dir numpy==1.22.4 protobuf==3.20.3 onnx onnxruntime pycuda
-RUN pip install --no-cache-dir dask==${DASK_VER} distributed==${DASK_VER} dask[dataframe]==${DASK_VER}
+        pynvml==11.4.1 -i https://pypi.tuna.tsinghua.edu.cn/simple
+RUN pip install --no-cache-dir treelite==2.4.0 treelite_runtime==2.4.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
+RUN pip install --no-cache-dir numpy==1.22.4 protobuf==3.20.3 onnx onnxruntime pycuda -i https://pypi.tuna.tsinghua.edu.cn/simple
+RUN pip install --no-cache-dir dask==${DASK_VER} distributed==${DASK_VER} dask[dataframe]==${DASK_VER} -i https://pypi.tuna.tsinghua.edu.cn/simple
 RUN pip install --no-cache-dir onnx_graphsurgeon --index-url https://pypi.ngc.nvidia.com
 # Triton Server
@@ -299,7 +299,7 @@ COPY --chown=1000:1000 --from=dlfw /usr/local/lib/python${PYTHON_VERSION}/dist-p
 COPY --chown=1000:1000 --from=dlfw /usr/local/lib/python${PYTHON_VERSION}/dist-packages/numba-*.dist-info /usr/local/lib/python${PYTHON_VERSION}/dist-packages/numba.dist-info/
 COPY --chown=1000:1000 --from=dlfw /usr/local/lib/python${PYTHON_VERSION}/dist-packages/cubinlinker-*.dist-info /usr/local/lib/python${PYTHON_VERSION}/dist-packages/cubinlinker.dist-info/
-RUN pip install --no-cache-dir jupyterlab notebook pydot testbook numpy==1.22.4
+RUN pip install --no-cache-dir jupyterlab notebook pydot testbook numpy==1.22.4 -i https://pypi.tuna.tsinghua.edu.cn/simple
 ENV JUPYTER_CONFIG_DIR=/tmp/.jupyter
 ENV JUPYTER_DATA_DIR=/tmp/.jupyter
amd00@MZ32-00:~/yk_repo/HugeCTR/Merlin/docker$
In other words, I added -i https://pypi.tuna.tsinghua.edu.cn/simple to the 'pip install' commands.
amd00@MZ32-00:~/yk_repo/HugeCTR/Merlin/docker$ sudo docker build --pull -t merlinbase -f dockerfile.merlin .
then: sudo docker build --build-arg BASE_IMAGE=merlinbase -f dockerfile.ctr .
Hi @SeekPoint, sorry, I still can't reproduce it, even after checking out Merlin v23.08.00 and building with the commands you provided.
Could you check or provide the following:
- Were these lines (https://github.com/NVIDIA-Merlin/Merlin/blob/release-23.08/docker/dockerfile.ctr#L57-L58) executed when you ran docker build with dockerfile.ctr?
- Could you try passing these args to docker build: --build-arg 'HUGECTR_VER=v23.08.00' --build-arg 'HUGECTR_BACKEND_VER=v23.08.00'
- Could you attach dockerfile.merlin, dockerfile.ctr, and the build output of the command 'sudo docker build --build-arg BASE_IMAGE=merlinbase -f dockerfile.ctr .'? Thanks a lot!
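The requests above (pinning both versions and capturing the build output) can be combined in one step. The following is only a sketch: the function name build_ctr_image, the DRY_RUN switch, and the default log file name ctr_build.log are hypothetical, and the original commands were run with sudo, which may or may not be needed on your machine.

```shell
# Print (DRY_RUN=1) or run the dockerfile.ctr build with both version
# build args pinned, saving the complete output to a log file that can
# be attached to the issue.
# Usage: build_ctr_image [VERSION] [LOG_FILE]
build_ctr_image() {
  bci_version="${1:-v23.08.00}"
  bci_log="${2:-ctr_build.log}"
  # Reuse the positional parameters to hold the command, so arguments
  # stay correctly quoted without relying on bash arrays.
  set -- docker build \
    --build-arg "BASE_IMAGE=merlinbase" \
    --build-arg "HUGECTR_VER=${bci_version}" \
    --build-arg "HUGECTR_BACKEND_VER=${bci_version}" \
    -f dockerfile.ctr .
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Only show the command that would be executed.
    echo "$*"
  else
    # 2>&1 merges stderr into stdout; tee prints the output and saves it.
    "$@" 2>&1 | tee "$bci_log"
  fi
}
```

Running DRY_RUN=1 build_ctr_image first shows the exact command before anything is built.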
@EmmaQiaoCh
I tried again and the gtest build passed, but the build then failed with another error:
[ 28%] Building CXX object third_party/rocksdb/CMakeFiles/rocksdb-shared.dir/env/fs_remap.cc.o
/usr/include/rmm/logger.hpp(116): error: namespace "fmt" has no member class "ostream_formatter"
struct fmt::formatter<rmm::detail::bytes> : fmt::ostream_formatter {
I was able to fix it by building and installing fmt and spdlog from source:
git clone https://github.com/fmtlib/fmt
mkdir build
cd fmt/
mkdir build
cd build/
cmake ..
make -j32
make install
git clone https://github.com/gabime/spdlog.git
cd spdlog && mkdir build && cd build
cmake ..
make -j32
make install
Thank you ;)
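For anyone hitting the same rmm/logger.hpp error: fmt::ostream_formatter was introduced in fmt 9.0, so the message suggests the fmt headers found at build time were older than that. A rough check is sketched below, assuming FMT_VERSION is defined in fmt/core.h (fmt 10 and older) or fmt/base.h (fmt 11); the function names are hypothetical and the paths in the usage comment are examples only.

```shell
# Print the FMT_VERSION number found in a header file (empty if none).
# fmt encodes the version as major * 10000 + minor * 100 + patch.
fmt_version_of() {
  sed -n 's/^[[:space:]]*#[[:space:]]*define[[:space:]]\{1,\}FMT_VERSION[[:space:]]\{1,\}\([0-9]\{1,\}\).*/\1/p' "$1" | head -n 1
}

# Succeed if the header reports fmt 9.0.0 (90000) or newer, the first
# release line that provides fmt::ostream_formatter.
fmt_has_ostream_formatter() {
  fmt_ver="$(fmt_version_of "$1")"
  [ -n "$fmt_ver" ] && [ "$fmt_ver" -ge 90000 ]
}

# Example usage (paths are illustrative):
#   for h in /usr/include/fmt/core.h /usr/local/include/fmt/core.h /usr/local/include/fmt/base.h; do
#     [ -f "$h" ] && echo "$h: $(fmt_version_of "$h")"
#   done
```

If more than one fmt installation shows up, the one earlier in the compiler's include path is the one that matters.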