
[Requirement] Inference_test self-contained

Open albert17 opened this issue 3 years ago • 3 comments

Currently, running inference_test in the merlin-inference container results in this error:

[ RUN      ] embedding_cache.embedding_cache_usigned_int_0_0_5_1_enable
unknown file: Failure
C++ exception with description "Runtime error: file_stream.is_open() failed: /workdir/test/utest/simple_inference_config.json /repos/HugeCTR/HugeCTR/include/parser.hpp:40 
" thrown in the test body.
[  FAILED  ] embedding_cache.embedding_cache_usigned_int_0_0_5_1_enable (22 ms)

After creating the file at the expected location, /workdir/test/utest/simple_inference_config.json, we still get the error Cannot open /hugectr/test/utest/dcn_csr.txt, because the test cannot create a file in that location.

After manually creating the file, we get this error:

root@5ddd3850acab:/hugectr# inference_test
Running main() from /repos/HugeCTR/third_party/googletest/googletest/src/gtest_main.cc
[==========] Running 48 tests from 6 test suites.
[----------] Global test environment set-up.
[----------] 2 tests from session_inference_cpu
[ RUN      ] session_inference_cpu.criteo_dcn
row_ptrs_dim does not equal to num_samples*slot_num + 1
Segmentation fault (core dumped)

Apparently this happens because some required files are missing; they are currently located only on internal systems.

It would be desirable for inference_test to run on its own, without relying on internal files. It would also be best if the source and destination paths could be passed as parameters.
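To illustrate the kind of parameterization requested here, below is a minimal sketch of resolving test file paths from an environment variable with a fallback to the currently hard-coded directory. The helper names `resolve_test_dir` and `test_file` and the variable `HUGECTR_TEST_DIR` are hypothetical; none of them exist in HugeCTR, and this is not how the tests are implemented today.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of HugeCTR): return the directory that holds
// test inputs. The value comes from the environment variable `env_var` when it
// is set and non-empty; otherwise `fallback` is used. The result always ends
// with a trailing '/'.
inline std::string resolve_test_dir(const char* env_var, const std::string& fallback) {
  const char* value = std::getenv(env_var);
  std::string dir = fallback;
  if (value != nullptr && value[0] != '\0') {
    dir = value;
  }
  if (!dir.empty() && dir.back() != '/') {
    dir += '/';
  }
  return dir;
}

// Hypothetical helper: build the full path of a test input file.
// HUGECTR_TEST_DIR is an assumed variable name; the fallback is the path
// that appears in the error message above.
inline std::string test_file(const std::string& name) {
  return resolve_test_dir("HUGECTR_TEST_DIR", "/workdir/test/utest/") + name;
}
```

With something like this, a test could call `test_file("simple_inference_config.json")` instead of embedding the absolute path, and the binary could be pointed at another checkout by exporting the variable before running it.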

albert17 avatar Dec 01 '21 09:12 albert17

@albert17

  • The simple_inference_config.json file already exists in the hugectr repo: https://github.com/NVIDIA-Merlin/HugeCTR/blob/master/test/utest/simple_inference_config.json. So you don't need to create this file; just make sure your /workdir/ is the hugectr repo root folder.

  • Make sure you have created these model files on the host CI machine and mounted the folder into your container: https://github.com/NVIDIA-Merlin/HugeCTR/blob/master/.gitlab-ci.yml#L168

FYI @shijieliu

yingcanw avatar Dec 13 '21 12:12 yingcanw

@yingcanw

This forces us to use the path /workdir/test/utest/, while ideally hugectr is located at the root. It is not a big deal and I can fix it under the hood, but would it be possible to just pass a path as a parameter?

albert17 avatar Dec 15 '21 01:12 albert17

@yingcanw please confirm whether it's fixed. If it's not, we can add it to @EmmaQiaoCh's list.

zehuanw avatar May 02 '22 02:05 zehuanw

Closed because this issue is old and we are working on deprecating offline inference. For alternatives, please check out the Hierarchical Parameter Server based on TensorRT and TensorFlow.

minseokl avatar Aug 16 '23 08:08 minseokl