HugeCTR
[BUG] HugeCTR Model segfaults on Tritonserver inference request
Describe the bug: When running a HugeCTR model instance on Tritonserver, the server is able to load the model, but when an inference request comes in, we get a segfault:
Signal (11) received.
0# 0x000055F94442E1B9 in tritonserver
1# 0x00007F923D3A60C0 in /usr/lib/x86_64-linux-gnu/libc.so.6
2# TRITONBACKEND_ModelInstanceExecute in /opt/tritonserver/backends/hugectr/libtriton_hugectr.so
3# 0x00007F923DC5773A in /opt/tritonserver/lib/libtritonserver.so
4# 0x00007F923DC580F7 in /opt/tritonserver/lib/libtritonserver.so
5# 0x00007F923DD15411 in /opt/tritonserver/lib/libtritonserver.so
6# 0x00007F923DC515C7 in /opt/tritonserver/lib/libtritonserver.so
7# 0x00007F923D795DE4 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
8# 0x00007F923E9A5609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0
9# clone in /usr/lib/x86_64-linux-gnu/libc.so.6
To Reproduce: Steps to reproduce the behavior: please launch tritonserver using the model repository files provided in this ticket. Once the tritonserver instance is up, launch an inference request against it using the following code:
```python
import cudf
import numpy as np
import tritonclient.grpc as grpcclient
# convert_df_to_triton_input is assumed to come from the Merlin systems helpers
from merlin.systems import triton

data = cudf.DataFrame({"a": [1], "b": [1], "c": [1], "label": [0]}, dtype="int64")
data["label"] = data["label"].astype("float32")
slot_size_array = [2, 2, 2]
categorical_columns = ["a", "b", "c"]
categorical_dim = len(categorical_columns)
batch_size = data.shape[0]

# Shift each slot's keys by the cumulative slot sizes so keys are globally unique.
offset = np.insert(np.cumsum(slot_size_array), 0, 0)[:-1].tolist()
data[categorical_columns] += offset

cat = data[categorical_columns].values.reshape(1, batch_size * categorical_dim).tolist()[0]
row_ptrs = list(range(batch_size * categorical_dim + 1))
dense = [0.0]

request_df = cudf.DataFrame(
    {"CATCOLUMN": cat, "ROWINDEX": row_ptrs[1:], "DES": dense * len(cat)}
)
request_df["ROWINDEX"] = request_df["ROWINDEX"].astype("int32")
request_df["CATCOLUMN"] = request_df["CATCOLUMN"].astype("int64")
request_df["DES"] = request_df["DES"].astype("float32")

inputs = triton.convert_df_to_triton_input(request_df.columns, request_df)
outputs = [grpcclient.InferRequestedOutput(col) for col in ["OUTPUT0"]]
response = None
with grpcclient.InferenceServerClient("localhost:8001") as client:
    response = client.infer("hugectr", inputs, request_id="1", outputs=outputs)
assert len(response.as_numpy("OUTPUT0")) == request_df.shape[0]
```
Expected behavior: We expect that when an inference request is sent, we get back inference results. Using the Merlin CI runner container (based on the merlin-hugectr container).
Additional context zip file: test_triton.zip
Thank you for the issue report! @yingcanw, can you check whether @jperez999 ran this the right way? @EmmaQiaoCh for visibility.
@jperez999 There are 9 errors in your configuration files and test script, as follows:
- Configuration errors in the ps.json file:
  - The value of `"model": "model"` should be `"hugectr"` in your ps.json.
  - The value of `"maxnum_catfeature_query_per_table_per_sample": [1]` should be at least `[3]`, because you have three slots.
- Configuration errors in config.pbtxt:
  - If you add `"embeddingkey_long_type"` to config.pbtxt, the value of this item cannot be empty:

    ```
    parameters {
      key: "embeddingkey_long_type"
      value {
      }
    }
    ```

  - If you add `"gpucache"` to config.pbtxt, the value of this item cannot be empty:

    ```
    parameters {
      key: "gpucache"
      value {
      }
    }
    ```

  - The value of `"config"` in config.pbtxt should be `"/tmp/test_triton/model_repository/hugectr/1/model.json"`, not:

    ```
    parameters {
      key: "config"
      value {
        string_value: "/tmp/pytest-of-root/pytest-24/test_training0/model_repository/ps.json"
      }
    }
    ```
For all the above configuration items, you can find a detailed introduction in the configuration book. BTW: in the latest release we have simplified the configuration, so you no longer have to reconfigure these items in config.pbtxt; see the release notes.
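Putting the three config.pbtxt fixes above together, the `parameters` section would look roughly like this. This is only a sketch based on the paths mentioned in this thread; the `"true"` string values for the two flags are assumptions (the point from the reply is simply that they must not be empty):

```
parameters [
  {
    key: "config"
    value { string_value: "/tmp/test_triton/model_repository/hugectr/1/model.json" }
  },
  {
    key: "gpucache"
    value { string_value: "true" }
  },
  {
    key: "embeddingkey_long_type"
    value { string_value: "true" }
  }
]
```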
- Test script errors:
  - Since the dense input for your model is 0, `dense = [0.0]` in the test script should be `[]` instead of `[0.0]`.
  - The `row_ptrs` shape is 1×4, but in request_df it becomes 1×3 due to `row_ptrs[1:]`. So your input for `convert_df_to_triton_input` is also wrong. For more details, please refer to the row_ptr sample.
  - Since the input has only one sample, the output also has only one label, so `len(response.as_numpy("OUTPUT0"))` will never equal `request_df.shape[0]`.
  - Your sparse file from training contains only one key, but by default the number of keys in the sparse file cannot be less than 64.
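The `row_ptrs` and `dense` points can be illustrated with a small standalone sketch (NumPy only, values taken from the repro script in this ticket):

```python
import numpy as np

slot_size_array = [2, 2, 2]
batch_size = 1
categorical_dim = 3  # three slots: "a", "b", "c"

# Global key offsets so each slot's keys occupy a distinct ID range.
offset = np.insert(np.cumsum(slot_size_array), 0, 0)[:-1].tolist()
print(offset)  # [0, 2, 4]

# CSR-style row pointers for CATCOLUMN: one boundary per (sample, slot)
# pair plus a leading 0, so the full length must be batch * slots + 1.
row_ptrs = list(range(batch_size * categorical_dim + 1))
print(row_ptrs)  # [0, 1, 2, 3] -- pass all 4 entries, not row_ptrs[1:]

# The model has no dense features, so DES should be empty.
dense = []
```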
Original ps.json:

```json
{
  "supportlonglong": true,
  "models": [{
    "model": "model",
    "slot_num": 3,
    "sparse_files": ["/tmp/test_triton/model_repository/hugectr/1/0_sparse_0.model"],
    "dense_file": "/tmp/test_triton/model_repository/hugectr/1/_dense_0.model",
    "maxnum_des_feature_per_sample": 0,
    "network_file": "/tmp/test_triton/model_repository/hugectr/1/model.json",
    "num_of_worker_buffer_in_pool": 4,
    "num_of_refresher_buffer_in_pool": 1,
    "deployed_device_list": [0],
    "max_batch_size": 1024,
    "default_value_for_each_table": [0.0],
    "hit_rate_threshold": 0.9,
    "gpucacheper": 0.5,
    "gpucache": true,
    "cache_refresh_percentage_per_iteration": 0.2,
    "maxnum_catfeature_query_per_table_per_sample": [1],
    "embedding_vecsize_per_table": [16],
    "embedding_table_names": ["sparse_embedding1"]
  }]
}
```
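For reference, applying the two ps.json corrections from the list above gives roughly the following. Only `"model"` and `"maxnum_catfeature_query_per_table_per_sample"` differ from the original; all other fields are unchanged:

```json
{
  "supportlonglong": true,
  "models": [{
    "model": "hugectr",
    "slot_num": 3,
    "sparse_files": ["/tmp/test_triton/model_repository/hugectr/1/0_sparse_0.model"],
    "dense_file": "/tmp/test_triton/model_repository/hugectr/1/_dense_0.model",
    "maxnum_des_feature_per_sample": 0,
    "network_file": "/tmp/test_triton/model_repository/hugectr/1/model.json",
    "num_of_worker_buffer_in_pool": 4,
    "num_of_refresher_buffer_in_pool": 1,
    "deployed_device_list": [0],
    "max_batch_size": 1024,
    "default_value_for_each_table": [0.0],
    "hit_rate_threshold": 0.9,
    "gpucacheper": 0.5,
    "gpucache": true,
    "cache_refresh_percentage_per_iteration": 0.2,
    "maxnum_catfeature_query_per_table_per_sample": [3],
    "embedding_vecsize_per_table": [16],
    "embedding_table_names": ["sparse_embedding1"]
  }]
}
```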
@yingcanw thanks for the reply, this helped us move forward. However, we are now running into a new issue; we are getting the following error:
I0629 14:04:52.229834 14141 hugectr.cc:1827] TRITONBACKEND_Backend Finalize: HugectrBackend
terminate called after throwing an instance of 'nv::CudaException'
what(): /hugectr/gpu_cache/src/nv_gpu_cache.cu:1494: CUDA error 101: invalid device ordinal
Signal (6) received.
0# 0x00005560BC9021B9 in /opt/tritonserver/bin/tritonserver
1# 0x00007FDEEBECD0C0 in /usr/lib/x86_64-linux-gnu/libc.so.6
2# gsignal in /usr/lib/x86_64-linux-gnu/libc.so.6
3# abort in /usr/lib/x86_64-linux-gnu/libc.so.6
4# 0x00007FDEEC284911 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
5# 0x00007FDEEC29038C in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
6# 0x00007FDEEC28F369 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
7# __gxx_personality_v0 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
8# 0x00007FDEEC08CBEF in /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
9# _Unwind_Resume in /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
10# 0x00007FDEE006A3D6 in /usr/local/hugectr/lib/libgpu_cache.so
11# HugeCTR::EmbeddingCache<long long>::~EmbeddingCache() in /usr/local/hugectr/lib/libhuge_ctr_hps.so
12# std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::map<long, std::shared_ptr<HugeCTR::EmbeddingCacheBase>, std::less<long>, std::allocator<std::pair<long const, std::shared_ptr<HugeCTR::EmbeddingCacheBase> > > > >, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::map<long, std::shared_ptr<HugeCTR::EmbeddingCacheBase>, std::less<long>, std::allocator<std::pair<long const, std::shared_ptr<HugeCTR::EmbeddingCacheBase> > > > > >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::map<long, std::shared_ptr<HugeCTR::EmbeddingCacheBase>, std::less<long>, std::allocator<std::pair<long const, std::shared_ptr<HugeCTR::EmbeddingCacheBase> > > > > > >::_M_erase(std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::map<long, std::shared_ptr<HugeCTR::EmbeddingCacheBase>, std::less<long>, std::allocator<std::pair<long const, std::shared_ptr<HugeCTR::EmbeddingCacheBase> > > > > >*) in /usr/local/hugectr/lib/libhuge_ctr_hps.so
13# std::_Sp_counted_ptr<HugeCTR::ManagerPool*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() in /usr/local/hugectr/lib/libhuge_ctr_hps.so
14# std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() in /usr/local/hugectr/lib/libhuge_ctr_inference.so
15# HugeCTR::HierParameterServer<long long>::~HierParameterServer() in /usr/local/hugectr/lib/libhuge_ctr_hps.so
16# 0x00007FDEE118B3F4 in /opt/tritonserver/backends/hugectr/libtriton_hugectr.so
17# TRITONBACKEND_Finalize in /opt/tritonserver/backends/hugectr/libtriton_hugectr.so
18# 0x00007FDEEC75A15E in /opt/tritonserver/lib/libtritonserver.so
19# 0x00007FDEEC75FAE6 in /opt/tritonserver/lib/libtritonserver.so
20# 0x00007FDEEC75EDEA in /opt/tritonserver/lib/libtritonserver.so
21# 0x00007FDEEC7630F8 in /opt/tritonserver/lib/libtritonserver.so
22# TRITONSERVER_ServerDelete in /opt/tritonserver/lib/libtritonserver.so
23# 0x00005560BC8F8092 in /opt/tritonserver/bin/tritonserver
24# 0x00005560BC8F8608 in /opt/tritonserver/bin/tritonserver
25# 0x00005560BC8EB9CC in /opt/tritonserver/bin/tritonserver
26# __libc_start_main in /usr/lib/x86_64-linux-gnu/libc.so.6
27# 0x00005560BC8EED2E in /opt/tritonserver/bin/tritonserver
Above is the stderr output of tritonserver. Below is the stdout of HugeCTR; we get a signal 2:
[HCTR][14:04:48.108][INFO][RK0][main]: Done
[HCTR][14:04:48.128][INFO][RK0][main]: Rank0: Write optimzer state to file
[HCTR][14:04:48.149][INFO][RK0][main]: Done
[HCTR][14:04:48.150][INFO][RK0][main]: Dumping sparse optimzer states to files, successful
[HCTR][14:04:48.150][INFO][RK0][main]: Dumping dense weights to file, successful
[HCTR][14:04:48.150][INFO][RK0][main]: Dumping dense optimizer states to file, successful
[HCTR][14:04:48.506][INFO][RK0][main]: default_emb_vec_value is not specified using default: 0
[HCTR][14:04:48.506][INFO][RK0][main]: Creating HashMap CPU database backend...
[HCTR][14:04:48.506][INFO][RK0][main]: Volatile DB: initial cache rate = 1
[HCTR][14:04:48.506][INFO][RK0][main]: Volatile DB: cache missed embeddings = 0
[HCTR][14:04:48.616][INFO][RK0][main]: Table: hps_et.0_hugectr.sparse_embedding1; cached 1 / 1 embeddings in volatile database (PreallocatedHashMapBackend); load: 1 / 18446744073709551615 (0.00%).
[HCTR][14:04:48.616][DEBUG][RK0][main]: Real-time subscribers created!
[HCTR][14:04:48.616][INFO][RK0][main]: Create embedding cache in device 0.
[HCTR][14:04:48.617][INFO][RK0][main]: Use GPU embedding cache: True, cache size percentage: 0.500000
[HCTR][14:04:48.617][INFO][RK0][main]: Configured cache hit rate threshold: 0.900000
[HCTR][14:04:48.617][INFO][RK0][main]: The size of thread pool: 16
[HCTR][14:04:48.617][INFO][RK0][main]: The size of worker memory pool: 4
[HCTR][14:04:48.617][INFO][RK0][main]: The size of refresh memory pool: 1
Error: Invalid value for capacity_in_set.
[HCTR][14:04:48.635][INFO][RK0][main]: Global seed is 2569706097
[HCTR][14:04:49.494][WARNING][RK0][main]: Peer-to-peer access cannot be fully enabled.
[HCTR][14:04:49.494][INFO][RK0][main]: Start all2all warmup
[HCTR][14:04:49.495][INFO][RK0][main]: End all2all warmup
[HCTR][14:04:49.495][INFO][RK0][main]: Create inference session on device: 0
[HCTR][14:04:49.495][INFO][RK0][main]: Model name: 0_hugectr
[HCTR][14:04:49.495][INFO][RK0][main]: Use mixed precision: False
[HCTR][14:04:49.495][INFO][RK0][main]: Use cuda graph: True
[HCTR][14:04:49.495][INFO][RK0][main]: Max batchsize: 1024
[HCTR][14:04:49.495][INFO][RK0][main]: Use I64 input key: True
[HCTR][14:04:49.495][INFO][RK0][main]: start create embedding for inference
[HCTR][14:04:49.495][INFO][RK0][main]: sparse_input name data1
[HCTR][14:04:49.495][INFO][RK0][main]: create embedding for inference success
[HCTR][14:04:49.495][INFO][RK0][main]: Inference stage skip BinaryCrossEntropyLoss layer, replaced by Sigmoid layer
Signal (2) received.
The command used to run this tritonserver instance is:
tritonserver --model-repository=/tmp/pytest-of-root/pytest-76/test_training0/model_repository/ --backend-config=hugecr,ps=/tmp/pytest-of-root/pytest-76/test_training0/model_repository/ps.json
@jperez999
```
[HCTR][14:04:48.616][INFO][RK0][main]: Table: hps_et.0_hugectr.sparse_embedding1; cached 1 / 1 embeddings in volatile database (PreallocatedHashMapBackend); load: 1 / 18446744073709551615 (0.00%).
[HCTR][14:04:48.616][DEBUG][RK0][main]: Real-time subscribers created!
[HCTR][14:04:48.616][INFO][RK0][main]: Create embedding cache in device 0.
[HCTR][14:04:48.617][INFO][RK0][main]: Use GPU embedding cache: True, cache size percentage: 0.500000
[HCTR][14:04:48.617][INFO][RK0][main]: Configured cache hit rate threshold: 0.900000
[HCTR][14:04:48.617][INFO][RK0][main]: The size of thread pool: 16
[HCTR][14:04:48.617][INFO][RK0][main]: The size of worker memory pool: 4
[HCTR][14:04:48.617][INFO][RK0][main]: The size of refresh memory pool: 1
Error: Invalid value for capacity_in_set.
[HCTR][14:04:48.635][INFO][RK0][main]: Global seed is 2569706097
```
Because your embedding table has only one key, it fails to initialize the embedding cache. As I mentioned in the previous comment, at least 64 keys are required.
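A back-of-the-envelope illustration of why a one-key table breaks cache initialization (hypothetical arithmetic, not HugeCTR's actual code): with `gpucacheper = 0.5` from the ps.json above and a single key, the GPU cache capacity rounds down to zero.

```python
num_keys = 1       # keys in the trained sparse model file
gpucacheper = 0.5  # fraction of the embedding table cached on GPU (from ps.json)

# The cache can hold a fraction of the table's keys; with one key this
# truncates to zero, matching "Error: Invalid value for capacity_in_set."
capacity = int(num_keys * gpucacheper)
print(capacity)  # 0
```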
Hi @jperez999 do you still have problems on this?
Closing the issue because there has been no activity for a while. Feel free to reopen it if the issue still exists, thanks.
When the cache size configured by the user is less than 64, the default value will be used instead, so this problem will not occur after release 22.09.