kanghui0204

Results 23 comments of kanghui0204

Hi zongshibuzai, Q1: Are there other version requirements for installing sparse_operation_kit, for example cmake > 3.8 or CUDA? The cmake version must be higher than 3.8. Q2: error: ‘cudaEventWaitDefault’ was not declared in this...

Hi @zongshibuzai, since this issue has been open for a long time, we will close it now. If you have another question about SOK install or SOK running...

Hi @iidsample, we now have an updated multinode tutorial (https://github.com/NVIDIA-Merlin/HugeCTR/tree/master/tutorial/multinode-training). You can use the script in the tutorial to submit a multinode task with MPI. Please check whether this update works for you.

Hi @iidsample, since this issue has been open for a long time, we will close it now. If you have another question, you can reopen this issue...

Hi @longern, SOK All2AllDenseEmbedding will be deprecated soon; please use the SOK experiment API. Here are some examples: 1. lookup: https://github.com/NVIDIA-Merlin/HugeCTR/blob/main/sparse_operation_kit/sparse_operation_kit/experiment/test/function_test/tf2/lookup/lookup_sparse_distributed_dynamic_test.py 2. dump/load: https://github.com/NVIDIA-Merlin/HugeCTR/blob/main/sparse_operation_kit/sparse_operation_kit/experiment/test/function_test/tf2/dump_load/dump_load_distribute_static.py

Hi @kangna-qi, thank you for using SOK. It seems to be a GPU memory out-of-bounds error. Could you provide me with the code showing how you use SOK so that...

> @kanghui0204 Thanks for your reply. I've already solved this problem. I can train the model with single-threaded TF. When using TF for multi-threaded model training, cuco requires locking to ensure...

Hi @minseokl, because @kangna-qi hasn't responded for 2 weeks, I have decided to close this issue, FYI.

Hi @MichoChan, you can try https://github.com/NVIDIA-Merlin/HugeCTR/blob/main/sparse_operation_kit/sparse_operation_kit/experiment/test/function_test/tf1/lookup/lookup_sparse_distributed_test.py

Hi @lausannel, here is an example of using SOK+HKV. [SOK+HKV example](https://github.com/NVIDIA-Merlin/HugeCTR/blob/main/sparse_operation_kit/sparse_operation_kit/examples/lookup_example_tf2/lookup_sparse_distributed_hkv_test.py) HKV is a key-value store that uses GPU + CPU memory, where the memory for values can be...