Duyi-Wang
After enabling Adaptive Embedding, model evaluation with modelzoo fails once training has completed. **Code to reproduce the issue** With WDL in modelzoo, run `python train.py --steps 100 --adaptive_emb true`...
After enabling the smartstaged feature in distributed training with the modelzoo code, an error occurs. **Other info / logs** ``` File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call return fn(*args) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1350,...
An error occurs when Multi-Hash Variable is enabled in modelzoo's DIEN, and the Multi-Hash Variable documentation should be updated: https://deeprec.readthedocs.io/zh/latest/Multi-Hash-Variable.html The `num_of_partitions` parameter of `get_multihash_variable` has been removed from the code, which...
I wanted to enable the Auto Micro Batch feature in WDL and followed the steps in the [DeepRec Docs](https://deeprec.readthedocs.io/zh/latest/Auto-Micro-Batch.html), but I got an error. **Code to reproduce the issue** I use the following...
Modelzoo distributed training fails with the grpc++ or star-server protocol, although it works with grpc. The local and cloud environments produce different errors. **Local environment warning & error** ``` 2022-11-16 05:15:22.566511: I...
Master and slaves should both run according to the following workflow:

```Python
while True:
    model.set_input_cb()
    model.forward_cb()
    model.free_seqs()
```
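To make the "same workflow on every rank" point concrete, here is a minimal sketch that can be run without the real model. `StubModel`, `run_worker`, and `max_steps` are hypothetical names introduced only for illustration; only the three method names come from the workflow above. The stub methods just record call order and are not the library's implementation.

```python
class StubModel:
    """Hypothetical stand-in for the real model object (assumption)."""

    def __init__(self, rank):
        self.rank = rank
        self.calls = []  # records the order in which the callbacks run

    def set_input_cb(self):
        self.calls.append("set_input_cb")

    def forward_cb(self):
        self.calls.append("forward_cb")

    def free_seqs(self):
        self.calls.append("free_seqs")


def run_worker(model, max_steps):
    # Bounded variant of the `while True` loop so that the sketch terminates.
    for _ in range(max_steps):
        model.set_input_cb()
        model.forward_cb()
        model.free_seqs()


# The master (rank 0) and a slave (rank 1) execute exactly the same loop.
master = StubModel(rank=0)
slave = StubModel(rank=1)
for worker in (master, slave):
    run_worker(worker, max_steps=2)
```

In this sketch every rank issues the three calls in the same order on every iteration, which is the property the workflow asks for.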