HybridBackend

A high-performance framework for training wide-and-deep recommender systems on heterogeneous clusters

Results 18 HybridBackend issues

# Current behavior When hb reads nested lists with ragged_rank > 1, the values read cannot be converted to a SparseTensor by the function hb.data.to_sparse. For example: dense_feature is one of the...

bug
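
To make the ragged_rank > 1 case in the report above concrete, here is a minimal pure-Python sketch of what converting a doubly nested list to a sparse (COO) layout involves. It is not HybridBackend's implementation and does not call hb.data.to_sparse; the helper name `nested_to_coo` and the sample values are hypothetical, and the sketch assumes every leaf sits at the same nesting depth.

```python
def nested_to_coo(nested):
    """Flatten a ragged nested list into COO form.

    Hypothetical helper for illustration only; assumes all leaves
    sit at the same nesting depth.
    Returns (indices, values, dense_shape).
    """
    indices, values, shape = [], [], []

    def walk(node, prefix):
        depth = len(prefix)
        if isinstance(node, list):
            # Track the longest list seen at each depth: that is the
            # size of the dense bounding box along this dimension.
            if depth == len(shape):
                shape.append(0)
            shape[depth] = max(shape[depth], len(node))
            for i, child in enumerate(node):
                walk(child, prefix + (i,))
        else:
            # A leaf value: its path of positions is its sparse index.
            indices.append(prefix)
            values.append(node)

    walk(nested, ())
    return indices, values, tuple(shape)


# Illustrative input with two ragged dimensions (ragged_rank 2):
# three rows, each holding a variable number of variable-length lists.
sample = [[[1.0, 2.0], [3.0]], [], [[4.0]]]
indices, values, dense_shape = nested_to_coo(sample)
print(indices)
print(values)
print(dense_shape)
```

Each extra level of raggedness adds one more coordinate to every index, which is why a converter written only for ragged_rank 1 (one row-split level) cannot handle this input without additional handling.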

# System information - OS Platform: Ubuntu 18.04.5 LTS - Docker version: 18.09.5 - GCC version: 7.5.0 - Python version: 3.6.9 - TensorFlow/PyTorch version: tf1.15.5 # Willing to contribute Yes

# Current behavior In distributed mode, deeprec works fine when training on one hour of data, but hangs when training on one day or more. Log: ![6ca9fe77321c27383b3b3de9bb8fc5d5](https://user-images.githubusercontent.com/35439432/229059537-ea1626df-2411-46bf-acb8-fb61fada092d.png) Nvidia-smi: ![a3ee237e24abfd35d1c087126b6331f8](https://user-images.githubusercontent.com/35439432/229059658-5b425fcf-f027-4f71-9d2c-908ffca14bf5.png) cpu:...

With a feature_column bucket_size of 6 and 8 GPUs, 'save/RestoreV2' fails on worker-5 and worker-6. Backtrace: Traceback (most recent call last): File "neg_feedback_multi.py", line 1252, in tf.app.run() File "/home/pai/lib/python3.6/site-packages/tensorflow_core/python/platform/app.py", line 40, in...

# Current behavior With only one worker, training with the EarlyStopping callback works. When multiple workers run distributed training with the EarlyStopping callback, all workers hang and...

# Current behavior I am using HybridBackend for data parallelism. I create a dataset and make an iterator from it. When I wrap the whole pipeline in a HybridBackend scope,...

# User Story Fixed-length features in TFRecord can be configured with default values (https://www.tensorflow.org/api_docs/python/tf/io/FixedLenFeature), but Parquet currently does not support this. If a non-existent feature is encountered, an error will be...

enhancement