HybridBackend
A high-performance framework for training wide-and-deep recommender systems on heterogeneous clusters
# User Story I want to keep a record of the loss and metric values during training, like the Keras History object: https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/History https://keras.io/guides/training_with_built_in_methods/ # Detailed requirements I have...
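For reference, a minimal sketch of the Keras History behavior this user story points at (standard tf.keras API, not HybridBackend-specific; the toy model and data are made up):

```python
import numpy as np
import tensorflow as tf

# Tiny model just to show what model.fit returns.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer="sgd", loss="mse", metrics=["mae"])

x = np.random.rand(32, 4).astype("float32")
y = np.random.rand(32, 1).astype("float32")

# fit() returns a History object; history.history maps each loss/metric
# name to a list of per-epoch values.
history = model.fit(x, y, epochs=3, verbose=0)
print(history.history)  # e.g. {'loss': [...], 'mae': [...]}
```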
# User Story hb.data.ParquetDataset cannot make use of all of the pod's CPUs. # Detailed requirements hb.data.ParquetDataset 1. num_parallel_reads to set the number of file readers 2. **[new]** num_arrow_threads to set the number of column reader threads...
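A rough sketch of the requested usage; num_arrow_threads is the argument proposed in this issue and does not necessarily exist in the current API, and the file path is made up:

```python
import hybridbackend.tensorflow as hb

ds = hb.data.ParquetDataset(
    ["/path/to/part-0.parquet"],   # hypothetical file path
    batch_size=1024,
    num_parallel_reads=8,          # existing knob: number of file readers (per the issue)
    # num_arrow_threads=4,         # proposed knob: number of column reader threads
)
```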
# Current behavior When hb.data.ParquetDataset is wrapped by the tf.data.experimental.parallel_interleave op, a ValueError is raised: Field xxx (dtype=unkown, shape=()) is incomplete, please specify dtype and ragged_rank # Expected behavior hb.data.ParquetDataset wrapped by...
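An approximate reconstruction of the setup that triggers the error, assuming ParquetDataset is built inside the interleave map function; file paths are hypothetical:

```python
import tensorflow as tf
import hybridbackend.tensorflow as hb

# A dataset of filenames; each element is a string tensor at graph time,
# so ParquetDataset cannot read the Parquet schema to infer field dtypes.
filenames = tf.data.Dataset.from_tensor_slices(
    ["/path/a.parquet", "/path/b.parquet"])

ds = filenames.apply(
    tf.data.experimental.parallel_interleave(
        lambda f: hb.data.ParquetDataset(f, batch_size=256),
        cycle_length=4))
```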
# Current behavior Using the rebatch API with drop_remainder=True makes the program exit with a segmentation fault # Expected behavior No error # System information - GPU model and memory: - OS...
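A rough reconstruction of the reported pattern, assuming rebatch is applied as a tf.data transformation via apply(); the exact signature and the file path are assumptions, not confirmed from this issue:

```python
import hybridbackend.tensorflow as hb

# Read micro-batches from Parquet, then coalesce them into larger batches.
ds = hb.data.ParquetDataset(["/path/to/file.parquet"], batch_size=256)
ds = ds.apply(hb.data.rebatch(1024, drop_remainder=True))  # reported to segfault
```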
I ran into a situation where I trained a model and saved its checkpoint files; now I need to restore the graph from the meta file and feed it new data...
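A minimal sketch of that restore-and-feed flow using the standard TF1 meta-graph API; the checkpoint paths, tensor names, and input shape below are placeholders, not taken from this issue:

```python
import numpy as np
import tensorflow as tf

with tf.Session() as sess:
    # Rebuild the graph from the exported meta file and restore the weights.
    saver = tf.train.import_meta_graph("/path/to/model.ckpt.meta")  # hypothetical path
    saver.restore(sess, "/path/to/model.ckpt")

    graph = tf.get_default_graph()
    # Tensor names stand in for whatever the original graph actually used.
    inputs = graph.get_tensor_by_name("input:0")
    output = graph.get_tensor_by_name("predictions:0")

    # Feed a fresh batch of data through the restored graph.
    new_batch = np.zeros((1, 16), dtype=np.float32)
    print(sess.run(output, feed_dict={inputs: new_batch}))
```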
# User Story Mapping, filtering, and batching is a common pipeline for row-based storage formats like TFRecord. But with the Parquet format, transforming to a row-based dataset performs very badly...
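For context, the row-based pipeline this story refers to typically looks like the following on TFRecord data; the file path and feature spec are illustrative only:

```python
import tensorflow as tf

# Row-based TFRecord pipeline: map -> filter -> batch over individual records.
ds = tf.data.TFRecordDataset(["/path/train.tfrecord"])   # hypothetical path
ds = ds.map(lambda rec: tf.io.parse_single_example(
    rec, {"label": tf.io.FixedLenFeature([], tf.int64)}))
ds = ds.filter(lambda ex: ex["label"] >= 0)
ds = ds.batch(1024)
```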
# Current behavior ``` 2022-10-19 12:39:39.948019: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 2022-10-19 12:39:39.948020: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 INFO:tensorflow:Parsing ../data//train.csv INFO:tensorflow:Parsing ../data//train.csv WARNING:tensorflow:The default value of...
Traceback (most recent call last): File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call return fn(*args) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1348, in _run_fn self._extend_graph() File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1388, in _extend_graph tf_session.ExtendSession(self._session) tensorflow.python.framework.errors_impl.InvalidArgumentError: Op type...
Convert both normal values and the default value to the same type to avoid the error below: ``` pyarrow.lib.ArrowInvalid: ('Can only convert 1-dimensional array values', 'Conversion failed for column user_buy_category_list with type object')...
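A minimal sketch of the idea, assuming the list-valued column is written with pandas/pyarrow: give the default rows the same element type as the normal rows (here an empty int64 array instead of a scalar or None) so pyarrow can convert the column; the column name comes from the error message, the data and output path are made up:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "user_buy_category_list": [
        np.array([1, 2, 3], dtype=np.int64),  # normal value: 1-D int64 array
        np.array([], dtype=np.int64),         # default value: empty array of the
                                               # same dtype, not a scalar or None
    ],
})
df.to_parquet("/tmp/sample.parquet")  # hypothetical output path
```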
# Current behavior I'm using the docker image "alideeprec/deeprec-release:deeprec2306-gpu-py38-cu116-ubuntu20.04-hybridbackend" and found an error in my training process. This is the log: ``` INFO:tensorflow:Done calling model_fn. INFO:tensorflow:Graph was finalized. 2023-08-23 15:35:03.977041: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94]...