HybridBackend
A high-performance framework for training wide-and-deep recommender systems on heterogeneous clusters
# User Story I want to keep a record of the loss and metric values during training, like the Keras History object: https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/History https://keras.io/guides/training_with_built_in_methods/ # Detailed requirements I have...
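For reference, a minimal sketch of the Keras History behavior this user story points at (standard tf.keras API, not HybridBackend-specific; the toy model and data are made up):

```python
import numpy as np
import tensorflow as tf

# Tiny model just to show what model.fit returns.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer="sgd", loss="mse", metrics=["mae"])

x = np.random.rand(32, 4).astype("float32")
y = np.random.rand(32, 1).astype("float32")

# fit() returns a History object; history.history maps each loss/metric
# name to a list of per-epoch values.
history = model.fit(x, y, epochs=3, verbose=0)
print(history.history)  # e.g. {'loss': [...], 'mae': [...]}
```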
# User Story hb.data.ParquetDataset cannot make use of all of the pod's CPUs. # Detailed requirements hb.data.ParquetDataset 1. num_parallel_reads to set the number of file readers 2. **[new]** num_arrow_threads to set the number of column reader threads...
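A rough sketch of the requested usage; num_arrow_threads is the argument proposed in this issue and does not necessarily exist in the current API, and the file path is made up:

```python
import hybridbackend.tensorflow as hb

ds = hb.data.ParquetDataset(
    ["/path/to/part-0.parquet"],   # hypothetical file path
    batch_size=1024,
    num_parallel_reads=8,          # existing knob: number of file readers (per the issue)
    # num_arrow_threads=4,         # proposed knob: number of column reader threads
)
```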
# Current behavior When hb.data.ParquetDataset is wrapped by the tf.data.experimental.parallel_interleave op, a ValueError is raised: Field xxx (dtype=unkown, shape=()) is incomplete, please specify dtype and ragged_rank # Expected behavior hb.data.ParquetDataset wrapped by...
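An approximate reconstruction of the setup that triggers the error, assuming ParquetDataset is built inside the interleave map function; file paths are hypothetical:

```python
import tensorflow as tf
import hybridbackend.tensorflow as hb

# A dataset of filenames; each element is a string tensor at graph time,
# so ParquetDataset cannot read the Parquet schema to infer field dtypes.
filenames = tf.data.Dataset.from_tensor_slices(
    ["/path/a.parquet", "/path/b.parquet"])

ds = filenames.apply(
    tf.data.experimental.parallel_interleave(
        lambda f: hb.data.ParquetDataset(f, batch_size=256),
        cycle_length=4))
```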
# Current behavior Using the rebatch API with drop_remainder=True makes the program exit with a segmentation fault # Expected behavior No error # System information - GPU model and memory: - OS...
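A rough reconstruction of the reported pattern, assuming rebatch is applied as a tf.data transformation via apply(); the exact signature and the file path are assumptions, not confirmed from this issue:

```python
import hybridbackend.tensorflow as hb

# Read micro-batches from Parquet, then coalesce them into larger batches.
ds = hb.data.ParquetDataset(["/path/to/file.parquet"], batch_size=256)
ds = ds.apply(hb.data.rebatch(1024, drop_remainder=True))  # reported to segfault
```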
I ran into a situation where I trained a model and saved its checkpoint files; now I need to restore the graph from the meta file and feed it new data...
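A minimal sketch of that restore-and-feed flow using the standard TF1 meta-graph API; the checkpoint paths, tensor names, and input shape below are placeholders, not taken from this issue:

```python
import numpy as np
import tensorflow as tf

with tf.Session() as sess:
    # Rebuild the graph from the exported meta file and restore the weights.
    saver = tf.train.import_meta_graph("/path/to/model.ckpt.meta")  # hypothetical path
    saver.restore(sess, "/path/to/model.ckpt")

    graph = tf.get_default_graph()
    # Tensor names stand in for whatever the original graph actually used.
    inputs = graph.get_tensor_by_name("input:0")
    output = graph.get_tensor_by_name("predictions:0")

    # Feed a fresh batch of data through the restored graph.
    new_batch = np.zeros((1, 16), dtype=np.float32)
    print(sess.run(output, feed_dict={inputs: new_batch}))
```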
# User Story Mapping, filtering, and batching is a common pipeline for row-based storage formats like TFRecord. But with the Parquet format, transforming to a row-based dataset performs very badly...
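For context, the row-based pipeline this story refers to typically looks like the following on TFRecord data; the file path and feature spec are illustrative only:

```python
import tensorflow as tf

# Row-based TFRecord pipeline: map -> filter -> batch over individual records.
ds = tf.data.TFRecordDataset(["/path/train.tfrecord"])   # hypothetical path
ds = ds.map(lambda rec: tf.io.parse_single_example(
    rec, {"label": tf.io.FixedLenFeature([], tf.int64)}))
ds = ds.filter(lambda ex: ex["label"] >= 0)
ds = ds.batch(1024)
```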
# Current behavior ``` 2022-10-19 12:39:39.948019: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 2022-10-19 12:39:39.948020: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 INFO:tensorflow:Parsing ../data//train.csv INFO:tensorflow:Parsing ../data//train.csv WARNING:tensorflow:The default value of...
Traceback (most recent call last): File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call return fn(*args) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1348, in _run_fn self._extend_graph() File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1388, in _extend_graph tf_session.ExtendSession(self._session) tensorflow.python.framework.errors_impl.InvalidArgumentError: Op type...
Convert both normal values and the default value to the same type to avoid the error below: ``` pyarrow.lib.ArrowInvalid: ('Can only convert 1-dimensional array values', 'Conversion failed for column user_buy_category_list with type object')...
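A minimal sketch of the idea, assuming the list-valued column is written with pandas/pyarrow: give the default rows the same element type as the normal rows (here an empty int64 array instead of a scalar or None) so pyarrow can convert the column; the column name comes from the error message, the data and output path are made up:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "user_buy_category_list": [
        np.array([1, 2, 3], dtype=np.int64),  # normal value: 1-D int64 array
        np.array([], dtype=np.int64),         # default value: empty array of the
                                               # same dtype, not a scalar or None
    ],
})
df.to_parquet("/tmp/sample.parquet")  # hypothetical output path
```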
# Current behavior I'm using the docker image "alideeprec/deeprec-release:deeprec2306-gpu-py38-cu116-ubuntu20.04-hybridbackend" and found an error in my training process. This is the log: ``` INFO:tensorflow:Done calling model_fn. INFO:tensorflow:Graph was finalized. 2023-08-23 15:35:03.977041: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94]...