HeKa

Results 17 comments of HeKa

Hi @sivukhin, because of TF's resource locking, MirroredStrategy is not efficient for TFRA multi-table training. We recommend using Horovod for distributed training:
- https://github.com/tensorflow/recommenders-addons/blob/master/docs/api_docs/tfra/dynamic_embedding/keras/layers/HvdAllToAllEmbedding.md
- https://github.com/tensorflow/recommenders-addons/blob/6f7bbb86a03bf17ee7a8c4b8d36415a2ca1cf693/tensorflow_recommenders_addons/dynamic_embedding/python/keras/layers/embedding.py#L528
- https://github.com/tensorflow/recommenders-addons/blob/master/demo/dynamic_embedding/movielens-1m-keras-with-horovod/movielens-1m-keras-with-horovod.py

Or you could have...

Of course HvdAllToAllEmbedding supports training on CPU. I ran your code successfully with `CUDA_VISIBLE_DEVICES=-1 horovodrun -np 2 python hvd_two_tower_test.py`, using both redis_creator and cuckoo_creator. Also, if the error that...

### Ring-AllReduce vs Parameter Server

The lower communication overhead of the multi-worker strategy relies on synchronous training. If many CPU nodes are trained asynchronously with a small batch size,...
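To illustrate the communication pattern behind that trade-off, here is a pure-Python simulation of ring all-reduce (a sketch for intuition only, not Horovod's actual NCCL-based implementation). Each of N workers exchanges only 1/N-sized chunks with its neighbor over 2·(N-1) steps, so per-worker traffic stays roughly constant as N grows, unlike a parameter server that must receive every worker's full gradient:

```python
def ring_allreduce(grads):
    """Simulate synchronous ring all-reduce.

    grads: list of equal-length gradient vectors, one per worker.
    Returns one fully summed vector per worker after a reduce-scatter
    phase followed by an all-gather phase (2*(n-1) steps total).
    """
    n = len(grads)
    length = len(grads[0])
    chunks = [list(v) for v in grads]          # working copy per worker
    bounds = [(i * length) // n for i in range(n + 1)]  # n chunk boundaries

    # Reduce-scatter: in step s, worker r sends chunk (r - s) mod n to
    # worker (r + 1) mod n, which adds it into its own copy. After n-1
    # steps, worker r holds the complete sum for chunk (r + 1) mod n.
    for s in range(n - 1):
        sends = []
        for r in range(n):
            c = (r - s) % n
            sends.append((r, c, chunks[r][bounds[c]:bounds[c + 1]]))
        for r, c, data in sends:               # apply simultaneously
            dst = (r + 1) % n
            for i, v in enumerate(data):
                chunks[dst][bounds[c] + i] += v

    # All-gather: circulate the completed chunks around the ring so
    # every worker ends up with the full summed gradient.
    for s in range(n - 1):
        moves = []
        for r in range(n):
            c = (r + 1 - s) % n
            moves.append(((r + 1) % n, c, chunks[r][bounds[c]:bounds[c + 1]]))
        for dst, c, data in moves:
            chunks[dst][bounds[c]:bounds[c + 1]] = data

    return chunks
```

With 3 workers holding `[1..6]`, `[10..60]`, `[100..600]`, every worker ends with `[111, 222, 333, 444, 555, 666]` while only ever sending one chunk per step.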

@sivukhin For now, it will continue to integrate with and remain compatible with the latest versions of TensorFlow, but this is a lot of work. So it would be great if...

> Try this: https://github.com/tensorflow/recommenders-addons/blob/master/docs/api_docs/tfra/dynamic_embedding/FileSystemSaver.md

```python
model = build_model(xxx)
de.enable_inference_mode()
model.save(export_dir)
```

`enable_inference_mode` changes the graph-building logic inside TFRA. It eliminates two memory copies in TrainableWrapper, which are...

@gautam20197 As far as I know, flash attention has already been implemented by NVIDIA in TensorFlow, right? [cuda_dnn.cc](https://github.com/tensorflow/tensorflow/blob/da22a881a3d24fd4f357207034ba6c596aa414d0/tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc)

@Cjkkkk So if I understand correctly, in addition to TF/JAX, PyTorch can also use OpenXLA to work with cuDNN.

Is there any benchmark comparing cuDNN fused attention and flash attention? Recently I found that TorchACC already supports cuDNN fused attention in PyTorch training. So there's definitely a benchmark,...
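Both kernels compute the same scaled dot-product attention, so any benchmark also needs a correctness baseline. Below is a minimal NumPy reference (illustrative only, not either library's implementation) that materializes the full (seq, seq) score matrix — exactly the memory traffic that fused and flash kernels avoid:

```python
import numpy as np

def attention_reference(q, k, v):
    """Naive scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    q, k, v: arrays of shape (batch, seq, d). Builds the full
    (batch, seq, seq) score matrix, which is the O(seq^2) memory
    cost that FlashAttention-style kernels eliminate by tiling.
    """
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)   # (batch, seq, seq)
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # rows sum to 1
    return weights @ v                               # (batch, seq, d)

rng = np.random.default_rng(0)
q = rng.standard_normal((2, 8, 4))
k = rng.standard_normal((2, 8, 4))
v = rng.standard_normal((2, 8, 4))
out = attention_reference(q, k, v)
```

A fused-kernel benchmark would compare its output against this baseline (within floating-point tolerance) while timing only the kernel call.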