He Jia

Results: 87 comments of He Jia

> > @alykhantejani Most TFRA users are using GPU sync training without PS, so few people are aware of this issue. If this issue occurs only in some of the...

> > > It seems that `user_embedding = de.keras.layers.SquashedEmbedding( user_embedding_size, initializer=embedding_initializer, devices=self.devices, name='user_embedding')` has some bugs. The embedding cannot identify which port the variables are on. > > >...

> > @alykhantejani Don't worry about the memory; the DE all-to-all embedding layer will shard the entire embedding across the worker ranks. You can also use a CPU embedding table, but DE HKV...

@alykhantejani Here is the demo: https://github.com/tensorflow/recommenders-addons/tree/master/demo/dynamic_embedding/movielens-1m-keras-with-horovod If you want to place the embedding in host memory, set devices=["CPU"] when you create the embedding layer (see the sketch below). If you want to use...
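A minimal sketch of that host-memory placement, based on the `SquashedEmbedding` constructor quoted earlier in the thread; the embedding size and initializer here are illustrative values, not taken from the issue:

```python
import tensorflow as tf
from tensorflow_recommenders_addons import dynamic_embedding as de

# Hypothetical example: pass devices=["CPU"] so the dynamic embedding
# table lives in host memory instead of GPU HBM.
user_embedding = de.keras.layers.SquashedEmbedding(
    32,  # embedding dimension (illustrative)
    initializer=tf.keras.initializers.RandomNormal(stddev=0.1),
    devices=["CPU"],  # place the table in host memory
    name='user_embedding')
```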

@beijinggao Do you have any questions? If not, I will close the issue.

_resource_handle is the handle of the TFRA table, and the saveable.op object should be a TFRA object rather than a const string. So you need to check whether the saveables returned in the code are valid; only valid TFRA saveables can run the code that follows.
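A rough illustration of that check, assuming `saveables` is the list returned by the save path in question; the filtering condition is an assumption, not the actual TFRA code:

```python
# Hypothetical validity check: keep only saveables whose `op` is a real
# TFRA table handle rather than a const string baked into the graph.
valid_saveables = []
for s in saveables:
    if isinstance(s.op, str):
        continue  # a const string slipped in; not a legal TFRA saveable
    valid_saveables.append(s)
```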

This is probably because freq_var is not tracked. For the time being, you can use the TF manual save/load APIs as a workaround (see the sketch below). @huangenyan Fixed in PR #415
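One way to do that manual save/restore, assuming `freq_var` is the untracked variable from the issue; the checkpoint path is illustrative:

```python
import tensorflow as tf

# Hypothetical workaround: checkpoint freq_var by hand since object
# tracking misses it; the path below is illustrative.
ckpt = tf.train.Checkpoint(freq_var=freq_var)
ckpt.write("/tmp/freq_var.ckpt")   # manual save
ckpt.read("/tmp/freq_var.ckpt")    # manual restore
```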

@mnicely I have tested cuDNN attention on an A30 with the image nvcr.io/nvidia/pytorch:24.04-py3; it is much slower than flash attention in the same image. =====================TEST CUDNN Attention===================== /workspace/qkv_attention.py:34: UserWarning: USING CUDNN SDPA...
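For reference, a micro-benchmark along these lines could look like the sketch below, assuming a PyTorch build that exposes `SDPBackend.CUDNN_ATTENTION`; the tensor shapes and iteration count are illustrative, not taken from the original qkv_attention.py:

```python
import time
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# Illustrative Q/K/V: batch 8, 16 heads, 4096 tokens, head dim 64, fp16.
q, k, v = (torch.randn(8, 16, 4096, 64, device="cuda", dtype=torch.float16)
           for _ in range(3))

def bench(backend, iters=50):
    # Force a single SDPA backend, warm up, then time the average call.
    with sdpa_kernel(backend):
        F.scaled_dot_product_attention(q, k, v)  # warm-up
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            F.scaled_dot_product_attention(q, k, v)
        torch.cuda.synchronize()
        return (time.perf_counter() - start) / iters

print("cudnn:", bench(SDPBackend.CUDNN_ATTENTION))
print("flash:", bench(SDPBackend.FLASH_ATTENTION))
```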

You can modify it directly; generally that won't cause many problems. For inference, though, vLLM's throughput is much higher than FasterTransformer and the like, so I'd recommend switching frameworks.