generative-recommenders
Repository hosting code used to reproduce results in "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" (https://arxiv.org/abs/2402.17152, ICML...
Compared to the same structure (the QKV attention) implemented in TensorFlow, the Triton version runs 10 to 20 times slower. Profiling with Nsight Systems, I found that cudaMemcpySync takes off...
I have written a backward (bwd) Triton kernel, but I found that after adding the bias backward pass, it runs 30 times slower. The following is the time_weight bwd...
Differential Revision: D58453633
If I understand correctly, the autoregressive model has a loss, and the multi-task dense layers that follow the autoregressive model have a weighted loss. How are they combined? And in the ranking model, how...
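The usual way to combine an autoregressive (next-item) objective with weighted multi-task losses is a weighted sum of scalars. A minimal sketch, assuming hypothetical names and weights (not taken from the repository); it works equally on Python floats or PyTorch scalar tensors:

```python
def combine_losses(ar_loss, task_losses, task_weights, ar_weight=1.0):
    """Weighted sum of an autoregressive loss and per-task dense losses.

    ar_loss: scalar loss from the next-item (autoregressive) objective.
    task_losses / task_weights: parallel sequences for the multi-task heads.
    ar_weight: relative weight of the autoregressive term (hypothetical).
    """
    total = ar_weight * ar_loss
    for w, task_loss in zip(task_weights, task_losses):
        total = total + w * task_loss
    return total
```

With tensors, calling `total.backward()` then propagates gradients through all heads at once; the per-task weights are typically tuned as hyperparameters.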
Hello, great work! I have a question about the unknown token (newly created items). Because of the long sequence, only user-side category features can be merged into the...
Hi, great work! I'm trying to reproduce the results on public datasets. However, I only found the training code, where the model was evaluated on the eval set (or you...
Are there any plans to integrate the embedding_modules or custom samplers back into TorchRec?
Hey, congratulations on your excellent and creative work. While reading the implementation code here, I am confused about [SampledSoftmaxLoss](https://github.com/facebookresearch/generative-recommenders/blob/54e5240567041b8f74c735b437404270a5b1cf49/generative_recommenders/modeling/sequential/autoregressive_losses.py#L499). I have some questions: 1. Why do...
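For context on the question above: sampled softmax approximates the full softmax over a large item catalog by contrasting the positive item against a small set of sampled negatives, with a logQ correction that subtracts the log sampling probability from each logit. A minimal NumPy sketch of that standard technique (not the repository's implementation):

```python
import numpy as np

def sampled_softmax_loss(pos_logit, neg_logits, pos_log_q, neg_log_q):
    """Sampled softmax with logQ correction.

    pos_logit: score of the true (positive) item.
    neg_logits: scores of k sampled negative items, shape (k,).
    pos_log_q / neg_log_q: log sampling probabilities of those items.
    Returns the cross-entropy with the positive item in slot 0.
    """
    # logQ correction: subtract the log sampling probability from each logit.
    corrected = np.concatenate(([pos_logit - pos_log_q],
                                neg_logits - neg_log_q))
    # Numerically stable log-sum-exp over the corrected logits.
    m = corrected.max()
    log_z = m + np.log(np.exp(corrected - m).sum())
    return log_z - corrected[0]
```

With uniform logits and uniform sampling, the loss reduces to log(k + 1), which is a handy sanity check when debugging an implementation.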
Differential Revision: D64049725