Investigate alternatives for faster (joint) embeddings lookup

Open gabrielspmoreira opened this issue 3 years ago • 0 comments

@vysarge has run a number of benchmark and profiling experiments on Merlin Models using synthetic data. In particular, she compared the Merlin Models DLRM with the JoC DLRM TF implementation; the experiment results can be found in this spreadsheet (NVIDIA internal only).

She noticed in particular that the Merlin Models (MM) implementation uses the TF embedding API functions, with a separate lookup per table, while JoC uses a custom joint embedding that fuses the embedding tables together and performs all lookups jointly in a single call [code link], which is faster.
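
To make the idea concrete, here is a minimal NumPy sketch of a fused lookup. It is not the JoC code: the helper names `build_joint_table` and `joint_lookup` are hypothetical, and it assumes every table has the same embedding dimension so the tables can be stacked row-wise. Each feature's ids are shifted by that table's row offset, so a single gather replaces one lookup per table.

```python
import numpy as np

def build_joint_table(tables):
    """Stack per-feature tables row-wise and record each table's starting row.

    Assumes all tables share the same embedding dimension.
    """
    sizes = [t.shape[0] for t in tables]
    # Offset of table i = total number of rows in tables 0..i-1.
    offsets = np.concatenate(([0], np.cumsum(sizes)[:-1]))
    joint = np.concatenate(tables, axis=0)
    return joint, offsets

def joint_lookup(joint, offsets, ids):
    """ids: int array [batch, num_features] -> [batch, num_features, dim]."""
    # Shift per-feature ids into the joint row space, then gather once.
    return joint[ids + offsets]

rng = np.random.default_rng(0)
tables = [rng.normal(size=(n, 4)) for n in (10, 5, 7)]  # 3 features, dim 4
joint, offsets = build_joint_table(tables)

ids = np.array([[3, 0, 6],
                [9, 4, 1]])  # batch of 2, one id per feature

fused = joint_lookup(joint, offsets, ids)
# Reference: one separate lookup per table, as in the per-table approach.
separate = np.stack([t[ids[:, i]] for i, t in enumerate(tables)], axis=1)
assert np.array_equal(fused, separate)
```

Tables with different embedding dimensions would need padding or grouping by dimension before fusing; that is left out of this sketch.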

Mar 24 '23 15:03 gabrielspmoreira