torchrec
EmbeddingBagCollection throughput: 16p vs 8p
We found that the throughput of EmbeddingBagCollection at 16p is only 1.2 times the throughput at 8p. Are there any optimizations we can apply to improve the scaling?
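To put a number on the gap: assuming "8p" and "16p" mean 8 and 16 devices (or processes), which is my reading and may not match the actual setup, ideal linear scaling would give 2x throughput. A minimal sketch of the arithmetic follows; the helper name `scaling_efficiency` is just for illustration and is not a torchrec function.

```python
def scaling_efficiency(speedup: float, n_small: int, n_large: int) -> float:
    """Return measured speedup as a fraction of ideal linear speedup.

    speedup: throughput at n_large divided by throughput at n_small.
    n_small, n_large: device (or process) counts; assumed meaning of 8p / 16p.
    """
    if n_small <= 0 or n_large <= 0:
        raise ValueError("device counts must be positive")
    ideal_speedup = n_large / n_small  # linear scaling: 16 / 8 = 2.0
    return speedup / ideal_speedup


# Numbers from the question: 1.2x throughput going from 8p to 16p.
efficiency = scaling_efficiency(1.2, n_small=8, n_large=16)
print(f"scaling efficiency: {efficiency:.0%}")
```

Under that assumption the efficiency works out to about 60% of linear, so roughly 40% of the added capacity is not showing up as throughput.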