amazon-dsstne
Output layer question
The output layer in these networks is often a bottleneck, because you have to do a (batch_size, hidden_dim) by (hidden_dim, num_classes) dense matrix multiplication. It doesn't seem like you'd get a speed-up just by avoiding storing/multiplying by zeros -- are you doing any tricks here to reduce the cost of that operation?
Thanks ~ Ben
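For context, here is a minimal numpy sketch of the dense output-layer multiply being asked about (sizes are made up for illustration, not DSSTNE's defaults). The point is that the forward GEMM costs roughly batch_size * hidden_dim * num_classes multiply-adds regardless of how sparse the targets are:

```python
import numpy as np

batch_size, hidden_dim, num_classes = 256, 1024, 50_000  # illustrative; real catalogs can be millions

hidden = np.random.randn(batch_size, hidden_dim).astype(np.float32)
W_out = np.random.randn(hidden_dim, num_classes).astype(np.float32)

# Forward pass of the output layer: a dense GEMM whose cost does not
# depend on how sparse the labels/targets are.
logits = hidden @ W_out        # (batch_size, num_classes)

flops = 2 * batch_size * hidden_dim * num_classes
print(f"~{flops / 1e9:.1f} GFLOPs per forward pass of the output layer")
```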
We have some ideas here based on approximate kNN methods. Stay tuned.
Interesting -- are you thinking just for inference or for both inference and training?
It's a no-brainer for inference, it's a science project for training.
Relevant code and paper, if you haven't seen it: https://github.com/rdspring1/LSH_DeepLearning. I don't think they did it on GPUs or big models, but it might be interesting.
Yep, I came up with roughly the same idea last summer and then read that paper, which fleshed it out even further, last fall. My sparse input kernels end up ~20x faster than SGEMM, so their seeing more or less the same speedup with LSH makes intuitive sense: one ends up memory-limited. Older GPUs would be a bear for this, but newer GPUs support arbitrary CUDA streams and have much larger L2 caches, so it shouldn't be all that hard to write.
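For readers landing here later, a rough numpy illustration of the candidate-selection idea behind the linked repo: SimHash LSH over the output weight rows, so only output neurons landing in the query's buckets get an exact dot product. All names, sizes, and parameters below are made up for the sketch; this is neither the paper's code nor DSSTNE's, just the core idea on CPU:

```python
import numpy as np

# Illustrative sizes and LSH parameters (not taken from the paper or DSSTNE).
hidden_dim, num_classes = 256, 50_000
num_tables, bits = 8, 16

rng = np.random.default_rng(0)
W = rng.standard_normal((num_classes, hidden_dim)).astype(np.float32)   # output weight rows
planes = rng.standard_normal((num_tables, bits, hidden_dim)).astype(np.float32)

def simhash(x, t):
    # SimHash bucket key: sign pattern of random projections, packed into an int.
    signs = (planes[t] @ x) > 0
    return int(signs @ (1 << np.arange(bits)))

# One hash table per set of projection planes, mapping bucket key -> class ids.
tables = [dict() for _ in range(num_tables)]
for t in range(num_tables):
    keys = ((planes[t] @ W.T) > 0).T @ (1 << np.arange(bits))
    for cls, key in enumerate(keys):
        tables[t].setdefault(int(key), []).append(cls)

def approximate_logits(h):
    # Union of candidate classes across tables, then exact dot products only
    # for that (hopefully small) candidate set instead of all num_classes.
    candidates = set()
    for t in range(num_tables):
        candidates.update(tables[t].get(simhash(h, t), []))
    candidates = np.fromiter(candidates, dtype=np.int64)
    return candidates, W[candidates] @ h

h = rng.standard_normal(hidden_dim).astype(np.float32)
cls_ids, scores = approximate_logits(h)
print(f"scored {len(cls_ids)} of {num_classes} output neurons")
```

During training, the same candidate set would also restrict which output weights get gradient updates, and the tables have to be rebuilt periodically as the weights move; that is the "science project" part mentioned above.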
Any updates on this? I'm trying to find an example of a library that uses approximate kNN methods to speed up the output layer.
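Not an official answer, but for the inference side one common approach is to index the output weight rows with an off-the-shelf ANN library and run approximate maximum-inner-product search instead of the full dense multiply. A sketch assuming the standard faiss API (sizes are illustrative, and `W` stands in for trained output weights):

```python
import faiss
import numpy as np

hidden_dim, num_classes, k = 256, 100_000, 10
rng = np.random.default_rng(0)
W = rng.standard_normal((num_classes, hidden_dim)).astype(np.float32)  # stand-in for trained output weights

# Index each output neuron's weight vector for approximate inner-product search.
quantizer = faiss.IndexFlatIP(hidden_dim)
index = faiss.IndexIVFFlat(quantizer, hidden_dim, 1024, faiss.METRIC_INNER_PRODUCT)
index.train(W)
index.add(W)
index.nprobe = 16   # how many clusters to scan per query; accuracy/speed knob

# At inference, the hidden activation queries the index instead of doing the
# full (hidden_dim x num_classes) multiply; only top-k logits come back.
h = rng.standard_normal((1, hidden_dim)).astype(np.float32)
scores, class_ids = index.search(h, k)
```

This only recovers the top-k outputs rather than the full logit vector, which is usually what a recommendation-style output layer needs anyway.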