badge
Large embedding vector, large number of classes?
Hi authors, thanks for releasing this fantastic tool. I have a question about the approach itself: when my last layer's dimension is really large and I also have a large number of classes to cover, the gradient vector has dimension embDim * nLab, which will be on the order of 10^4 or more. Do you think BADGE is an efficient solution in this case?
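For concreteness, here is a minimal sketch of where the embDim * nLab figure comes from. It assumes the gradient embedding is the cross-entropy gradient with respect to the last linear layer's weights, with the predicted class used as a hypothetical label. The function name `gradient_embedding` and the sizes are illustrative and not taken from the repository.

```python
import numpy as np

def gradient_embedding(hidden, probs):
    """Last-layer gradient embedding for a batch (illustrative sketch).

    hidden: (n, emb_dim) penultimate-layer activations
    probs:  (n, n_lab) softmax outputs
    Returns an (n, emb_dim * n_lab) array.
    """
    n, n_lab = probs.shape
    # Assumption: the predicted class stands in for the unknown label.
    one_hot = np.zeros_like(probs)
    one_hot[np.arange(n), probs.argmax(axis=1)] = 1.0
    # Cross-entropy gradient with respect to the logits.
    scale = probs - one_hot
    # Per-sample outer product, flattened: block c holds scale[:, c] * hidden.
    return (scale[:, :, None] * hidden[:, None, :]).reshape(n, -1)

rng = np.random.default_rng(0)
emb_dim, n_lab, n = 512, 100, 4  # illustrative sizes
hidden = rng.normal(size=(n, emb_dim))
logits = rng.normal(size=(n, n_lab))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

emb = gradient_embedding(hidden, probs)
print(emb.shape)  # (4, 51200)
```

With these sizes each sample already needs 51,200 floats, so a pool of unlabeled points grows the embedding matrix quickly.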
I am not an author of the paper, but I have been studying active learning approaches for a while. I think you could approach the problem by parallelising across several GPUs. This would not be easy to implement, but it could potentially solve the problem of an excessively large fully connected layer. In fact, a large fully connected layer might even help generalisation, since it would let the sampler pick as diverse a set of points as possible.
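One way to picture the parallelisation suggested here: distance computations over the gradient embeddings split naturally by rows, so each chunk could in principle be sent to a separate device. The sketch below uses NumPy on a single machine as a stand-in. The name `chunked_sq_dists` and the chunk size are hypothetical, and a real multi-GPU version would need device placement that is not shown.

```python
import numpy as np

def chunked_sq_dists(emb, center, chunk_size=1024):
    """Squared Euclidean distance from each row of emb to center, chunk by chunk."""
    out = np.empty(emb.shape[0])
    for start in range(0, emb.shape[0], chunk_size):
        stop = start + chunk_size
        # Each chunk is independent, so it could run on its own device.
        diff = emb[start:stop] - center
        out[start:stop] = np.einsum("ij,ij->i", diff, diff)
    return out

rng = np.random.default_rng(0)
emb = rng.normal(size=(2000, 256))  # stand-in for gradient embeddings
center = emb[0]
dists = chunked_sq_dists(emb, center, chunk_size=512)
```

Only one chunk of differences is held in memory at a time, which is the property that matters when the embedding dimension is large.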