Seth H. Young
Directly above the second line you quoted, the one beginning with "summ, train_loss...", is the comment `# instrument for tensorboard`. Anytime you see TensorFlow code referring to a summary or summaries, it...
Yeah, using cosine distance for the word embeddings is a really good idea, and it usually gives better results than Euclidean distance for this particular use case. If you add me...
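As an illustration of why cosine distance tends to suit word embeddings: it depends only on the angle between two vectors, so vectors pointing the same way but with different lengths count as identical, while Euclidean distance keeps them apart. A minimal pure-Python sketch; the function names and toy vectors are hypothetical and chosen only to make the contrast visible.

```python
import math


def euclidean(u, v):
    """Straight-line distance; sensitive to vector magnitude."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """1 - cosine similarity; depends only on the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical toy "embeddings": b points the same way as a but is 3x longer,
# c is orthogonal to a.
a = [1.0, 2.0, 0.5]
b = [3.0, 6.0, 1.5]
c = [2.0, -1.0, 0.0]

print("a-b:", euclidean(a, b), cosine_distance(a, b))
print("a-c:", euclidean(a, c), cosine_distance(a, c))
```

Euclidean distance ranks `c` closer to `a` than `b` is, whereas cosine distance puts `b` at (essentially) zero and the orthogonal `c` at 1.0, which is the behavior you usually want when vector length carries little semantic meaning.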
Haven't pushed the code yet, but cosine distance and an adjusted limit constant gave some very nice clusters with clear, well-defined themes for the word vectors: https://github.com/josephius/star-clustering/blob/feature/upper-threshold/basic_english_limit-0p618_cosine.txt
Commit https://github.com/josephius/star-clustering/commit/8a1d776de9fe9d7dddd8d145835b4954cf7c0017 adds a new angular distance metric class (https://en.wikipedia.org/wiki/Cosine_similarity#Angular_distance_and_similarity) in a new distances.py file, which should make it fairly hassle-free to extend with custom distances should one be...
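A minimal sketch of how an extensible distances module could be organized, assuming a small abstract base class that each custom metric subclasses. The names `Distance`, `EuclideanDistance`, `AngularDistance` and `pairwise` are hypothetical and are not taken from the repository's distances.py. The angular distance follows the linked definition for vectors whose components may be negative: the angle between the vectors divided by π.

```python
import math
from abc import ABC, abstractmethod


class Distance(ABC):
    """Hypothetical base class: each subclass implements one pairwise metric."""

    @abstractmethod
    def __call__(self, u, v):
        """Return the distance between sequences u and v."""


class EuclideanDistance(Distance):
    def __call__(self, u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


class AngularDistance(Distance):
    """theta / pi, where theta is the angle between u and v.

    Unlike 1 - cosine similarity, this satisfies the triangle inequality,
    and it lies in [0, 1] for vectors with components of either sign.
    """

    def __call__(self, u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        # Clamp so floating-point rounding cannot push acos out of its domain.
        cos_sim = max(-1.0, min(1.0, dot / norms))
        return math.acos(cos_sim) / math.pi


def pairwise(points, distance):
    """Full n x n distance matrix for any Distance instance (hypothetical helper)."""
    n = len(points)
    return [[distance(points[i], points[j]) for j in range(n)] for i in range(n)]


angular = AngularDistance()
matrix = pairwise([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]], angular)
```

With this layout, adding a custom metric means writing one more subclass; the clustering code only ever calls `distance(u, v)` and never needs to know which metric it was given.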
> I would suggest to use [scipy.spatial.distance.pdist](https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html) instead of your `distance` module. This will give you access to a large collection of distances (euclidean, minkowski, cityblock, cosine, correlation, hamming, jaccard,...
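Following that suggestion, here is a sketch of how `scipy.spatial.distance.pdist` could stand in for a hand-rolled distance module. `pdist` returns a condensed 1-D array of the n(n-1)/2 pairwise distances, and `squareform` expands it into the full symmetric matrix. The built-in `"cosine"` metric computes 1 minus cosine similarity, and a Python callable can be passed for metrics SciPy does not provide by name, such as angular distance. The toy data and the `angular` helper are illustrative assumptions, not code from the repository.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical toy word vectors, one per row.
X = np.array([
    [1.0, 2.0, 0.5],
    [3.0, 6.0, 1.5],
    [2.0, -1.0, 0.0],
])

# Named metrics run in SciPy's compiled code; result is condensed (n*(n-1)/2,).
cos_condensed = pdist(X, metric="cosine")
D_cos = squareform(cos_condensed)  # full symmetric n x n matrix, zero diagonal

# Angular distance (theta / pi) derived from the vectorized cosine result.
D_ang = np.arccos(np.clip(1.0 - D_cos, -1.0, 1.0)) / np.pi


def angular(u, v):
    """Same metric as a callable; pdist calls it once per pair, in Python."""
    cos_sim = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos_sim, -1.0, 1.0)) / np.pi


D_ang_callable = squareform(pdist(X, metric=angular))

# Other named metrics ("euclidean", "cityblock", "correlation", ...) swap in the same way.
D_euc = squareform(pdist(X, metric="euclidean"))
```

The SciPy documentation notes that a Python callable is evaluated pair by pair and is therefore much slower than a named metric, so for a large vocabulary deriving angular distance from the `"cosine"` result is likely the better choice.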
As far as I know, no official paper has been published, and I believe the best current reference for and discussion of the algorithm can be found at:...