Andres Suarez issues

Results: 28 issues of Andres Suarez

disambiguate doesn't handle words not in vocabulary

To call disambiguate(), we currently need a context composed entirely of words that exist in the AdaGram model. When this context contains a word that's not in the model vocabulary,...

Distance calc

Added a function to calculate the similarity between two sense vectors. cos_distance does almost the same thing, but it needs the raw vectors as input. similarity(vm::VectorModel, dict::Dictionary, w1::AbstractString, s1::Integer, w2::AbstractString, s2::Integer) takes the...
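For readers unfamiliar with the measure, the following is a minimal Python sketch of cosine similarity between two raw vectors. It is not the AdaGram code: the function name `cosine_similarity` is a hypothetical choice for illustration, and looking up a sense vector from a word and a sense index is not shown.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)
```

A wrapper that accepts words and sense indices, as the pull request describes, would presumably fetch the two vectors first and then delegate to a function like this one.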

K-means clustering function

Added a clustering function that runs the k-means algorithm on word embeddings and writes the resulting classification to a file. Added an example to the README file. The algorithm is taken from the word2vec clustering option.

Define labels as integers for vonMisesFisher

Define cluster labels as integers, instead of floats. Solves https://github.com/jasonlaska/spherecluster/issues/27

Returned labels are floats in VonMisesFisherMixture (soft and hard)

Spherical KMeans returns integer labels, as expected. However, VonMisesFisherMixture returns labels as floats, which causes trouble when using them as indices or passing them to functions that accept only integers.
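The problem can be illustrated with plain Python lists, without the library itself. In this sketch the label values and the helper `to_int_labels` are hypothetical; it only shows why float labels fail as indices and one cautious way to convert them.

```python
def to_int_labels(labels):
    """Convert float labels such as 2.0 to ints, rejecting non-whole values."""
    result = []
    for label in labels:
        if float(label) != int(label):
            raise ValueError(f"label {label!r} is not a whole number")
        result.append(int(label))
    return result


cluster_names = ["a", "b", "c"]
float_labels = [0.0, 2.0, 1.0]  # hypothetical labels, floats instead of ints

try:
    cluster_names[float_labels[0]]
except TypeError:
    # Python sequences refuse float indices, even whole-valued ones.
    float_index_failed = True
else:
    float_index_failed = False

int_labels = to_int_labels(float_labels)
names = [cluster_names[i] for i in int_labels]
```

Checking that each value is whole before casting avoids silently truncating a label that is not actually an integer.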

Add sklearn pipeline support

This library does not currently work with sklearn Pipelines. I converted RandomBinaryProjections to support them for a project I am working on in [this repo](https://github.com/glicerico/SGNN/blob/a327426671a2ad978e794a23aae8aa0405d95ecb/SGNN/core.py#L34)

Code for scraping answers

Data headers not matching docx headers

Updated header definitions file to match database headers

Dataset 8 is missing