Andres Suarez
Calling disambiguate() currently requires a context composed of words that exist in the AdaGram model. When this context contains a word that's not in the model vocabulary,...
Added a function to calculate the similarity between two sense vectors. cos_distance does almost this, but it needs the raw vectors as input. similarity(vm::VectorModel, dict::Dictionary, w1::AbstractString, s1::Integer, w2::AbstractString, s2::Integer) takes the...
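The idea of hiding a vector-level cosine computation behind a (word, sense) lookup can be sketched in Python as below. This is an illustration only, not the AdaGram Julia API: the `sense_vectors` table, its made-up values, and both function names are assumptions standing in for the model and dictionary.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical stand-in for the model: (word, sense index) -> sense vector.
sense_vectors = {
    ("apple", 1): [1.0, 0.0, 1.0],
    ("apple", 2): [0.0, 1.0, 0.0],
    ("pear", 1): [2.0, 0.0, 2.0],
}


def similarity(vectors, w1, s1, w2, s2):
    """Look up two sense vectors by (word, sense) and compare them."""
    return cosine_similarity(vectors[(w1, s1)], vectors[(w2, s2)])


print(similarity(sense_vectors, "apple", 1, "pear", 1))
print(similarity(sense_vectors, "apple", 1, "apple", 2))
```

The caller passes words and sense indices, so it never has to extract the raw vectors itself.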
Added a clustering function that runs the k-means algorithm on the word embeddings and writes the resulting classification to a file. Added an example to the README file. The algorithm is taken from word2vec's clustering option.
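A rough Python sketch of this kind of clustering pass follows. It is not the code from the pull request: the round-robin initialization, the fixed iteration count, the normalized centroids with dot-product assignment, and the "word label" output format are all assumptions made for illustration, as are the function names.

```python
import math


def normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def kmeans_words(embeddings, k, iterations=10):
    """Cluster a dict of word -> vector into k classes; returns word -> class id."""
    words = list(embeddings)
    dim = len(embeddings[words[0]])
    # Deterministic round-robin initial assignment (an assumption of this sketch).
    labels = [i % k for i in range(len(words))]
    for _ in range(iterations):
        # Recompute each centroid as the normalized sum of its members.
        centroids = [[0.0] * dim for _ in range(k)]
        for word, label in zip(words, labels):
            for d, x in enumerate(embeddings[word]):
                centroids[label][d] += x
        centroids = [normalize(c) for c in centroids]
        # Reassign every word to the centroid with the largest dot product.
        labels = [
            max(
                range(k),
                key=lambda c: sum(a * b for a, b in zip(embeddings[w], centroids[c])),
            )
            for w in words
        ]
    return dict(zip(words, labels))


def write_classes(assignments, path):
    """Write one 'word label' pair per line, grouped by class id."""
    with open(path, "w", encoding="utf-8") as out:
        for word, label in sorted(assignments.items(), key=lambda kv: (kv[1], kv[0])):
            out.write(f"{word} {label}\n")


# Made-up two-dimensional embeddings, for illustration only.
toy = {"cat": [1.0, 0.1], "car": [0.1, 1.0], "dog": [0.9, 0.2], "bus": [0.2, 0.9]}
classes = kmeans_words(toy, k=2)
print(classes)
```

Calling `write_classes(classes, "classes.txt")` would then save the result; the file name is arbitrary.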
Define cluster labels as integers, instead of floats. Solves https://github.com/jasonlaska/spherecluster/issues/27
Spherical KMeans returns integer labels, as expected. However, VonMisesFisherMixture returns labels as floats, which causes trouble when the labels are used as indices or passed to functions that accept only integers.
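The failure mode and a caller-side workaround can be illustrated with plain Python lists. This is a minimal sketch: the label values and cluster names are made up, and `to_int_labels` is a hypothetical helper rather than part of the library. With NumPy arrays the usual equivalent is `labels.astype(int)`.

```python
def to_int_labels(labels):
    """Convert float-valued labels such as 2.0 to ints, rejecting non-integral values."""
    result = []
    for value in labels:
        as_int = int(value)
        if as_int != value:
            raise ValueError(f"label {value!r} is not a whole number")
        result.append(as_int)
    return result


cluster_names = ["sports", "politics", "science"]
float_labels = [0.0, 2.0, 1.0, 2.0]  # made-up labels, as a float-returning model might give

# Indexing with a float fails even when the value is a whole number.
try:
    cluster_names[float_labels[1]]
except TypeError as err:
    print("float index rejected:", err)

# After conversion the labels can be used as indices.
int_labels = to_int_labels(float_labels)
print([cluster_names[i] for i in int_labels])
```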
As it stands, this library does not work with sklearn Pipelines. I converted RandomBinaryProjections to work with them for a project I am working on; see [this repo](https://github.com/glicerico/SGNN/blob/a327426671a2ad978e794a23aae8aa0405d95ecb/SGNN/core.py#L34)
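For context, a Pipeline step is expected to expose `fit` and `transform`. The sketch below shows that shape in pure Python for a random binary projection (the sign of a dot product with random hyperplane normals). It is not the code from the linked repository: the class name, its parameters, and the Gaussian sampling are assumptions. In a real project the class would typically also subclass `sklearn.base.BaseEstimator` and `TransformerMixin` so that parameter handling and cloning work.

```python
import random


class RandomBinaryProjectionsTransformer:
    """Hypothetical fit/transform wrapper around random hyperplane hashing."""

    def __init__(self, projection_count=8, seed=None):
        self.projection_count = projection_count
        self.seed = seed

    def fit(self, X, y=None):
        # Draw one random normal vector per projection, sized to the input.
        rng = random.Random(self.seed)
        dim = len(X[0])
        self.normals_ = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)]
            for _ in range(self.projection_count)
        ]
        return self

    def transform(self, X):
        # Each output bit records which side of a hyperplane the row falls on.
        return [
            [
                1 if sum(a * b for a, b in zip(row, normal)) > 0 else 0
                for normal in self.normals_
            ]
            for row in X
        ]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


rows = [[1.0, 2.0], [-1.0, -2.0]]
bits = RandomBinaryProjectionsTransformer(projection_count=4, seed=0).fit_transform(rows)
print(bits)
```

Returning `self` from `fit` is what lets the step be chained inside a pipeline.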
Feel free to check this repo for scraping (most of) the answers: https://github.com/glicerico/medquad-scraper
There are a few mismatches between the dataset headers and the "Metric name" entries in the docx header definition files. When the docx headers are used as reference while data-processing, this can lead...
Fixes https://github.com/cisco-ie/telemetry/issues/6. Updated the header definitions file so that the dataset headers match the names given in the Word documents.
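A check along the following lines could surface such mismatches before processing. It is a sketch under stated assumptions: the header names are invented for illustration, and `header_mismatches` is a hypothetical helper, not something in the repository.

```python
def header_mismatches(dataset_headers, defined_headers):
    """Report names that appear on only one side of the comparison."""
    dataset_set = set(dataset_headers)
    defined_set = set(defined_headers)
    return {
        "missing_definition": sorted(dataset_set - defined_set),
        "unused_definition": sorted(defined_set - dataset_set),
    }


# Invented header names, for illustration only.
dataset_headers = ["timestamp", "bytes-sent", "packets_received"]
defined_headers = ["timestamp", "bytes_sent", "packets_received"]

report = header_mismatches(dataset_headers, defined_headers)
print(report)
```

An empty list under both keys would mean the two sources agree on every name.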
The 262h-long dataset #8 listed in your README is missing from the repository. Also, telemetry-topology-maps.pdf is missing its 2nd and 3rd slides.