Fast_Sentence_Embeddings
Add Features to Sentencevectors
- [ ] Sentencevectors:
  - Global:
    - [ ] Remove normalized vector files and replace with NN
  - ANN: --> (Annoy, with option for Google ScaNN?)
    - [ ] Only construct index when calling most_similar method
    - [ ] Logging of index speed
    - [ ] Save and load of index
    - [ ] Assert that index and vectors are of equal size
    - [ ] Parameters must be tunable afterwards
    - [ ] Method to reconstruct index
    - [ ] How does the index saving comply with SaveLoad?
    - [ ] Write unit tests?
  - Brute:
    - [ ] Keep access to default method
    - [ ] Make ANN search the default?! --> Results?
    - [ ] Throw warning for large datasets for vector norm init
    - [ ] Maybe throw warning if the embedding + normalization exceeds RAM size
  - Other:
    - [ ] L2 Distance
    - [ ] L1 Distance
    - [ ] Correlation (Power Score Correlation?)
    - [ ] Lookup functionality (via defaultdict)
    - [ ] Get vector: not really memory friendly
    - [ ] Show which words are in vocabulary
    - [ ] Assess empty vectors (via EPS sum)
    - [ ] Z-score transformation from Power-Means Embedding? --> Benefit?
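
To make a few of these items concrete, here is a minimal sketch of how lazy index construction, the size assertion, a rebuild method, L1/L2 distances, and empty-vector detection could fit together. It is not the library's implementation: `LazyIndex`, `rebuild`, `empty_rows`, and the `EPS` value are hypothetical names chosen for illustration, and a brute-force NumPy search stands in for a real ANN backend such as Annoy.

```python
import numpy as np

EPS = 1e-8  # assumed threshold for "empty" vectors; hypothetical value


class LazyIndex:
    """Hypothetical stand-in for an ANN index, backed by brute-force search."""

    def __init__(self, vectors):
        self.vectors = np.asarray(vectors, dtype=np.float32)
        self._normed = None  # built on the first cosine query, not here

    def rebuild(self):
        """(Re)construct the normalized copy used for cosine similarity."""
        norms = np.linalg.norm(self.vectors, axis=1, keepdims=True)
        # Clamp norms so all-zero rows do not trigger a division by zero.
        self._normed = self.vectors / np.maximum(norms, EPS)
        assert len(self._normed) == len(self.vectors), \
            "index and vectors must be of equal size"

    def empty_rows(self):
        """Return row indices whose absolute sum does not exceed EPS."""
        return np.flatnonzero(np.abs(self.vectors).sum(axis=1) <= EPS)

    def most_similar(self, query, topn=3, metric="cosine"):
        query = np.asarray(query, dtype=np.float32)
        if metric == "cosine":
            if self._normed is None:  # construct the index only when needed
                self.rebuild()
            q = query / max(np.linalg.norm(query), EPS)
            scores = self._normed @ q
            order = np.argsort(-scores)  # higher similarity first
        elif metric == "l2":
            scores = np.linalg.norm(self.vectors - query, axis=1)
            order = np.argsort(scores)  # smaller distance first
        elif metric == "l1":
            scores = np.abs(self.vectors - query).sum(axis=1)
            order = np.argsort(scores)  # smaller distance first
        else:
            raise ValueError(f"unknown metric: {metric!r}")
        return [(int(i), float(scores[i])) for i in order[:topn]]
```

Usage might look like `LazyIndex(vectors).most_similar(query, topn=10, metric="l2")`. The normalized copy doubles memory use for the cosine path, which is the cost the warning items in the list are concerned with; the L1 and L2 paths never build it.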