Jinhua Wang

48 comments by Jinhua Wang

This is related to #2658, which probably should not be closed. @gojomo It seems that FastText currently does not return the correct loss from `get_latest_training_loss`.
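To illustrate, a minimal sketch assuming gensim 4.x (the toy corpus is just for demonstration): `Word2Vec` tracks a running loss when `compute_loss=True`, while `FastText` inherits `get_latest_training_loss` but never updates the underlying counter.

```python
from gensim.models import Word2Vec, FastText

sentences = [["hello", "world"], ["machine", "learning"]] * 100  # toy corpus

# Word2Vec updates a running loss when compute_loss=True.
w2v = Word2Vec(sentences, min_count=1, compute_loss=True)
print(w2v.get_latest_training_loss())  # nonzero after training

# FastText exposes the same method, but loss tracking was never
# implemented for it, so the counter stays at 0.0 (see #2617).
ft = FastText(sentences, min_count=1)
print(ft.get_latest_training_loss())  # 0.0
```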

> #2658 is closed as a duplicate, because #2617 is a more comprehensive discussion of what's broken (or simply never implemented) in the *2Vec models.
>
> The docs are...

> Also discussed at: https://stackoverflow.com/q/68451937/130288
>
> Most importantly: the words are already sorted in descending frequency by default, so a workaround for any case where this is happening,...

> > Is it possible that a model loaded via `load_facebook_model` is not sorted by default? I did not see an option to sort vectors after loading Facebook models, though ...
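One way to check the sort order directly, as a hedged sketch assuming gensim 4.x and that the loaded model carries per-word counts (`model.bin` is a placeholder path):

```python
from gensim.models.fasttext import load_facebook_model

model = load_facebook_model("model.bin")  # placeholder path
counts = [model.wv.get_vecattr(w, "count") for w in model.wv.index_to_key]
# True if the vocabulary is already sorted by descending frequency
print(counts == sorted(counts, reverse=True))
```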

Downgrading to TensorFlow 1.15 seems to work for me.
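For anyone trying the same workaround, a minimal sketch (that the issue is a TF 2.x incompatibility is an assumption based on this thread):

```python
# Pin the version in the environment first, e.g.:
#   pip install "tensorflow==1.15"
import tensorflow as tf
assert tf.__version__.startswith("1.15"), tf.__version__
```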

> Personally, I would simply put all those 1.5 million sentences (documents) in a list and then pass that list as the `docs` argument. If you have enough RAM...

@MaartenGr I see. So should I call `fit` on the 200,000 sentences, and then call `transform` on the 1.3 million sentences?
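In code, the split I have in mind would look roughly like this (a sketch using BERTopic's `fit`/`transform` API; `small_docs` and `large_docs` are placeholder names):

```python
from bertopic import BERTopic

topic_model = BERTopic()
topic_model.fit(small_docs)                        # fit on the 200,000-sentence subset
topics, probs = topic_model.transform(large_docs)  # assign topics to the remaining 1.3M
```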

But if `transform` takes a lot of memory, can I run `transform` on smaller chunks (such as several 200,000-sentence chunks summing to 1.3 million sentences), and then combine the results...
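Something like the following is what I mean (a hedged sketch; it assumes `topic_model` is already fitted, `docs` holds the 1.3 million sentences, and the chunk size is arbitrary):

```python
all_topics, all_probs = [], []
for start in range(0, len(docs), 200_000):
    chunk = docs[start:start + 200_000]
    topics, probs = topic_model.transform(chunk)  # per-chunk inference
    all_topics.extend(topics)
    all_probs.extend(probs)
```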

@MaartenGr I see. Is it possible to further reduce memory usage by tuning the hyperparameters of UMAP (e.g. reducing the dimensionality of the document embeddings further) or HDBSCAN (fewer...
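Concretely, I am imagining something like this (a sketch only; the parameter values are assumptions, not recommendations):

```python
from bertopic import BERTopic
from umap import UMAP
from hdbscan import HDBSCAN

umap_model = UMAP(n_components=2, n_neighbors=15, low_memory=True)  # fewer output dims
hdbscan_model = HDBSCAN(min_cluster_size=100)                       # fewer, larger clusters
topic_model = BERTopic(umap_model=umap_model, hdbscan_model=hdbscan_model)
```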

For example, if k-means consumes less memory, can we use k-means instead of HDBSCAN?
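BERTopic does accept alternative clustering models in place of HDBSCAN, so a sketch of what I am asking about (the cluster count here is an assumption; k-means requires fixing it up front):

```python
from bertopic import BERTopic
from sklearn.cluster import KMeans

cluster_model = KMeans(n_clusters=50)                # k must be chosen in advance
topic_model = BERTopic(hdbscan_model=cluster_model)  # swap in k-means for HDBSCAN
```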