Stephan Tulkens

Results: 28 comments by Stephan Tulkens

I'll open a PR to take a stab at it, and will let you know!

Hey @seden, out of interest, what is the reduction in loading time you get when moving from .bin to .vec using Gensim? I was under the impression that loading from...
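
For reference, this is roughly how I'd measure the difference with Gensim (a sketch; the file paths are placeholders, and `load_facebook_vectors` is Gensim's loader for native fastText `.bin` files):

```python
import time

from gensim.models import KeyedVectors
from gensim.models.fasttext import load_facebook_vectors

# Load the native fastText binary (includes subword information).
start = time.perf_counter()
bin_vectors = load_facebook_vectors("model.bin")  # placeholder path
print(f".bin load time: {time.perf_counter() - start:.2f}s")

# Load the plain-text .vec file (word vectors only, no subwords).
start = time.perf_counter()
vec_vectors = KeyedVectors.load_word2vec_format("model.vec")  # placeholder path
print(f".vec load time: {time.perf_counter() - start:.2f}s")
```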

Hey @MosheWasserb, thanks for replying, really appreciated. Before I submit a PR, could we maybe discuss what you want the final conclusion of the article to look like? Because...

Hi, thanks for the quick response. I'll check out the code! The model namespace mentions that it was trained using `data/wiki_100.vec`, but is this correct? I'm assuming that this is...

Hi @4722794, the empirically most valid strategy would actually be to pick the label that is most frequent in your training data in case of a tie, instead of a...
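
As a minimal sketch of that tie-breaking rule (the function and variable names here are illustrative, not from any particular codebase):

```python
from collections import Counter

def break_tie(tied_labels: list[str], train_labels: list[str]) -> str:
    """Among labels that tied, pick the one most frequent in the training data."""
    freq = Counter(train_labels)
    return max(tied_labels, key=lambda label: freq[label])

# Example: "positive" and "negative" tie, but "positive" dominates the training data.
print(break_tie(["positive", "negative"], ["positive"] * 60 + ["negative"] * 40))
# -> "positive"
```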

Hey @Jester6136, yep, this is because of the high threshold. My apologies, this is rather inefficient. @Pringled contributed a fix, which is now in a PR; see #73. ...

I just merged #73. Could you try the new function? It just returns indices, so that should be a lot faster.
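
I don't have the exact signature in front of me, but the idea behind returning indices is roughly the following (a NumPy sketch, assuming a vector of similarity scores; the function name is made up):

```python
import numpy as np

def indices_above_threshold(scores: np.ndarray, threshold: float) -> np.ndarray:
    """Return only the indices of items whose score exceeds the threshold,
    instead of materializing the matching items themselves."""
    return np.flatnonzero(scores > threshold)

scores = np.array([0.2, 0.95, 0.7, 0.99])
print(indices_above_threshold(scores, threshold=0.9))  # -> [1 3]
```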

Hello, I don't think there is a neat solution for this particular issue, but you can bypass it by using `pre_tokenized=True` in the call to `encode`, and just pretokenizing beforehand....
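
Roughly like this (a sketch: the whitespace pretokenizer and the `model` object are placeholders, and I'm taking the `pre_tokenized=True` keyword from the description above, so check it against the actual `encode` signature):

```python
texts = ["a first sentence", "a second sentence"]

# Pretokenize beforehand; a naive whitespace split stands in for
# whatever pretokenizer fits your data.
pretokenized = [text.split() for text in texts]

# Then pass the token lists to `encode`, as described above:
# embeddings = model.encode(pretokenized, pre_tokenized=True)
```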

In the final line, you push `list[list[str]]`, but I think the iterator expects `list[str]`. I think you can achieve what you want by letting `tokenize_function` return a string: `" ".join([t.text...
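
Something like the following, assuming spaCy-style tokens with a `.text` attribute (the snippet above is truncated, so this completes it with one plausible reading; the model name is an assumption):

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed model name

def tokenize_function(text: str) -> str:
    # Return a single whitespace-joined string per example (so the batch is
    # list[str]), rather than a list of tokens per example (list[list[str]]).
    return " ".join([t.text for t in nlp(text)])

print(tokenize_function("Hello, world!"))  # -> "Hello , world !"
```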

Hey @ArthurZucker, thanks for your response! I'm using the pure `tokenizers` API. However, I am using a WordPiece tokenizer (actually just the `baai/bge-base-en-v1.5` tokenizer, which AFAIK is just the...
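
For completeness, loading that tokenizer with the pure `tokenizers` API looks like this (the repo id is copied verbatim from above):

```python
from tokenizers import Tokenizer

# Load the WordPiece tokenizer mentioned above directly from the Hub.
tokenizer = Tokenizer.from_pretrained("baai/bge-base-en-v1.5")

encoding = tokenizer.encode("a small test sentence")
print(encoding.tokens)
```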