Stephan Tulkens

28 comments by Stephan Tulkens

Ok, that's great! Thanks for the quick reply. Is this mentioned in the docs somewhere?

Hi, the dataset can be downloaded here: https://github.com/ruidan/Unsupervised-Aspect-Extraction. We trained the embeddings on the SemEval 2014 and 2015 corpora, which you can download here: http://alt.qcri.org/semeval2014/task4/ and here: http://alt.qcri.org/semeval2015/task12/

Hi! This is just an example; there is no file called "my_data.conllu". To work with the pipeline, the code needs data in CoNLL-U format (see here:...

M1 user here. I got the same error; installing the Rust compiler fixed it for me.

@alibrahimzada I installed it with Homebrew.

This is the wrong repository for this issue. The tokenizers in the transformers package are not the same as the tokenizers in this repository, although they are very similar.

For those interested: I created a Python script that trains a SentencePiece model on a training corpus, then segments the corpus with it, and finally trains BPE embeddings. The end result...

Thanks for the response, I'll wait! If you want, you can ping me when this can be started.

@DivyanshVinayak23 Sure, I totally forgot to pick this up. (@matsui528 my apologies 🙏 )

To chime in here: as mentioned, I think it is important to realize that a cosine similarity of 0 means the vectors are orthogonal, while -1 means they point in opposite directions. In particular, for every normalized...
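
To make the distinction between 0 and -1 concrete, here is a minimal sketch in plain Python. The function name `cosine_similarity` and the example vectors are illustrative assumptions, not taken from the library under discussion.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors (hypothetical helper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

x = [1.0, 0.0]
same = cosine_similarity(x, [2.0, 0.0])        # same direction, different length
orthogonal = cosine_similarity(x, [0.0, 3.0])  # perpendicular
opposite = cosine_similarity(x, [-1.0, 0.0])   # opposite direction

print(same, orthogonal, opposite)  # 1.0 0.0 -1.0
```

The score depends only on direction, not on length, so an unrelated (orthogonal) vector scores 0, whereas the lowest possible score of -1 is reserved for a vector pointing the opposite way.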