Benedikt Fuchs
Hi @mishraaditya595 this looks like it is failing to install the tokenizers dependency, I suppose the contributors and maintainers of that repository will be able to help you with that...
About Question 1: A `Sentence` represents a textual unit that you want to classify. Its length does not matter, as long as you do not exceed the subtoken...
Hi @stefan-it thanks, I am also excited to hear about it. It's important to note that the authors of ACE achieved their best results by concatenating transformer models that were...
Hi @lukasgarbas thank you for testing it! I noticed one implementation detail where I deviated from the original: I made it impossible to use the same configuration twice, while the...
About **4.**: there is [this](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4331690/) and, more generally, [this](https://arxiv.org/pdf/1603.01360.pdf).
About **1.**: this is done by the tagger itself, you don't need to add it.
Hi @dobbersc, since you introduce some kind of special tokens, have you tried adding them explicitly to the vocabulary of the transformer embeddings? You could do this by adding something...
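Something along these lines, sketched on the underlying Hugging Face tokenizer and model (the `[H-LOC]`/`[T-LOC]` markers are just the ones from your example; with Flair's `TransformerWordEmbeddings`, the same objects should be reachable as `embeddings.tokenizer` and `embeddings.model`):

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# register the marker tokens as special tokens so the tokenizer
# never splits them into subtokens
num_added = tokenizer.add_tokens(["[H-LOC]", "[T-LOC]"], special_tokens=True)

# grow the embedding matrix so the new ids get (randomly
# initialized) vectors that can then be fine-tuned
model.resize_token_embeddings(len(tokenizer))
```

After this, `tokenizer.tokenize("[H-LOC] and [T-LOC]")` keeps each marker as a single token instead of splitting it into `'['`, `'h'`, `'-'`, `'lo'`, `'##c'`, `']'`.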
Hi @dobbersc interesting and surprising results. Looking at the tokenization without special tokens: ``` '[', 'h', '-', 'lo', '##c', ']', 'and', '[', 't', '-', 'lo', '##c', ']', ``` we see...
Hi again, I did some testing and basically all my ideas led to a decrease in scores. Here are my runs, all with some adjustments to the tokens: with...
Hi @miwieg, TransformerEmbeddings provide a parameter `allow_long_sentences`. If that parameter is set to `True`, the embeddings will use some overlap to compute the token embeddings. (E.g. "This is a very...
I don't know if that works; I would rather add it to the constructor: ```document_embeddings = TransformerDocumentEmbeddings(..., allow_long_sentences=True, cls_pooling="mean")```