Joel Grus
probably someone has done this, but if so I don't know about it. if you want to contribute the notebooks back to this repo, let me know. or if you...
some day I'll learn my lesson about relying on the O'Reilly website not to change and break things ☹
btw, it seems like pylance fixes this
I don't know much about Flair embeddings, but I took a quick look at their paper and it looks like they're just doing character-level embeddings and then taking the last...
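to make that concrete, here's a rough sketch of the idea (plain PyTorch; `char_vocab_size`, `char_dim`, and `hidden_dim` are made-up placeholders, not Flair's actual configuration): run a character-level LSTM over the whole string and read out the hidden state at each word's last character:

```
import torch
import torch.nn as nn

char_vocab_size, char_dim, hidden_dim = 100, 32, 64
embed = nn.Embedding(char_vocab_size, char_dim)
lstm = nn.LSTM(char_dim, hidden_dim, batch_first=True)

# pretend these are the character ids for "go ." (4 characters)
char_ids = torch.randint(0, char_vocab_size, (1, 4))
states, _ = lstm(embed(char_ids))              # shape (1, 4, hidden_dim)

# a word's embedding is the LSTM state at its last character
word_end_offsets = [1, 3]                      # 'o' ends "go", '.' ends "."
word_embeddings = states[0, word_end_offsets]  # shape (2, hidden_dim)
```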
wouldn't you just use the character tokenizer (which would keep spaces) and then compute the offsets in the token indexer?
are the rules for word boundaries so complicated that you couldn't just include them in the token indexer?
what does "originally tokenized" mean here? say I have the sentence "go." I feed that to the character tokenizer and get ["g", "o", "."]. if the sentence were "go .",...
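here's a minimal sketch of what I have in mind (plain Python, no AllenNLP; `character_tokenize` and `word_end_offsets` are hypothetical names, not real library functions): keep the spaces in the character stream, and compute each word's last-character offset separately:

```
def character_tokenize(text: str) -> list[str]:
    return list(text)  # spaces are kept as tokens

def word_end_offsets(text: str) -> list[int]:
    """Offset of the last character of each whitespace-delimited word."""
    offsets = []
    for i, ch in enumerate(text):
        at_word_end = i + 1 == len(text) or text[i + 1].isspace()
        if not ch.isspace() and at_word_end:
            offsets.append(i)
    return offsets

print(character_tokenize("go."))  # ['g', 'o', '.']
print(word_end_offsets("go."))    # [2]
print(word_end_offsets("go ."))   # [1, 3]
```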
ok, I think I get it now. but the spacy tokenizer is already returning the offsets as `token.idx`:

```
In [11]: t = WordTokenizer()

In [12]: tokens = t.tokenize("This isn't...
```
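for reference, the same offset behavior from spacy directly (a minimal sketch; it assumes the `en_core_web_sm` model is installed and reuses the "go ." example from above):

```
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("go .")

for token in doc:
    # token.idx is the character offset where the token starts
    print(token.text, token.idx)  # go 0, then . 3
```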
if your text is pre-tokenized you're out of luck in any case. I am extremely comfortable enforcing "if you want to use flair embeddings, you must use a tokenizer that...
in this case your DatasetReader must be (I assume) somehow creating `Token` objects to populate a `TextField`? in which case I'd say that yes, it's the dataset reader's job to...
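something like this, for example (a rough sketch; it assumes the AllenNLP `Token(text=..., idx=...)` constructor and guesses that the original words were joined by single spaces):

```
from allennlp.data.tokenizers import Token

def tokens_with_offsets(words):
    """Rebuild `Token` objects with character offsets for pre-tokenized text."""
    tokens, offset = [], 0
    for word in words:
        tokens.append(Token(text=word, idx=offset))
        offset += len(word) + 1  # assumes a single space joined the words
    return tokens

print(tokens_with_offsets(["go", "."]))  # tokens at offsets 0 and 3
```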