Iz Beltagy

38 comments by Iz Beltagy

We didn't release the full text of the SciBERT training corpus, but you can try the GORC corpus https://github.com/allenai/s2-gorc, which we recently released. It is larger and cleaner than the SciBERT training...

You will need to use the ner_finetune.json config, which was recently merged into master: https://github.com/allenai/scibert/blob/master/allennlp_config/ner_finetune.json
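For reference, here is a minimal sketch of launching that config through AllenNLP's Python API (AllenNLP 0.x era, which the scibert repo targets); the repo's own scripts use the `allennlp` CLI instead. The environment-variable names and paths below are assumptions, so check the `std.extVar(...)` calls in `ner_finetune.json` for the names it actually expects.

```python
# Hedged sketch: training the NER fine-tuning config via AllenNLP's Python API.
# Environment-variable names and paths are assumptions; see ner_finetune.json.
import os
from allennlp.commands.train import train_model_from_file
from allennlp.common.util import import_submodules

# The config reads data and weight locations from the environment (assumed names).
os.environ.setdefault("BERT_VOCAB", "/path/to/scivocab_uncased.vocab")
os.environ.setdefault("BERT_WEIGHTS", "/path/to/scibert_scivocab_uncased_weights.tar.gz")
os.environ.setdefault("TRAIN_PATH", "/path/to/train.txt")
os.environ.setdefault("DEV_PATH", "/path/to/dev.txt")
os.environ.setdefault("TEST_PATH", "/path/to/test.txt")

# Register the repo's custom dataset readers and models (equivalent to --include-package scibert).
import_submodules("scibert")

train_model_from_file(
    parameter_filename="allennlp_config/ner_finetune.json",
    serialization_dir="output/ner_finetune",
)
```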

You would follow the same recipe, except replacing `bert-base-cased` with one of the SciBERT models, for example `allenai/scibert_scivocab_uncased`. Side note: you might find better trainers in the HF examples https://github.com/huggingface/transformers/tree/master/examples/text-classification....
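In case it helps, here is a minimal transformers sketch of that swap (my own illustration, not the HF example script); `num_labels` and the input sentence are placeholders for your task.

```python
# Hedged sketch: loading SciBERT in place of bert-base-cased for sequence
# classification with the Hugging Face transformers library.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "allenai/scibert_scivocab_uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)  # num_labels is task-specific

inputs = tokenizer("The transcription factor binds to the promoter region.",
                   return_tensors="pt", truncation=True)
outputs = model(**inputs)
print(outputs.logits.shape)  # (1, num_labels)
```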

Just follow the instructions in the readme and you should be able to reproduce the frozen-embedding results. The finetuning experiments require the code in this PR as well...

Can you try adding the argument `--use-dataset-reader` to your command line?

Looks like you need to add a dummy predictor for it to work, something like:

```python
from allennlp.predictors import Predictor

@Predictor.register('dummy_predictor')
class DummyPredictor(Predictor):
    pass
```

then in the command line add `--predictor dummy_predictor`.

1- The NER model predicts an IOB label per token in the sentence, which can be used at decoding time to find spans of entities (see the decoding sketch after this list).
2- We use span-based F1...
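As an illustration of point 1, here is a hedged sketch of that decoding step, turning a per-token IOB tag sequence into entity spans (the units that span-based F1 is computed over). It is an illustrative helper, not the repo's actual evaluation code.

```python
# Hedged sketch: convert IOB tags to (start, end_exclusive, label) entity spans.
from typing import List, Tuple

def iob_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Greedy IOB decoding: B- starts a span, matching I- extends it, anything else closes it."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            if start is not None:
                spans.append((start, i, label))
            start, label = i, tag[2:]
        elif tag.startswith("I-") and tag[2:] == label:
            continue  # span continues
        else:
            if start is not None:
                spans.append((start, i, label))
            start, label = None, None
            if tag.startswith("B-") or tag.startswith("I-"):
                start, label = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags), label))
    return spans

print(iob_to_spans(["O", "B-Task", "I-Task", "O", "B-Method"]))
# [(1, 3, 'Task'), (4, 5, 'Method')]
```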

The code is a bit difficult to read without formatting, but the obvious issues are that you need to use `AutoModelForTokenClassification`, and it is odd to do `encode(tokenize(tokenizer.decode(tokenizer.encode(string))))`. I think...
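For comparison, here is a minimal sketch of the straightforward path: tokenize the raw string once and feed it to a token-classification head, instead of the encode/decode/re-encode round trip. The model name and `num_labels` are placeholders; in practice you would load a checkpoint already fine-tuned for NER.

```python
# Hedged sketch: token classification with a single tokenization pass.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_name = "allenai/scibert_scivocab_uncased"  # placeholder; use a fine-tuned NER checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=5)  # num_labels: assumption

inputs = tokenizer("EGFR mutations predict response to gefitinib.",
                   return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits              # (1, seq_len, num_labels)
pred_ids = logits.argmax(dim=-1)[0].tolist()     # one predicted label id per wordpiece
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
print(list(zip(tokens, pred_ids)))
```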

Maybe something like this will make it faster to clone: https://stackoverflow.com/questions/600079/how-do-i-clone-a-subdirectory-only-of-a-git-repository/52269934#52269934

Interesting. I have seen the same pattern while training transformers for another project. I don't know why this is happening, but it doesn't seem to be a bug.