Hong Lu
Does this mean the learning rate decay was only applied to the 24 transformer layers? Not to the embedding layers or the dense layers for start and end logits? I'm...
I thought #487 is about fixing the temp directory usage in the tests. This is a notebook and it's already using a TemporaryDirectory for both data loading and SequenceClassifier initialization....
@miguelgfierro This is not related to the tests. This is just about running the notebook locally. I think Daisy meant the TemporaryDirectory needs to be explicitly deleted at the end...
@kehuangms Did you verify that the cleanup happened as expected? I think the reason @daden-ms created the issue is that the cleanup didn't happen.
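For reference, a minimal sketch of what explicit deletion and a cleanup check could look like with Python's standard `tempfile` module. The file name `example.txt` and the variable names are hypothetical stand-ins, not taken from the notebook.

```python
import os
import tempfile

# Create the temporary directory explicitly (no `with` block), as a notebook
# that keeps the directory alive across several cells might do.
tmp_dir = tempfile.TemporaryDirectory()
cache_path = tmp_dir.name

# Hypothetical stand-in for files the notebook writes during a run.
with open(os.path.join(cache_path, "example.txt"), "w") as f:
    f.write("cached data")

existed_before = os.path.isdir(cache_path)

# Explicitly delete the directory and everything inside it.
tmp_dir.cleanup()

# Verify that the cleanup actually happened.
removed_after = not os.path.exists(cache_path)
print(existed_before, removed_after)
```

Calling `cleanup()` in the last cell makes the deletion deterministic instead of relying on the object being garbage collected when the kernel shuts down.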
@atakanokan Yes. It's possible to have custom entity labels. It's like a multi-class classification problem: the model can handle any labels that exist in the training data. If you have your...
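As a rough illustration of the multi-class framing, the label set can be derived directly from whatever tags appear in the training data. This is a generic sketch, not the repository's API; the tag names and the `train_labels` variable are made up for the example.

```python
# Hypothetical training tags, one list of tags per sentence.
train_labels = [
    ["O", "B-DRUG", "I-DRUG", "O"],
    ["B-DOSE", "O"],
]

# Collect every distinct tag seen in training and assign each an integer id.
label_list = sorted({tag for sentence in train_labels for tag in sentence})
label_map = {label: idx for idx, label in enumerate(label_list)}

# The classifier head then needs one output class per distinct tag.
num_labels = len(label_map)
print(label_map, num_labels)
```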