StanfordQuestionAnsweringDataset seems to be incompatible with the public Stanford (SQuAD) JSON dataset
Description
I am trying to train a BERT model on the Stanford Question Answering Dataset, but training fails because some questions have no entries under "answers" and only a "plausible_answers" node. So I created my own dataset loader based on StanfordQuestionAnsweringDataset, with this change at line 156 in the prepare method 👇
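// iterate through the answers; SQuAD 2.0 unanswerable questions have an
// empty "answers" list and keep their candidate spans in "plausible_answers"
answers = (List<Map<String, Object>>) question.get("answers");
if (answers == null || answers.isEmpty()) {
    answers = (List<Map<String, Object>>) question.get("plausible_answers");
}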
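For reference, in SQuAD 2.0 an unanswerable question is marked with `"is_impossible": true`, has an empty `"answers"` list, and stores its candidate spans under `"plausible_answers"`. The fallback above can be isolated into a small, null-safe helper; this is a standalone sketch, and the class `SquadAnswerFallback` and method `answersFor` are hypothetical names, not part of DJL.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper: choose the answer list for one SQuAD 2.0 question entry
// that has already been parsed into nested Maps/Lists by a JSON library.
final class SquadAnswerFallback {

    private SquadAnswerFallback() {}

    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> answersFor(Map<String, Object> question) {
        List<Map<String, Object>> answers =
                (List<Map<String, Object>>) question.get("answers");
        if (answers == null || answers.isEmpty()) {
            // Unanswerable questions keep their candidate spans here instead.
            answers = (List<Map<String, Object>>) question.get("plausible_answers");
        }
        // Never return null, so callers can iterate unconditionally.
        return answers == null ? Collections.emptyList() : answers;
    }

    public static void main(String[] args) {
        Map<String, Object> q = Map.of(
                "question", "Who wrote it?",
                "is_impossible", true,
                "answers", List.of(),
                "plausible_answers",
                List.of(Map.of("text", "Stanford", "answer_start", 12)));
        System.out.println(answersFor(q));
    }
}
```

Note that this treats plausible answers as training targets, which changes the meaning of those examples compared with training a model to predict "no answer".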
However, I still cannot complete training. I use a pre-trained model 👇
Criteria<NDList, NDList> criteria = Criteria.builder()
        .optApplication(Application.NLP.QUESTION_ANSWER)
        // only the last setTypes call takes effect, so the earlier
        // setTypes(BertToken.class, String[].class) was redundant;
        // NDList in/out means the block receives raw tensors
        .setTypes(NDList.class, NDList.class)
        .optFilter("backbone", "bert")
        .optEngine(Engine.getDefaultEngineName())
        .optProgress(new ProgressBar())
        .build();
and my training code looks like this 👇
try (Model preTrainedModel = criteria.loadModel()) {
    try (Model model = Model.newInstance("BertQA")) {
        model.setBlock(preTrainedModel.getBlock());
        Shape encoderInputShape = new Shape(1, 2);
        train(model, trainDataset, testDataset, encoderInputShape, epoch);
    }
}
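One likely source of the mismatch is `new Shape(1, 2)`: it describes a single input with two elements, whereas a BERT question-answering model traced from Hugging Face `transformers` normally takes several integer tensors of shape (batch, sequenceLength), typically token ids, an attention mask and token type (segment) ids. The exact order the traced model expects is an assumption here and should be checked against the script that traced it. The plain-Java sketch below builds those three arrays for one question/context pair from already-tokenized word-piece ids; the class `BertQaInputs` is hypothetical, and the ids `101`/`102` for `[CLS]`/`[SEP]` assume the standard BERT uncased vocabulary.

```java
import java.util.Arrays;

// Hypothetical sketch: build the three per-token arrays a BERT QA model usually
// expects for "[CLS] question [SEP] context [SEP]", padded to a fixed length.
final class BertQaInputs {

    static final long CLS = 101; // assumption: BERT uncased vocabulary ids
    static final long SEP = 102;
    static final long PAD = 0;

    final long[] tokenIds;
    final long[] attentionMask; // 1 for real tokens, 0 for padding
    final long[] tokenTypeIds;  // 0 for the question segment, 1 for the context segment

    private BertQaInputs(long[] tokenIds, long[] attentionMask, long[] tokenTypeIds) {
        this.tokenIds = tokenIds;
        this.attentionMask = attentionMask;
        this.tokenTypeIds = tokenTypeIds;
    }

    static BertQaInputs build(long[] questionIds, long[] contextIds, int maxLength) {
        int used = questionIds.length + contextIds.length + 3; // [CLS] + two [SEP]
        if (used > maxLength) {
            throw new IllegalArgumentException(
                    "sequence of " + used + " tokens exceeds " + maxLength);
        }
        long[] ids = new long[maxLength];
        long[] mask = new long[maxLength];
        long[] types = new long[maxLength];
        Arrays.fill(ids, PAD);
        int pos = 0;
        ids[pos++] = CLS;
        for (long id : questionIds) {
            ids[pos++] = id;
        }
        ids[pos++] = SEP;
        int contextStart = pos;
        for (long id : contextIds) {
            ids[pos++] = id;
        }
        ids[pos++] = SEP;
        Arrays.fill(mask, 0, pos, 1L);
        // Segment B (type 1) covers the context and its closing [SEP].
        Arrays.fill(types, contextStart, pos, 1L);
        return new BertQaInputs(ids, mask, types);
    }

    public static void main(String[] args) {
        BertQaInputs in = build(new long[] {2040, 2626}, new long[] {3000, 3001, 3002}, 12);
        // Each array corresponds to one input tensor of shape (1, 12).
        System.out.println(Arrays.toString(in.tokenIds));
        System.out.println(Arrays.toString(in.attentionMask));
        System.out.println(Arrays.toString(in.tokenTypeIds));
    }
}
```

Each array would become one integer (int64) input in the `NDList` fed to the model, and `trainer.initialize(...)` would correspondingly need one shape per input (for example three `new Shape(1, maxLength)` values), assuming the traced block takes exactly these three inputs.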
private void train(Model model, Dataset trainDataset, Dataset testDataset, Shape shape, int epoch)
        throws TranslateException, IOException {
    SaveModelTrainingListener listener = new SaveModelTrainingListener("build/model");
    DefaultTrainingConfig config = getTrainingConfig(listener);
    try (Trainer trainer = model.newTrainer(config)) {
        trainer.setMetrics(new Metrics());
        trainer.initialize(shape);
        EasyTrain.fit(trainer, epoch, trainDataset, testDataset);
        System.out.println(trainer.getTrainingResult());
    }
}

private static DefaultTrainingConfig getTrainingConfig(SaveModelTrainingListener listener) {
    return new DefaultTrainingConfig(Loss.softmaxCrossEntropyLoss())
            .addEvaluator(new Accuracy())
            .optDevices(Engine.getInstance().getDevices(1))
            .addTrainingListeners(TrainingListener.Defaults.logging("build/model"))
            .addTrainingListeners(listener);
}
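A related point about this config: a question-answering head such as Hugging Face's `BertForQuestionAnswering` produces two outputs, start logits and end logits over the sequence, and each training label is a pair of token positions. That head is trained with a softmax cross-entropy on the start logits plus one on the end logits, averaged. Whether `Loss.softmaxCrossEntropyLoss()` and `Accuracy` handle two outputs like this is something to verify in the DJL documentation; the plain-Java sketch below only shows the target arithmetic, and `SpanLoss` is a hypothetical name.

```java
// Hypothetical sketch of the span-extraction loss used by BERT QA heads:
// the mean of the cross-entropy on start logits and on end logits.
final class SpanLoss {

    // Numerically stable -log(softmax(logits)[target]).
    static double crossEntropy(double[] logits, int target) {
        if (target < 0 || target >= logits.length) {
            throw new IllegalArgumentException("target position out of range: " + target);
        }
        double max = Double.NEGATIVE_INFINITY;
        for (double v : logits) {
            max = Math.max(max, v);
        }
        double sumExp = 0.0;
        for (double v : logits) {
            sumExp += Math.exp(v - max);
        }
        double logSumExp = max + Math.log(sumExp);
        return logSumExp - logits[target];
    }

    static double spanLoss(double[] startLogits, double[] endLogits, int startPos, int endPos) {
        return (crossEntropy(startLogits, startPos) + crossEntropy(endLogits, endPos)) / 2.0;
    }

    public static void main(String[] args) {
        double[] start = {0.1, 2.0, 0.3, -1.0};
        double[] end = {0.0, 0.2, 1.5, 0.4};
        // Gold answer spans tokens 1..2 of the sequence.
        System.out.println(spanLoss(start, end, 1, 2));
    }
}
```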
and I always get 👇
  File "code/torch/torch/nn/modules/module/___torch_mangle_1.py", line 8, in forward
  def forward(self: torch.torch.nn.modules.module.___torch_mangle_1.Module, input: Tensor) -> Tensor:
    token_type_embeddings = torch.embedding(self.weight, input, -1, False, False)
                            ~~~~~~~~~~~~~~~ <--- HERE
    return token_type_embeddings
Traceback of TorchScript, original code (most recent call last):
/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/functional.py(1484): embedding
/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/modules/sparse.py(114): forward
/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/modules/module.py(516): _slow_forward
/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/modules/module.py(530): call
/home/ubuntu/.local/lib/python3.6/site-packages/transformers/modeling_bert.py(175): forward
/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/modules/module.py(516): _slow_forward
/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/modules/module.py(530): call
/home/ubuntu/.local/lib/python3.6/site-packages/transformers/modeling_bert.py(783): forward
/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/modules/module.py(516): _slow_forward
/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/modules/module.py(530): call
/home/ubuntu/.local/lib/python3.6/site-packages/transformers/modeling_bert.py(1480): forward
/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/modules/module.py(516): _slow_forward
/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/modules/module.py(530): call
/home/ubuntu/.local/lib/python3.6/site-packages/torch/jit/init.py(1034): trace_module
/home/ubuntu/.local/lib/python3.6/site-packages/torch/jit/init.py(882): trace
bert.py(25):
So it looks to me like there is a mismatch between what I feed the model and the input it expects. Is there a way to see how the model is defined, or anything else that would help me investigate? Thanks.
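The frame that fails is `torch.embedding(self.weight, input, ...)` computing `token_type_embeddings`. An embedding layer is a row lookup into a weight matrix, so every input value must be an integer index in `[0, numRows)`. For the standard Google BERT checkpoints the token type table has only 2 rows (`type_vocab_size = 2`), so a non-integer input, or any token type id other than 0 or 1, cannot be looked up. A plain-Java sketch of that lookup (the class `EmbeddingLookup` is hypothetical) makes the constraint explicit:

```java
import java.util.Arrays;

// Hypothetical illustration of what an embedding lookup does: gather one row
// of the weight matrix per index, rejecting indices outside [0, rows).
final class EmbeddingLookup {

    static double[][] lookup(double[][] weight, long[] indices) {
        double[][] out = new double[indices.length][];
        for (int i = 0; i < indices.length; i++) {
            long idx = indices[i];
            if (idx < 0 || idx >= weight.length) {
                throw new IndexOutOfBoundsException(
                        "index " + idx + " out of range for table with " + weight.length + " rows");
            }
            out[i] = weight[(int) idx].clone();
        }
        return out;
    }

    public static void main(String[] args) {
        // A token type table with two rows (segment A, segment B) and hidden size 3.
        double[][] tokenTypeWeight = {{0.1, 0.2, 0.3}, {0.4, 0.5, 0.6}};
        System.out.println(Arrays.deepToString(lookup(tokenTypeWeight, new long[] {0, 0, 1})));
        try {
            lookup(tokenTypeWeight, new long[] {0, 2});
        } catch (IndexOutOfBoundsException e) {
            System.out.println("fails as expected: " + e.getMessage());
        }
    }
}
```

The actual error message that precedes the TorchScript frame would show which of these two cases applies (wrong dtype or out-of-range index).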