StanfordQuestionAnsweringDataset seems incompatible with the public Stanford JSON dataset

Open havlenapetr opened this issue 2 years ago • 0 comments

Description

I am trying to train a BERT model on the Stanford Question Answering Dataset, but training fails because some questions have no "answers" defined and carry a "plausible_answers" node instead. So I created my own dataset loader based on StanfordQuestionAnsweringDataset, with this diff at line 156 in the prepare method 👇

                    // iterate through the answers
                    answers = (List<Map<String, Object>>) question.get("answers");
                    if (answers.isEmpty()) {
                        answers = (List<Map<String, Object>>) question.get("plausible_answers");
                    }
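
The fallback above can be tried in isolation. The following is a minimal sketch, not the DJL implementation: the class `AnswerFallback` and the method `resolveAnswers` are hypothetical names, and the null checks are an added assumption for entries that omit either key. Note that in SQuAD 2.0 the questions that only carry "plausible_answers" are the ones marked as unanswerable, so using them as ground truth changes what the model is trained to predict.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of DJL: mirrors the fallback from the diff above.
final class AnswerFallback {

    /** Returns "answers" if present and non-empty, else "plausible_answers", else an empty list. */
    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> resolveAnswers(Map<String, Object> question) {
        List<Map<String, Object>> answers =
                (List<Map<String, Object>>) question.get("answers");
        if (answers == null || answers.isEmpty()) {
            // Assumption: fall back to the plausible answers of unanswerable questions.
            answers = (List<Map<String, Object>>) question.get("plausible_answers");
        }
        return answers == null ? Collections.<Map<String, Object>>emptyList() : answers;
    }

    public static void main(String[] args) {
        // Shape of an entry that has an empty "answers" list (values are made up).
        Map<String, Object> question = Map.of(
                "answers", List.of(),
                "plausible_answers", List.of(Map.of("text", "example", "answer_start", 0)));
        System.out.println(resolveAnswers(question));
    }
}
```

Returning an empty list instead of null lets the calling loop skip such questions without a separate null check.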

But I still cannot complete training of the model. I load a pre-trained model like this 👇

Criteria<NDList, NDList> criteria = Criteria.builder()
        .optApplication(Application.NLP.QUESTION_ANSWER)
        .setTypes(BertToken.class, String[].class)
        .optFilter("backbone", "bert")
        .setTypes(NDList.class, NDList.class)
        .optEngine(Engine.getDefaultEngineName())
        .optProgress(new ProgressBar())
        .build();

and my training looks like this 👇

try (Model preTrainedModel = criteria.loadModel()) {
    try (Model model = Model.newInstance("BertQA")) {
        model.setBlock(preTrainedModel.getBlock());
        Shape encoderInputShape = new Shape(1, 2);
        train(model, trainDataset, testDataset, encoderInputShape, epoch);
    }
}

private void train(Model model, Dataset trainDataset, Dataset testDataset, Shape shape, int epoch)
        throws TranslateException, IOException {
    SaveModelTrainingListener listener = new SaveModelTrainingListener("build/model");
    DefaultTrainingConfig config = getTrainingConfig(listener);
    try (Trainer trainer = model.newTrainer(config)) {
        trainer.setMetrics(new Metrics());
        trainer.initialize(shape);
        EasyTrain.fit(trainer, epoch, trainDataset, testDataset);
        System.out.println(trainer.getTrainingResult());
    }
}

private static DefaultTrainingConfig getTrainingConfig(SaveModelTrainingListener listener) {
    return new DefaultTrainingConfig(Loss.softmaxCrossEntropyLoss())
            .addEvaluator(new Accuracy())
            .optDevices(Engine.getInstance().getDevices(1))
            .addTrainingListeners(TrainingListener.Defaults.logging("build/model"))
            .addTrainingListeners(listener);
}

and I always get 👇

File "code/__torch__/torch/nn/modules/module/___torch_mangle_1.py", line 8, in forward
    def forward(self: __torch__.torch.nn.modules.module.___torch_mangle_1.Module,
        input: Tensor) -> Tensor:
      token_type_embeddings = torch.embedding(self.weight, input, -1, False, False)
                              ~~~~~~~~~~~~~~~ <--- HERE
      return token_type_embeddings

Traceback of TorchScript, original code (most recent call last):
/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/functional.py(1484): embedding
/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/modules/sparse.py(114): forward
/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/modules/module.py(516): _slow_forward
/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/modules/module.py(530): __call__
/home/ubuntu/.local/lib/python3.6/site-packages/transformers/modeling_bert.py(175): forward
/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/modules/module.py(516): _slow_forward
/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/modules/module.py(530): __call__
/home/ubuntu/.local/lib/python3.6/site-packages/transformers/modeling_bert.py(783): forward
/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/modules/module.py(516): _slow_forward
/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/modules/module.py(530): __call__
/home/ubuntu/.local/lib/python3.6/site-packages/transformers/modeling_bert.py(1480): forward
/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/modules/module.py(516): _slow_forward
/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/modules/module.py(530): __call__
/home/ubuntu/.local/lib/python3.6/site-packages/torch/jit/__init__.py(1034): trace_module
/home/ubuntu/.local/lib/python3.6/site-packages/torch/jit/__init__.py(882): trace
bert.py(25): <module>
RuntimeError: index out of range in self

So it looks to me like there is a mismatch in the model's input. Can I somehow see how the model is defined, or is there anything else that would help me investigate? Thanks.
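
For context on the error itself: `torch.embedding` reports "index out of range in self" when an index is negative or not smaller than the number of rows in the embedding table, and the trace points at the token type embedding. One way to narrow this down is to check each integer input against the table size before it reaches the model. The sketch below is plain Java with hypothetical names (`EmbeddingIndexCheck`, `firstOutOfRange`); the table size of 2 is an assumption based on common BERT configurations, not something read from this model, and the sample ids are made up. In DJL the indices could be taken from an `NDArray` with `toLongArray()`.

```java
import java.util.OptionalInt;

// Hypothetical diagnostic helper, not part of DJL or PyTorch.
final class EmbeddingIndexCheck {

    /**
     * Returns the position of the first index outside [0, tableSize),
     * or an empty OptionalInt when every index is valid.
     */
    static OptionalInt firstOutOfRange(long[] indices, long tableSize) {
        for (int i = 0; i < indices.length; i++) {
            if (indices[i] < 0 || indices[i] >= tableSize) {
                return OptionalInt.of(i);
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        long tokenTypeTableSize = 2; // assumption: two segment types

        long[] tokenTypeIds = {0, 0, 0, 1, 1}; // made-up segment ids
        long[] tokenIds = {101, 2054, 102};     // made-up vocabulary ids

        // Valid segment ids pass the check.
        System.out.println(firstOutOfRange(tokenTypeIds, tokenTypeTableSize).isPresent());
        // Vocabulary ids fed into the token type slot would fail it.
        System.out.println(firstOutOfRange(tokenIds, tokenTypeTableSize).isPresent());
    }
}
```

If the second case matches what happens during training, the inputs may be reaching the traced model in a different order than it expects. That is only one possible cause.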

havlenapetr • Jun 02 '23 12:06