
LayoutLM and LayoutLMv2 + Flair?

Open dardodel opened this issue 1 year ago • 0 comments

Hi, I saw a related topic here.
I tried to bring pre-trained (as well as fine-tuned) LayoutLMv2 embeddings into Flair, but it was not as successful as I expected. I mimicked "class BertEmbeddings(TokenEmbeddings)" in legacy.py, and when I checked the embeddings (last-layer hidden states) manually, the embedding of each Token of each Sentence appeared to be correctly assigned. However, the model does not train well. With the same settings, BERT (or RoBERTa) trains very well, but LayoutLMv2 does not. Here are my settings:
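To illustrate the kind of manual check I mean, here is a hypothetical sanity check (not my actual code): if the LayoutLMv2 token embeddings really carry little usable information, their per-dimension variance across tokens should be near zero compared to BERT's.

```python
# Hypothetical diagnostic: compare per-dimension variance of token
# embeddings. Near-zero variance would suggest degenerate embeddings.
def per_dim_variance(embeddings):
    """embeddings: list of equal-length vectors (one per token)."""
    n = len(embeddings)
    dims = len(embeddings[0])
    means = [sum(v[d] for v in embeddings) / n for d in range(dims)]
    return [sum((v[d] - means[d]) ** 2 for v in embeddings) / n
            for d in range(dims)]

flat = [[0.5, 0.5], [0.5, 0.5]]    # degenerate: every token identical
varied = [[0.0, 1.0], [1.0, 0.0]]  # informative: tokens differ
print(per_dim_variance(flat))      # [0.0, 0.0]
print(per_dim_variance(varied))    # [0.25, 0.25]
```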

embedding_types: List[TokenEmbeddings] = [LayoutLMv2Embeddings("./LayoutLMv2_Pretrained", layers="-1")]

** I created the LayoutLMv2Embeddings class myself.
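For context, this is roughly the shape of that class. Flair and the LayoutLMv2 model are stubbed out with minimal stand-ins so the sketch runs on its own; only the `_add_embeddings_internal` pattern mirrors Flair's TokenEmbeddings interface, and the zero vectors stand in for the real forward pass.

```python
class Token:
    """Minimal stand-in for flair.data.Token."""
    def __init__(self, text):
        self.text = text
        self._embeddings = {}

    def set_embedding(self, name, vector):
        self._embeddings[name] = vector

    def get_embedding(self, name):
        return self._embeddings[name]

class Sentence:
    """Minimal stand-in for flair.data.Sentence."""
    def __init__(self, text):
        self.tokens = [Token(t) for t in text.split()]

class LayoutLMv2Embeddings:
    """Assigns one vector per token, mirroring how TokenEmbeddings
    subclasses such as the legacy BertEmbeddings work."""
    def __init__(self, hidden_size=4):
        self.name = "layoutlmv2"
        self.hidden_size = hidden_size

    def _last_hidden_states(self, sentence):
        # The real class would run the LayoutLMv2 forward pass here
        # (token ids + bounding boxes + page image) and return the last
        # layer's hidden states; zeros keep the sketch dependency-free.
        return [[0.0] * self.hidden_size for _ in sentence.tokens]

    def _add_embeddings_internal(self, sentences):
        for sentence in sentences:
            hidden = self._last_hidden_states(sentence)
            for token, vector in zip(sentence.tokens, hidden):
                token.set_embedding(self.name, vector)
        return sentences

emb = LayoutLMv2Embeddings()
sent = Sentence("invoice number 123")
emb._add_embeddings_internal([sent])
print(len(sent.tokens[0].get_embedding("layoutlmv2")))  # 4
```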

tagger: SequenceTagger = SequenceTagger(hidden_size=512, embeddings=embeddings, tag_dictionary=tag_dictionary, tag_type=tag_type, use_crf=True, use_rnn=True)

trainer.train('./models/nerTL_io_BERT_crf/', learning_rate=0.1, optimizer=Adam, mini_batch_size=8, max_epochs=150, patience=4, anneal_factor=0.25, min_learning_rate=0.000025, monitor_test=False, embeddings_storage_mode="cpu")

Here is a summary of my experience:

  1. In general, the training behaves as if the embeddings don't carry enough information for classification. After training, the model learns to predict everything as the dominant class (class "O"). To me, that means the embeddings (inputs) do not contain enough information. With some specific learning settings (see bullet 2), I was able to improve on this and not have everything predicted as O.

  2. LayoutLMv2 does not train at all unless I use the Adam optimizer with LR ~ 0.001, patience=8 (a number larger than 4), a CRF, and a BiLSTM layer; without these, the model doesn't even overfit. With the settings in this bullet, LayoutLMv2 trains, but after some epochs the train and test loss stop going down. With some special learning settings, I was able to overfit the model (train loss decreasing while test loss increases).

  3. I tried warmup steps for the LR, but it did not help.

  4. In the best case, the final F1-score for LayoutLMv2 is in the 30-40 range, while BERT's is in the 80-95 range. When I fine-tune LayoutLMv2 directly (not through Flair), I get an F1-score around 90.
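For bullet 3, the warmup I tried was along these lines; this is a minimal sketch written as a plain function rather than any specific scheduler API, since the exact scheduler is not the issue here.

```python
# Minimal sketch of linear LR warmup: ramp from ~0 to base_lr over
# warmup_steps optimizer steps, then hold the rate constant.
def warmup_lr(step, base_lr=0.001, warmup_steps=500):
    """Return the learning rate to use at a given step (0-indexed)."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr

print(warmup_lr(0))     # tiny rate at the start of the ramp
print(warmup_lr(1000))  # 0.001 (held at base_lr after warmup)
```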

Any clue? I can provide more information if it helps. Does the Flair team have a plan to bring models such as LayoutLMv2 into Flair?

Thanks

dardodel avatar Aug 04 '22 15:08 dardodel