
CRF decoding method?

ziodos opened this issue 3 years ago · 5 comments

Does the CRF perform sequence labeling at the token level or at the character level?

ziodos · Jun 03 '21

> Does the CRF perform sequence labeling at the token level or at the character level?

Character level.

AtulKumar4 · Jun 05 '21

@AtulKumar4 thanks for the answer. Can you please provide more information about the format of the IOB labels if the model performs sequence tagging at the character level?

ziodos · Jun 08 '21

@ziodos If you have a class named 'food', then the IOB format will look like this. Example: "I will order french fries"

I      -> O
will   -> O O O O
order  -> O O O O O
french -> B-food I-food I-food I-food I-food I-food
fries  -> B-food I-food I-food I-food I-food

Hope this will help.
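For reference, here is a minimal sketch (a helper of my own, not part of PICK-pytorch's actual preprocessing) that expands word-level entity labels into character-level IOB tags like the ones above:

```python
# Expand word-level entity labels into character-level IOB tags.
# Illustrative only; PICK-pytorch's own data pipeline may differ.

def to_char_iob(words, word_tags):
    """words: list of tokens; word_tags: entity name or None per token."""
    char_tags = []
    for word, tag in zip(words, word_tags):
        for i, _ in enumerate(word):
            if tag is None:
                char_tags.append("O")          # character of a non-entity word
            elif i == 0:
                char_tags.append(f"B-{tag}")   # first character of an entity word
            else:
                char_tags.append(f"I-{tag}")   # remaining characters
    return char_tags

words = ["I", "will", "order", "french", "fries"]
tags  = [None, None, None, "food", "food"]
print(to_char_iob(words, tags))
# ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O',
#  'B-food', 'I-food', 'I-food', 'I-food', 'I-food', 'I-food',
#  'B-food', 'I-food', 'I-food', 'I-food', 'I-food']
```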

AtulKumar4 · Jun 09 '21

Yes, thank you so much. I still have some confusion: in the code they use both word and character embeddings, so from what I understood, the BiLSTM layer accepts as input the character embedding plus the embedding of the word it belongs to. For example, if we are processing the word "Total", the character "T" is represented by its character embedding plus the embedding of the word "Total". Is that right?
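To make the question concrete, here is a minimal sketch of the combination scheme being described (illustrative names, vocab sizes, and ids; this is the questioner's hypothesis, not necessarily PICK-pytorch's actual encoder):

```python
import torch
import torch.nn as nn

d_model = 512
char_emb = nn.Embedding(100, d_model)   # hypothetical character vocabulary
word_emb = nn.Embedding(1000, d_model)  # hypothetical word vocabulary

char_ids = torch.tensor([[10, 20, 30, 40, 50]])  # "Total" -> 5 character ids
word_id  = torch.tensor([[7]])                   # id of the word "Total"

# Broadcast the single word embedding across all 5 character positions,
# so each character is represented by char embedding + its word's embedding.
x = char_emb(char_ids) + word_emb(word_id)       # (1, 5, d_model)
print(x.shape)  # torch.Size([1, 5, 512])
```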

ziodos · Jun 09 '21

@ziodos If you read the decoder part in the paper, they have mentioned: "Union layer receives the input X ∈ R^(N×T×d_model) having variable length T generated from Encoder, means each word have a different number of characters, then packs padded input sequences and fill padding value at the end of sequence yielding packed sequence X̂ ∈ R^((N·T)×d_model)".
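The packing of padded sequences mentioned in that quote is a standard PyTorch operation. A standalone illustration (made-up shapes, not PICK-pytorch's exact code):

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Two text segments padded to T=6 characters each; d_model=4 for readability.
x = torch.randn(2, 6, 4)          # (N, T, d_model)
lengths = torch.tensor([6, 3])    # real character counts before padding

packed = pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)
print(packed.data.shape)          # (sum(lengths), d_model) = (9, 4): padding dropped

# pad_packed_sequence restores the padded (N, T, d_model) layout afterwards.
unpacked, out_lengths = pad_packed_sequence(packed, batch_first=True)
print(unpacked.shape)             # torch.Size([2, 6, 4])
```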

Then this output gets combined with the graph embedding before being fed into the CRF layer. Rough idea for the word "Total": the encoder output has shape (5 + pad) × (embedding vector length), where 5 + pad equals the maximum sequence length, and this is unioned with the graph embedding.
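A rough sketch of that combination step (assumed shapes, and the "union" simplified to a concatenation; not PICK-pytorch's exact code):

```python
import torch

N, T, d_model = 2, 6, 512   # N segments, T = max characters per segment
d_graph = 128               # assumed graph-embedding size

x = torch.randn(N, T, d_model)          # encoder output X: (N, T, d_model)
graph_emb = torch.randn(N, d_graph)     # one graph embedding per segment

# Repeat each segment's graph embedding across its T character positions,
# then concatenate along the feature dimension.
g = graph_emb.unsqueeze(1).expand(N, T, d_graph)
fused = torch.cat([x, g], dim=-1)       # (N, T, d_model + d_graph)

# Flatten to the packed shape (N*T, d_model + d_graph) before the CRF layer.
packed = fused.reshape(N * T, d_model + d_graph)
print(packed.shape)  # torch.Size([12, 640])
```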

AtulKumar4 · Jun 09 '21