awesome-align

Is it possible to incorporate POS tag info to aid alignment?

Open: stelmath opened this issue 1 year ago • 1 comment

Hello, and many thanks for sharing the project.

I have an open question for discussion: would it be possible to incorporate the POS tag of each token during training? For example, a new loss term could penalize POS tag mismatches between aligned source and target tokens. The idea is that if a token is a noun in the source language, its aligned token will most likely also be a noun in the target language. The same would go for verbs and other high-level POS tags. What are your thoughts on this?
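To make the idea concrete, here is a rough sketch of what such a penalty could compute. It assumes a soft alignment matrix (one row per source token, one column per target token) and POS tags drawn from a tagset shared by both languages. The function name `pos_mismatch_penalty` and the toy values are hypothetical and not part of awesome-align; in real training this would need to be a differentiable tensor operation added to the existing loss.

```python
from typing import List, Sequence


def pos_mismatch_penalty(
    align_probs: Sequence[Sequence[float]],
    src_pos: List[str],
    tgt_pos: List[str],
) -> float:
    """Fraction of alignment mass placed on pairs whose POS tags differ.

    Hypothetical helper: align_probs[i][j] is the soft alignment score
    between source token i and target token j.
    """
    total = 0.0
    mismatched = 0.0
    for i, row in enumerate(align_probs):
        for j, p in enumerate(row):
            total += p
            if src_pos[i] != tgt_pos[j]:
                mismatched += p
    # Avoid dividing by zero when the matrix carries no alignment mass.
    return mismatched / total if total > 0 else 0.0


# Toy two-token sentence pair; the tags and scores are made up.
src_pos = ["DET", "NOUN"]
tgt_pos = ["DET", "NOUN"]
align_probs = [
    [0.9, 0.1],  # source DET mostly aligned to target DET
    [0.2, 0.8],  # source NOUN mostly aligned to target NOUN
]
penalty = pos_mismatch_penalty(align_probs, src_pos, tgt_pos)
```

A lower value means more of the alignment mass sits on pairs with matching tags, so it could be added to the training objective with a small weight.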

Thank you

stelmath, Mar 02 '23 10:03

Hello, thank you for the suggestion! Yes, I think incorporating POS tag information into training may improve model performance. You could start by doing this at inference time (e.g., by requiring the extracted aligned word pairs to have the same POS tag) and see whether the results improve, then investigate potential training objectives.
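A minimal sketch of that inference-time filter, assuming the alignments are written as `i-j` index pairs (one sentence pair per line) and that both sides have been tagged with a shared tagset by some external tagger. The helper names `parse_alignment` and `filter_by_pos` are hypothetical, and the tags below are written by hand for illustration.

```python
from typing import List, Tuple


def parse_alignment(line: str) -> List[Tuple[int, int]]:
    """Parse a line such as '0-0 1-1 2-2' into (source, target) index pairs."""
    pairs = []
    for item in line.split():
        src, tgt = item.split("-")
        pairs.append((int(src), int(tgt)))
    return pairs


def filter_by_pos(
    pairs: List[Tuple[int, int]],
    src_pos: List[str],
    tgt_pos: List[str],
) -> List[Tuple[int, int]]:
    """Keep only the aligned pairs whose source and target POS tags match."""
    return [(i, j) for i, j in pairs if src_pos[i] == tgt_pos[j]]


# Toy example: "the dog runs" / "der Hund läuft", tagged by hand.
src_pos = ["DET", "NOUN", "VERB"]
tgt_pos = ["DET", "NOUN", "VERB"]
raw = parse_alignment("0-0 1-1 2-1 2-2")
kept = filter_by_pos(raw, src_pos, tgt_pos)  # drops 2-1 (VERB vs NOUN)
```

Comparing alignment quality before and after the filter on a gold-aligned test set would show whether the constraint helps. A hard equality check will also discard legitimate cross-category alignments, so a softer rule may be worth trying.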

zdou0830, Mar 05 '23 04:03