align-linguistic-alignment
Python library for extracting quantitative, reproducible metrics of multi-level alignment between two speakers in naturalistic language corpora.
Some corpora (especially in child-directed and child-produced speech) come with manually checked POS tags. We might want to allow using pre-existing tags for syntactic alignment. Suggestions on code changes to...
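One way this could look is sketched below. It is a minimal sketch, not the library's current API: `tag_turn`, `existing_tags`, and `fallback_tagger` are hypothetical names introduced only for illustration. The idea is to use the manually checked tags when they are supplied and to fall back to an automatic tagger otherwise.

```python
def tag_turn(tokens, existing_tags=None, fallback_tagger=None):
    """Return (token, tag) pairs for one turn.

    Hypothetical helper: prefers pre-existing (e.g. manually checked) tags
    and only calls the automatic tagger when none are supplied.
    """
    if existing_tags is not None:
        # Pre-existing tags must line up one-to-one with the tokens.
        if len(existing_tags) != len(tokens):
            raise ValueError("existing_tags and tokens differ in length")
        return list(zip(tokens, existing_tags))
    if fallback_tagger is None:
        raise ValueError("no pre-existing tags and no fallback tagger supplied")
    # The fallback is assumed to map a token list to (token, tag) pairs.
    return list(fallback_tagger(tokens))


# Toy stand-in for an automatic tagger (assumption, for demonstration only).
def dummy_tagger(tokens):
    return [(tok, "NN") for tok in tokens]


with_gold = tag_turn(["doggie", "runs"], existing_tags=["NN", "VBZ"])
with_auto = tag_turn(["doggie", "runs"], fallback_tagger=dummy_tagger)
```

Any callable that takes a token list and returns (token, tag) pairs, such as `nltk.pos_tag`, could be passed as the fallback.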
If the semantic model is pre-trained: what is its coverage of the current corpus, e.g. in terms of unique and total words? If the semantic model is trained on the corpus:...
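For the pre-trained case, coverage could be reported roughly as follows. This is a sketch under assumptions: `vocabulary_coverage` is a hypothetical helper, and `vocab` stands for any container of the model's known words that supports `in`. It reports coverage both over unique words (types) and over total words (tokens).

```python
from collections import Counter


def vocabulary_coverage(tokens, vocab):
    """Report how much of a corpus is covered by a model vocabulary.

    Hypothetical helper: `tokens` is the flat list of corpus tokens,
    `vocab` is any container of known words that supports `in`.
    """
    counts = Counter(tokens)
    covered_types = [word for word in counts if word in vocab]
    total_tokens = sum(counts.values())
    covered_tokens = sum(counts[word] for word in covered_types)
    return {
        # Share of unique words found in the vocabulary.
        "type_coverage": len(covered_types) / len(counts) if counts else 0.0,
        # Share of all running words found in the vocabulary.
        "token_coverage": covered_tokens / total_tokens if total_tokens else 0.0,
        # Words the model cannot represent, useful for manual inspection.
        "missing": sorted(word for word in counts if word not in vocab),
    }


tokens = "the dog saw the blicket".split()
vocab = {"the", "dog", "saw"}
report = vocabulary_coverage(tokens, vocab)
```

Reporting both figures matters because a few frequent words can yield high token coverage even when many rare words are missing.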
The Stanford-NLP suite includes several additional languages (e.g. Mandarin, Arabic) and can be extended to others (e.g. Danish).
Several papers have pointed to differential patterns and effects of alignment over longer spans:
- Reitter and Moore 2017: syntactic alignment at 20 turns of distance (but not shorter) predicts...
In the prep phase, group related POS tags and collapse each group into a single tag, e.g. NN, NNS, and the other NN* tags all become NN.
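A minimal sketch of this collapsing step, assuming Penn Treebank style tags and prefix-based grouping. The function names and the set of families other than NN are assumptions made for illustration, not part of the library.

```python
def collapse_tag(tag, prefixes=("NN", "VB", "JJ", "RB")):
    """Map a fine-grained tag to its family prefix, e.g. NNS -> NN.

    Hypothetical helper: the default families other than NN are an
    assumption; tags outside every family are returned unchanged.
    """
    for prefix in prefixes:
        if tag.startswith(prefix):
            return prefix
    return tag


def collapse_tagged_tokens(tagged):
    """Apply collapse_tag to every (token, tag) pair in a turn."""
    return [(word, collapse_tag(tag)) for word, tag in tagged]


tagged = [("dogs", "NNS"), ("run", "VBP"), ("quickly", "RB"),
          ("the", "DT"), ("London", "NNP")]
collapsed = collapse_tagged_tokens(tagged)
```

Passing `prefixes=("NN",)` restricts the collapsing to the noun family named in the example above.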