John Cheung
> compose the dashed word features from their component parts

I think that could be one way to do it (a similar technique is implemented in the tagger, see https://github.com/dhowe/ritajs/blob/b5b447c300739928aabb7f6f83577493695a5512/src/tagger.js#L495...
I implemented analysis of a hyphenated word as a single word (see the PR above). The performance is acceptable: it doesn't trigger the execution-time warning in the larger test pool and takes ~40ms to execute...
> can you summarize me what you did here?

Basically, this algorithm treats a hyphenated word as a sort of "phrase": it breaks the word down into its parts and treats each part...
> also, we need to fix tests like this (should be 4 syllables): `eq(feats["syllables"], "s-t-ey-t-ah-v-dh-ah-aa-r-t");`

Just to confirm, the correct output should be `s-t-ey-t/ah-v/dh-ah/aa-r-t`?
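A minimal sketch of composing the `syllables` feature of a hyphenated word from its parts, which reproduces the 4-syllable output from the test above. It assumes per-part syllable strings are already available; `PART_SYLLABLES` and `hyphenatedSyllables` are hypothetical names for illustration, not RiTa's API.

```javascript
// Hypothetical per-part syllable strings (phones in a syllable joined by "-",
// syllables joined by "/"). In practice these would come from the lexicon
// or letter-to-sound rules; these entries are illustrative stand-ins.
const PART_SYLLABLES = new Map([
  ["state", "s-t-ey-t"],
  ["of", "ah-v"],
  ["the", "dh-ah"],
  ["art", "aa-r-t"],
]);

// Treat a hyphenated word like a short phrase: look up each part on its own,
// then join the per-part syllable strings with the syllable separator "/".
function hyphenatedSyllables(word, lookup) {
  return word
    .split("-")
    .filter((part) => part.length > 0) // ignore empty parts, e.g. from "--"
    .map((part) => {
      const syllables = lookup.get(part.toLowerCase());
      if (syllables === undefined) {
        throw new Error(`no syllable data for "${part}"`);
      }
      return syllables;
    })
    .join("/");
}

const result = hyphenatedSyllables("state-of-the-art", PART_SYLLABLES);
console.log(result);                   // → s-t-ey-t/ah-v/dh-ah/aa-r-t
console.log(result.split("/").length); // → 4
```

A part with several syllables would already contain "/" internally, so joining the parts with "/" keeps the total syllable count right. Parts missing from the map simply throw here; a real implementation would need a fallback for them.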
It now has 75 tests in 4 pools:
- pool1: all parts in the lexicon
- pool2A: some parts not in the lexicon, but variants of words in the lexicon
- pool2B:...
Sure, I will first finish the tagger tests for hyphenated words in sentences, then sync.
I replaced it with `.replace(/([a-zA-Z]+)-([a-zA-Z]+)/g, "$1 - $2");`, which should work in all browsers now (maybe not IE...).
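One thing worth checking with that pattern: because both letter runs are consumed by each match, a word with more than one hyphen has every other hyphen skipped. The sketch below shows this and one possible alternative that consumes only the letter before the hyphen and checks the following letter with a lookahead. Lookahead has long been supported by JavaScript engines (lookbehind only arrived in ES2018). The helper names `original` and `withLookahead` are just for illustration.

```javascript
// The pattern from the comment above: both letter runs are consumed,
// so in words with more than one hyphen every other hyphen is skipped.
const original = (s) => s.replace(/([a-zA-Z]+)-([a-zA-Z]+)/g, "$1 - $2");

// Alternative: consume only the letter before the hyphen and require a
// letter after it via a lookahead, so the next match can start right there.
const withLookahead = (s) => s.replace(/([a-zA-Z])-(?=[a-zA-Z])/g, "$1 - ");

console.log(original("state-of-the-art"));      // → state - of-the - art
console.log(withLookahead("state-of-the-art")); // → state - of - the - art
console.log(withLookahead("well-known"));       // → well - known
```

For single-hyphen words both versions give the same result; they only differ once a word has two or more hyphens.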
hmm
> adding 'nn' as a 2nd tag for 'there' in the lexicon

That doesn't work... I think 'there' should actually be tagged as 'rb' most frequently? For example:
- "She...
Sorry, in the deleted comment I forgot to check whether the word is also tagged correctly. Here are the past participles that need to be added: `const IRREG_PAST_PART_NOT_IN_DICT = ["abode","begotten","bidden","borne","chlung","could","mown","pled","relaid","shod","smelt","spelt","spolit","taight","wrung"];`...
I generated the list simply by iterating over `IRREG_PAST_PART` in conjugator.js and checking whether the lexicon has an entry for each word and whether it is tagged as `vbd` or `vbn`. I will check...
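A sketch of that filtering step, assuming a lexicon shaped as a `Map` from word to a space-separated tag string. The `IRREG_PAST_PART` contents, the `lexicon` entries and `findUntaggedParticiples` below are hypothetical stand-ins, not the real conjugator.js list or RiTa lexicon data.

```javascript
// Hypothetical stand-ins: the real IRREG_PAST_PART lives in conjugator.js,
// and the real lexicon format may differ from this Map of word -> tags.
const IRREG_PAST_PART = ["abode", "begotten", "bidden", "broken", "taken"];
const lexicon = new Map([
  ["abode", "nn"],       // in the lexicon, but only as a noun
  ["broken", "jj vbn"],
  ["taken", "vbn"],
]);

// Keep the participles that are either missing from the lexicon
// or present but never tagged as 'vbd' or 'vbn'.
function findUntaggedParticiples(participles, lex) {
  return participles.filter((word) => {
    const tags = lex.get(word);
    if (tags === undefined) return true; // no lexicon entry at all
    const tagList = tags.split(" ");
    return !tagList.includes("vbd") && !tagList.includes("vbn");
  });
}

console.log(findUntaggedParticiples(IRREG_PAST_PART, lexicon));
// returns ["abode", "begotten", "bidden"]
```

Words like "abode" show why the tag check matters: they exist in the lexicon, just not with a past-participle tag.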