Amir Plivatsky

369 comments by Amir Plivatsky

> > Maybe QUOTES and BULLETS should be regexes. The tokenizer can check if the previous token matches them. Consider this (see QUOTES and BULLETS at the end of `is_capitalizable()`):...

Or we can ditch QUOTES and BULLETS altogether and use a single name, say CAPSTART: ``` % quotes " " « »《 》 【 】 『 』 ` „ “...
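The regex idea can be sketched roughly as follows. This is a minimal Python illustration, not link-grammar's `is_capitalizable()`. The character set in `CAPSTART` and the helper `may_be_capitalized()` are assumptions made for the example. It checks whether the whole previous token is one of a few quote or bullet characters. A real implementation would take the list from the dictionary.

```python
import re

# Hypothetical CAPSTART pattern: single-character tokens after which the next
# word may start with a capital letter. Only a few quote/bullet characters are
# listed; the real list would come from the dictionary, not from this code.
CAPSTART = re.compile(r'["«»“„`*•-]')


def may_be_capitalized(prev_token):
    """Return True if a word following prev_token may be capitalized.

    prev_token is None at the start of the sentence, which always allows a
    capital. Otherwise the whole previous token must match CAPSTART.
    """
    if prev_token is None:
        return True
    return CAPSTART.fullmatch(prev_token) is not None


for prev in [None, "«", "•", "the", '"']:
    print(prev, may_be_capitalized(prev))
```

Using `fullmatch()` rather than `search()` means a token such as `the` or `««` does not count as a CAPSTART token just because it contains or starts with one of the listed characters.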

It seems to me it would be a major achievement if an unsupervised algorithm could find the equivalence of words that start with capital letters to those that do not,...

Is it reasonable to check the existence of the affix tokens in the case of a DB/Atomese dict? I guess this check should be skipped in that case.

I still get this with the `en` dict: ``` link-grammar: Warning: afdict_init: Class LPUNC in file en/4.0.affix: Token "``" not in the dictionary! link-grammar: Warning: afdict_init: Class LPUNC in file...

In the PR I'm finishing now, I downgraded the message about non-existent strippable affixes to a warning. I also changed it to one long line per affix class...
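A per-class warning, together with skipping the check for DB/Atomese dictionaries, could look roughly like this. It is a Python sketch under stated assumptions, not the actual `afdict_init()` code. The function `check_affix_classes()`, its `skip_check` flag and the warning wording are all hypothetical.

```python
import sys


def check_affix_classes(affix_classes, dict_words, skip_check=False, out=sys.stderr):
    """Report affix tokens that are missing from the dictionary.

    affix_classes maps a class name (e.g. "LPUNC") to its list of tokens.
    dict_words is the set of words known to the dictionary.
    skip_check is a hypothetical flag for dictionaries (such as DB or
    Atomese ones) where the existence check is not meaningful.
    Prints one warning line per affix class that has missing tokens and
    returns {class_name: [missing tokens]}.
    """
    missing_by_class = {}
    if skip_check:
        return missing_by_class
    for class_name, tokens in affix_classes.items():
        missing = [t for t in tokens if t not in dict_words]
        if missing:
            missing_by_class[class_name] = missing
            quoted = " ".join(f'"{t}"' for t in missing)
            # Illustrative wording only; one line per class, not per token.
            print(f"Warning: Class {class_name}: not in the dictionary: {quoted}", file=out)
    return missing_by_class


classes = {"LPUNC": ["(", "``", "["], "RPUNC": [")", "''"]}
words = {"(", "[", ")"}
check_affix_classes(classes, words)
```

Collecting all missing tokens of a class before printing is what turns many per-token warnings into a single line per class.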

> you can add that, if desired. I looked at it and it seems I can add them in 4.0.dict since only the files it includes are generated. I will...

There is already a partial sane-morphism check in `form_match_list()`, but it can check only one of the conditions that lead to "insane-morphism". In `do_count()`, maybe another type...

(Continuing the discussion from PR #1201.) > Beats me. I looked at the second sentence: "tôi mua một bông hoa" and two of the four words are not in the...

> The vietnamese dict came from here: https://www.researchgate.net/publication/287444370_Parsing_complex_-_compound_sentences_with_an_extension_of_Vietnamese_link_parser_combined_with_discourse_segmenter A copy of this article (same publication) is accessible through the link in the link-grammar Wikipedia page. Interestingly, the connector strings they...