Amir Plivatsky comments

Results: 369 comments by Amir Plivatsky

Stripping affix-class tokens

> > Maybe QUOTES and BULLETS should be regexes. The tokenizer can check if the previous token matches them. Consider this (see QUOTES and BULLETS at the end of `is_capitalizable()`):...

Stripping affix-class tokens

Or we can ditch QUOTES and BULLETS altogether and use a single name, say CAPSTART: ``` % quotes " " « »《》【】『』 ` „ “...

Stripping affix-class tokens

It seems to me it would be a major achievement if an unsupervised algo could find the equivalence of words that start with capital letters to those that do not,...

Stripping affix-class tokens

Is it reasonable to check the existence of the affix tokens in case of a DB/Atomese dict? I guess this check should be skipped in that case.

I still get this with the `en` dict: ``` link-grammar: Warning: afdict_init: Class LPUNC in file en/4.0.affix: Token "``" not in the dictionary! link-grammar: Warning: afdict_init: Class LPUNC in file...
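As a rough illustration of the check discussed in this comment, here is a minimal Python sketch. It is not link-grammar's C code: the function name `missing_affix_tokens`, the `dict_is_db` flag, and the sample data are all hypothetical. It reports affix-class tokens that are absent from the dictionary and skips the check entirely for a DB/Atomese-style dictionary, as suggested above.

```python
def missing_affix_tokens(affix_classes, dict_words, dict_is_db=False):
    """Return {class_name: [tokens absent from the dictionary]}.

    Hypothetical helper: when dict_is_db is True the check is skipped,
    since a DB-backed dictionary cannot be enumerated up front.
    """
    if dict_is_db:
        return {}
    missing = {}
    for class_name, tokens in affix_classes.items():
        absent = [tok for tok in tokens if tok not in dict_words]
        if absent:
            missing[class_name] = absent
    return missing


# Made-up sample data, loosely modeled on the LPUNC warning above.
affix_classes = {"LPUNC": ["``", "("], "RPUNC": [")", ","]}
dict_words = {"(", ")", ","}

file_backed = missing_affix_tokens(affix_classes, dict_words)
db_backed = missing_affix_tokens(affix_classes, dict_words, dict_is_db=True)
```

Grouping the missing tokens by class makes it straightforward to emit one warning per affix class rather than one per token.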

Stripping affix-class tokens

In the PR I'm finishing now, I changed the message about non-existent strippable affixes to be only a warning. I also changed it to one long line per affix class...

Stripping affix-class tokens

> you can add that, if desired. I looked at it and it seems I can add them in 4.0.dict since only the files it includes are generated. I will...

A "crazy" behaviour in counting linkages...

Part of the sane-morphism check is already in `form_match_list()`, but it can check only one of the conditions that lead to "insane-morphism". In `do_count()`, maybe another type...

The `vn` dictionary

(Continuing the discussion from PR #1201.) > Beats me. I looked at the second sentence: "tôi mua một bông hoa" and two of the four words are not in the...

The `vn` dictionary

> The Vietnamese dict came from here: https://www.researchgate.net/publication/287444370_Parsing_complex_-_compound_sentences_with_an_extension_of_Vietnamese_link_parser_combined_with_discourse_segmenter A copy of this article (same publication) is accessible through the link in the link-grammar Wikipedia page. Interestingly, the connector strings they...