talisman

Part-of-speech tagging?

Open · giorgio79 opened this issue 6 years ago · 5 comments

Would love to do POS tagging with this lib. Maybe integrate with others? https://github.com/FinNLP/en-pos

giorgio79 · May 28 '18 14:05

Hello @giorgio79. There is an experimental version of the averaged perceptron used by spaCy here. It's undocumented, but it should work. On a side note, I am currently thinking of refocusing this library on fuzzy matching/clustering and dropping hard NLP tasks, because I don't have much time. But I'd love to hear from you why you would prefer to use this lib for POS tagging rather than the one you mention here.

Yomguithereal · May 28 '18 14:05

Thx @Yomguithereal! JS NLP libs are maturing super fast; I am currently evaluating the options myself, such as:

  • https://github.com/spencermountain/compromise
  • https://github.com/NaturalNode/natural
  • https://github.com/Ulflander/compendium-js and some more :)

Joining forces would be a great way forward to avoid duplicated effort. Have you thought of combining efforts with some of the others? Otherwise, doing spaCy in JavaScript sounds fantastic but, as you say, a massive undertaking. At the moment, Natural seems to do a lot of what I need already, and I just thought I'd give others like Talisman a quick go.

giorgio79 · May 28 '18 14:05

As much as I'd love to add my stone to JS's hard NLP libraries, I feel that my edge is much more in fuzzy matching/clustering, unfortunately: Google Refine-like stuff, for instance, and custom search engines.

Yomguithereal · May 28 '18 14:05

Basically, my strategy for the future will probably be to drop the POS tagging / machine-learning classifier stuff and focus on fuzzy clustering, distance metrics, keyers, phonetic algorithms, stemmers, and tokenizers. But I'd be willing to help other libraries scavenge whatever NLP-related pieces they could use from me, such as the POS tagger or the sentence tokenizer (notably Punkt).

Yomguithereal · May 28 '18 14:05

Yeah, avoid reinventing the wheel where possible. E.g. NaturalNode already has tons of tokenizers: https://github.com/NaturalNode/natural#tokenizers

giorgio79 · May 28 '18 14:05