happy-transformer

Hi, sorry but your grammar model is very weak on real cases

Open caprone opened this issue 2 years ago • 4 comments

Hi, I tested it on some simple sentences, like:

from happytransformer import HappyTextToText, TTSettings

args = TTSettings(num_beams=5, min_length=1, max_length=5000)
happy_grammar = HappyTextToText("T5", "vennify/t5-base-grammar-correction")
result = happy_grammar.generate_text('This mvie did everUthing a van dqmme mLvie shoHld do Ahich is martial arts and action', args=args)

Your model's result: 'This mvie did everthing a van dqmme mLvie shoHld do, which is martial arts and action.'

What is your grammar model's purpose? Is it a joke? Hmm...

caprone avatar Jul 09 '22 19:07 caprone

Please run the output "This mvie did everthing a van dqmme mLvie shoHld do, which is martial arts and action" through the model again as input to see whether it fixes the text progressively. Just a suggestion (not tested). Yes, serious stuff usually starts off like a joke!
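The idea of feeding the output back in until it stops changing could be sketched roughly as below. This is only an illustration: `correct_until_stable` and `fake_correct` are hypothetical names, and `fake_correct` is a stand-in that fixes one known typo per call so the loop can be run without downloading a model. In practice `correct` would presumably be a small wrapper around `happy_grammar.generate_text` that returns the generated text.

```python
from typing import Callable


def correct_until_stable(text: str,
                         correct: Callable[[str], str],
                         max_rounds: int = 5) -> str:
    """Re-apply `correct` until the output stops changing or max_rounds is hit."""
    for _ in range(max_rounds):
        fixed = correct(text)
        if fixed == text:  # fixed point reached, nothing more to gain
            break
        text = fixed
    return text


# Hypothetical stand-in corrector: repairs at most one known typo per call.
_FIXES = [("mvie", "movie"), ("everthing", "everything")]


def fake_correct(text: str) -> str:
    for wrong, right in _FIXES:
        if wrong in text:
            return text.replace(wrong, right, 1)
    return text


if __name__ == "__main__":
    print(correct_until_stable("This mvie did everthing", fake_correct))
```

The `max_rounds` cap matters with a real model, since a generative corrector is not guaranteed to converge.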

Sukii avatar Jul 10 '22 08:07 Sukii

Hi, and thanks for your help. Yes, there is some gain, but it is negligible. However, I found that the simple SpellChecker library, which uses Levenshtein distance and basic statistics, works great on misspellings. This transformer-based grammar model was probably trained mainly on grammatical errors. I also tried the "contextualSpellCheck" package, based on spaCy, and it performs even worse than this "grammar corrector" on both misspellings and grammar. So, for the moment, I found that a good solution is a pipeline of: cleaning, spell checker, happy-grammar.
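A minimal sketch of such a cleaning, spell-check, grammar pipeline follows. It is not the SpellChecker library itself: the `levenshtein` helper, the toy `VOCAB` list, and the function names are assumptions made for illustration, and the grammar step is a pluggable callable that defaults to doing nothing so the example runs without a model.

```python
import re
from typing import Callable, Iterable


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def clean(text: str) -> str:
    """Collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def spell_fix(text: str, vocab: Iterable[str], max_dist: int = 2) -> str:
    """Replace each unknown alphabetic word with its nearest vocabulary word."""
    vocab = list(vocab)
    known = set(vocab)
    out = []
    for word in text.split(" "):
        low = word.lower()
        if low in known or not low.isalpha():
            out.append(word)  # already known, or not a plain word: keep as-is
            continue
        best = min(vocab, key=lambda v: levenshtein(low, v))
        out.append(best if levenshtein(low, best) <= max_dist else word)
    return " ".join(out)


def pipeline(text: str, vocab: Iterable[str],
             grammar: Callable[[str], str] = lambda s: s) -> str:
    """cleaning -> spell checker -> grammar model (identity by default)."""
    return grammar(spell_fix(clean(text), vocab))


# Toy vocabulary, just large enough for the sentence from this thread.
VOCAB = ["this", "movie", "did", "everything", "a", "van", "damme", "should",
         "do", "which", "is", "martial", "arts", "and", "action"]

if __name__ == "__main__":
    noisy = ("This mvie did everUthing a van dqmme mLvie shoHld do "
             "Ahich is martial arts and action")
    print(pipeline(noisy, VOCAB))
```

Running the spell step first means the grammar model only sees in-vocabulary words, which is the division of labour the comment describes. Note that this toy version lowercases every word it replaces.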

caprone avatar Jul 10 '22 13:07 caprone

This looks more like QWERTY keyboard spelling mistakes. We probably need a QWERTY-distance-based spellcheck rather than a Levenshtein-distance-based one.
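One way a QWERTY-aware distance might look is a Levenshtein variant in which substituting a neighbouring key costs less than substituting a distant one. The sketch below is an assumption-laden illustration: the key layout, the row offsets, the `near_cost` of 0.5 and all function names are made up for this example rather than taken from any library.

```python
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
# Approximate horizontal stagger of each row, in key widths (an assumption).
OFFSETS = [0.0, 0.25, 0.75]
POS = {ch: (r, c + OFFSETS[r])
       for r, row in enumerate(ROWS)
       for c, ch in enumerate(row)}


def adjacent(a: str, b: str) -> bool:
    """True if two letter keys touch on this simplified QWERTY layout."""
    if a == b or a not in POS or b not in POS:
        return False
    (r1, x1), (r2, x2) = POS[a], POS[b]
    return abs(r1 - r2) <= 1 and abs(x1 - x2) <= 1.0


def qwerty_distance(a: str, b: str, near_cost: float = 0.5) -> float:
    """Edit distance where substituting a neighbouring key costs `near_cost`."""
    a, b = a.lower(), b.lower()
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, 1):
        cur = [float(i)]
        for j, cb in enumerate(b, 1):
            if ca == cb:
                sub = 0.0
            elif adjacent(ca, cb):
                sub = near_cost
            else:
                sub = 1.0
            cur.append(min(prev[j] + 1.0,       # delete ca
                           cur[j - 1] + 1.0,    # insert cb
                           prev[j - 1] + sub))  # substitute or match
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    for typo, target in [("dqmme", "damme"), ("dpmme", "damme"), ("mvie", "movie")]:
        print(typo, target, qwerty_distance(typo, target))
```

Under this layout, `dqmme` is closer to `damme` than `dpmme` is, because `q` sits next to `a` while `p` does not. A dropped letter such as in `mvie` still costs a full edit.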

Sukii avatar Jul 10 '22 19:07 Sukii

Hi, no... it is only random noise added to the text :)

caprone avatar Jul 10 '22 19:07 caprone

Since Transformer models look at text on a token-by-token basis (1 token ~= 1 word for common words), we don't expect them to be good at this task (correcting misspelled words).
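The effect can be illustrated with a toy subword tokenizer. This is not the tokenizer the T5 model actually uses (T5 relies on SentencePiece); the greedy longest-match rule, the `TOY_VOCAB` set and the function name are all assumptions chosen to show the general idea that a common word maps to one token while a misspelled one breaks into fragments.

```python
from typing import List, Set


def greedy_subword_tokenize(word: str, vocab: Set[str]) -> List[str]:
    """Greedy longest-match-first split; unknown characters become 1-char tokens."""
    tokens, i = [], 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No vocabulary piece starts here: emit a single character.
            tokens.append(word[i])
            i += 1
    return tokens


# Hypothetical vocabulary: whole common words plus a few fragments.
TOY_VOCAB = {"movie", "should", "mo", "vie", "sho", "ld"}

if __name__ == "__main__":
    for word in ["movie", "mvie", "should", "shohld"]:
        print(word, greedy_subword_tokenize(word, TOY_VOCAB))
```

The model never sees `mvie` as "almost `movie`"; it sees an unrelated sequence of pieces, which is one plausible reason a grammar model handles misspellings poorly.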

ted537 avatar Nov 10 '22 22:11 ted537