[NLP/PC] add support for capitalization classes lower (L), upper (U), capitalize (C)

Open itzsimpl opened this issue 3 years ago • 0 comments

Signed-off-by: Iztok Lebar Bajec [email protected]

What does this PR do?

While #3819 is still in progress, this PR upgrades the current punctuation and capitalization model with support for lowercasing, uppercasing, and capitalisation.

Collection: NLP/PC

Changelog

Modified the capitalisation decision. Previously, a word was capitalised as soon as its capit_label differed from the no-op label (O). Now the operation applied depends on three classes: lowercase (L), uppercase (U), and capitalize (C).

Warning: This is a breaking change. Before this PR, the class label U was used for capitalisation, and any label other than O triggered it. With this PR, U means uppercase, so models trained prior to this PR will return the selected words in all caps instead of capitalising them. Retraining, however, makes the additional functionality available.
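The change described in the changelog and warning above can be illustrated with a minimal sketch. This is not the code from the PR: the function names `apply_capit_label` and `apply_capit_label_legacy` are hypothetical, and the use of Python's `str.capitalize()` for the C class is an assumption. Only the label names (O, L, U, C) come from the description.

```python
from typing import List


def apply_capit_label_legacy(word: str, label: str, noop: str = "O") -> str:
    """Hypothetical pre-PR rule: any label other than the no-op capitalises."""
    return word.capitalize() if label != noop else word


def apply_capit_label(word: str, label: str) -> str:
    """Hypothetical post-PR rule: the operation depends on the class."""
    if label == "L":
        return word.lower()
    if label == "U":
        return word.upper()
    if label == "C":
        # Assumption: str.capitalize() also lowercases the rest of the word.
        return word.capitalize()
    # "O" (no-op) or any unknown label leaves the word untouched.
    return word


def restore(words: List[str], labels: List[str], rule=apply_capit_label) -> str:
    """Apply one label per word and join the result into a sentence."""
    return " ".join(rule(w, l) for w, l in zip(words, labels))


words = ["nemo", "is", "from", "nvidia"]

# Labels as an older model would predict them: U meant "capitalise".
old_labels = ["U", "O", "O", "U"]
legacy_output = restore(words, old_labels, rule=apply_capit_label_legacy)
broken_output = restore(words, old_labels)  # same labels, new rule

# Labels as a retrained model could predict them.
new_labels = ["C", "O", "O", "U"]
retrained_output = restore(words, new_labels)
```

Under these assumptions, the same `U` predictions that used to capitalise a word now uppercase it entirely, which is why older checkpoints need retraining with the new label set.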

Usage

  • You can potentially add a usage example below

PR Type:

  • [x] New Feature
  • [ ] Bugfix
  • [ ] Documentation

Who can review?

@PeganovAnton

itzsimpl, Jul 28 '22 13:07