sacremoses

Is there a plan to have sent_tokenize in this library?

Open jeremyasapp opened this issue 4 years ago • 2 comments

Thanks for the awesome work porting this into a separate library. It's a great choice for people looking for a lightweight library for tokenization / detokenization.

I was wondering whether there is a plan to port sent_tokenize. It's in the repo, but it looks deprecated.

jeremyasapp, Jun 03 '20 16:06

Actually, that sent_tokenize is a can of worms, hence the reluctance to complete the code =)

I'm a little packed these next couple of days, but let me see if I can sit down and hack up a new version of sent_tokenize, taking into consideration the other sentence tokenizers that have become available in the meantime.

alvations, Jun 04 '20 00:06
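
For readers wondering why sentence splitting might be called a can of worms, here is a minimal sketch in plain Python. It is not sacremoses code and not the deprecated sent_tokenize from the repo; `naive_sent_split` is a hypothetical helper written only for illustration, and the splitting rule is an assumption about what a first attempt might look like.

```python
import re


def naive_sent_split(text):
    """Hypothetical naive splitter: break after '.', '!' or '?'
    when the next non-space character is an uppercase letter."""
    return re.split(r"(?<=[.!?])\s+(?=[A-Z])", text)


text = "Dr. Smith arrived at 5 p.m. on Monday. He left soon after."
sentences = naive_sent_split(text)

for sentence in sentences:
    print(repr(sentence))
# 'Dr.'
# 'Smith arrived at 5 p.m. on Monday.'
# 'He left soon after.'
```

The rule wrongly splits after the abbreviation "Dr." while only surviving "p.m." because the following word happens to be lowercase. Handling abbreviations, initials, quotations, and language-specific conventions is where most of the effort in a real sentence tokenizer tends to go.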

That'd be fantastic! :)

jeremyasapp, Jun 04 '20 03:06