cht-core: Zero-width unicode characters removed from SMS messages too often


garethbowen opened this issue.

Describe the bug

The fix introduced in https://github.com/medic/cht-core/issues/7654 is too aggressive: it strips zero-width unicode characters from the entire message rather than only from the parts needed for matching.
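To make the difference concrete, here is a minimal Python sketch. It is not the cht-core implementation (which is JavaScript) and it assumes a simplified message layout: a form code, a space, then free text. The set of zero-width code points (U+200B, U+200C, U+200D, U+FEFF) and the function names `strip_zero_width`, `parse_aggressive` and `parse_selective` are hypothetical. `parse_aggressive` strips the whole message before splitting, as the bug describes; `parse_selective` strips only the form code that is used for matching.

```python
import re

# Hypothetical set of zero-width code points to remove before matching:
# U+200B ZERO WIDTH SPACE, U+200C ZERO WIDTH NON-JOINER,
# U+200D ZERO WIDTH JOINER, U+FEFF ZERO WIDTH NO-BREAK SPACE.
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\ufeff]")


def strip_zero_width(text: str) -> str:
    """Remove every zero-width character from text."""
    return ZERO_WIDTH.sub("", text)


def parse_aggressive(message: str) -> tuple[str, str]:
    """Strip the whole message, then split off the form code.

    Free-text fields such as a name lose their zero-width characters too.
    """
    code, _, rest = strip_zero_width(message).strip().partition(" ")
    return code, rest


def parse_selective(message: str) -> tuple[str, str]:
    """Split off the form code first and strip only that part.

    The code still matches, and the free text is returned unchanged.
    """
    code, _, rest = message.strip().partition(" ")
    return strip_zero_width(code), rest


if __name__ == "__main__":
    name = "\u0930\u094d\u200d\u092f\u093e\u0932\u0947"  # name containing U+200D
    message = "\u200bN " + name  # stray U+200B before the form code
    for parse in (parse_aggressive, parse_selective):
        code, rest = parse(message)
        print(parse.__name__, code, "U+200D kept:", "\u200d" in rest)
```

With a stray U+200B before the form code, both functions still return `N` as the code, so matching keeps working, but only `parse_selective` leaves the zero-width joiner inside the name. The idea is to normalise only what the matcher compares and treat everything else as opaque user text.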

From @binokaryg:

One minor caveat of this change is that if some text deliberately has zero-width characters, they would also be stripped out. These characters are very rarely used in general Nepali texting and the meaning and pronunciation remain the same, with or without them. For our use case, it might only be the name field that is potentially altered.

To Reproduce

Steps to reproduce the behavior:

1. Register a new patient via SMS with the name र्‍याले (it contains a zero-width joiner, U+200D, after the virama).
2. See that the name is changed to र्याले in the database.
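One way to see what was lost is to list the code points of both names. This minimal Python sketch uses the standard `unicodedata` module; the escaped literals `ORIGINAL` and `STORED` are assumed to match the two names in the steps above, and `describe` is a hypothetical helper. The only difference it reports is U+200D ZERO WIDTH JOINER after the virama, which is easy to miss when comparing the names by eye.

```python
import unicodedata

# Assumed escapes for the two names in the reproduction steps.
ORIGINAL = "\u0930\u094d\u200d\u092f\u093e\u0932\u0947"  # as sent in the SMS
STORED = "\u0930\u094d\u092f\u093e\u0932\u0947"  # as saved in the database


def describe(text: str) -> list[str]:
    """Return one 'U+XXXX NAME' entry per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


if __name__ == "__main__":
    stored_points = describe(STORED)
    for entry in describe(ORIGINAL):
        # Flag code points of the original name that are absent after saving.
        marker = "" if entry in stored_points else "  <- missing after save"
        print(entry + marker)
```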

Also make sure we don't regress on the fix for #7654.

Expected behavior

The name is a free-text field, so it should remain unchanged.

Environment

- Instance: localhost
- Browser: any
- Client platform: any
- App: api
- Version: 3.16.0


Jul 13 '22 18:07 garethbowen

This is ready for AT (acceptance testing) on `7676-more-selective-stripping`.

Aug 11 '22 20:08 garethbowen

- Config: Standard
- Environment: Local with docker helper script
- Platform: WebApp
- Browser: Chrome

Reproducible on `Master`

After adding a new person via SMS (N form) with the name र्‍याले, the person is created successfully but its name is changed to र्याले in the database


Fixed on `7676-more-selective-stripping`

After adding a new person via SMS (N form) with the name र्‍याले, the person is created successfully and its name is saved correctly.


Test #7654

I'm not sure about this test: I had some problems uploading the Sunsari config and creating the place. @garethbowen, please confirm whether the test is correct.

Aug 12 '22 17:08 tatilepizs

Merged!

Aug 15 '22 10:08 garethbowen
