cht-core
Zero-width unicode characters removed from SMS messages too often
Describe the bug
The fix introduced in https://github.com/medic/cht-core/issues/7654 is too aggressive: it strips zero-width Unicode characters from the entire message rather than just the parts needed for matching.
From @binokaryg
One minor caveat of this change is that if some text deliberately has zero-width characters, they would also be stripped out. These characters are very rarely used in general Nepali texting and the meaning and pronunciation remain the same, with or without them. For our use case, it might only be the name field that is potentially altered.
To Reproduce
Steps to reproduce the behavior:
- Register a new patient via SMS with the name
र्याले
- See the name is changed to
र्याले
in the database
Also make sure we don't regress on the fix to #7654
Expected behavior
The name is a free-text field, so it should remain unchanged.
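To illustrate the expected behavior, here is a minimal sketch of selective stripping. It is not the cht-core implementation: the `parseSms` function, the `ParsedSms` shape, the "code, then a space, then free text" message layout, and the exact set of zero-width characters are all assumptions made for illustration. The idea is to remove zero-width characters only from the part used for matching (the form code) and to pass the free-text part through untouched.

```typescript
// Hypothetical sketch, not cht-core's actual parser.
// Assumed set of zero-width characters: ZWSP, ZWNJ, ZWJ and ZWNBSP/BOM.
const ZERO_WIDTH = /[\u200B\u200C\u200D\uFEFF]/g;

interface ParsedSms {
  formCode: string; // normalised, used for matching a form
  freeText: string; // stored as received
}

// Assumes the message looks like "<form code> <free text>".
function parseSms(message: string): ParsedSms {
  const spaceAt = message.indexOf(' ');
  const rawCode = spaceAt === -1 ? message : message.slice(0, spaceAt);
  const freeText = spaceAt === -1 ? '' : message.slice(spaceAt + 1);
  return {
    // Strip zero-width characters only where matching needs it (the #7654 case).
    formCode: rawCode.replace(ZERO_WIDTH, '').toUpperCase(),
    // Leave the free text alone so deliberate zero-width characters survive.
    freeText,
  };
}

// A name containing a deliberate zero-width joiner (U+200D), written with
// escapes so the invisible character can be seen in the source.
const exampleName = '\u0930\u094D\u200D\u092F\u093E\u0932\u0947';
const example = parseSms('N\u200B ' + exampleName);
console.log(example.formCode === 'N');            // form code still matches
console.log(example.freeText === exampleName);    // name is unchanged
```

Under these assumptions, the form code still matches when a stray zero-width character is present, so the #7654 fix is preserved, while the name keeps every character that was sent.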
Environment
- Instance: localhost
- Browser: any
- Client platform: any
- App: api
- Version: 3.16.0
This is ready for acceptance testing (AT) on the 7676-more-selective-stripping branch.
- Config: Standard
- Environment: Local with docker helper script
- Platform: WebApp
- Browser: Chrome
Reproducible on Master
After adding a new person via SMS (N form) with the name र्याले
, the person is created successfully but its name is changed to र्याले
in the database
Fixed on 7676-more-selective-stripping
After adding a new person via SMS (N form) with the name र्याले
, the person is created successfully and its name is saved correctly.
Test #7654
I'm not sure about this test: I had some problems uploading the Sunsari config and creating the place. @garethbowen, please confirm whether the test is correct.
Merged!