Text field length validation work with international characters and emoji
Description
I started with this (great!) project, and created a simple country collection, that included a Flag emoji, and set maxLength: 1. It failed :(
The solution to this is to not use the default string.length but instead: https://gomakethings.com/emoji-are-still-weird-but-modern-browser-methods-help/
I've added this to the validation code, along with additional tests.
~~To have the Intl function work I also had to upgrade typescript and the target, so if you want me to take another path with this just let me know! For example,~~
I have added this dependency instead: https://github.com/flmnt/graphemer
- [X] I have read and understand the CONTRIBUTING.md document in this repository
Type of change
- [X] Bug fix (non-breaking change which fixes an issue)
Checklist:
- [X] I have added tests that prove my fix is effective or that my feature works
- [X] Existing test suite passes locally with my changes
- [X] I have made corresponding changes to the documentation
I love the integration tests you added for min/max length. I'm a bit cautious that this is something we should handle though.
I reviewed the flmnt/graphemer package. It isn't a huge, but the main class is a 10k line class and that may be a bit too heavy of overhead for something to handle what seems like a very edge case.
https://github.com/flmnt/graphemer/blob/master/src/Graphemer.ts We will discuss and decide, but this feels like something you need to handle in a custom validation on the fields that need it, within your own project or as a plugin instead of the core of Payload.
If we do decide to go forward with this, I'd definitly move the length setting inside an if block that only does this extra work if minLength or maxLength are being used. Here is where you're setting setting the length, which may not be used depending on the field config: https://github.com/payloadcms/payload/pull/1061/files#diff-e8686d63ee7a57e9238921baf4b95b80322b8df7d080f96d305fad65940678f8R31
Thanks for reviewing! That is only the definition of the length function (perhaps it needs a clearer name?) - it doesn't get called unless there is a maxLength or minLength.
Of course no problem at all if you want to keep the core clean, and I can move it into a plugin. However, I would say that if you are building a CMS that you want to be used internationally in a professional context you do need to eventually consider these edge cases, and these days people put emoji in text fields more often than you think 😄
Hey @cliftonc — I agree with everything you're saying and we definitely need to figure out a way to get validation to more accurately reflect international characters and emojis.
We are up for that. The only thing here that I would like to see us get around is importing the new dependency graphemer. Because validation functions are used in the browser, this package's filesize directly contributes to the JS needed to download from the admin UI, and we are pretty strict about trying to keep that bundle as small as possible. Not to mention just in general we want to keep our dependencies to a minimum as Payload gets bigger. Do you have any other ideas regarding how to better validate text length (emoji-friendly) outside of using that package? Everything else looks great!
For some reason my brain skipped fact the validation was client side.
The easier path is to just use the browser Intl functions, which I did initially (if you look at the commit history) - but to get the tests to pass you have to upgrade typescript + bump the target - neither of which felt appropriate at the time.
Why don't you close this now as you guys likely have a lot to work on that is more important, and I will throw this into a plugin so folks that want it can opt in.
Thanks!
@cliftonc yeah, depending on how aggressively we have to bump the target, and what that means for non-evergreen browsers could influence our answer here. But honestly it is not a top concern of Payload to support older browsers. I would personally much rather advocate for making quicker use of newer browser features especially with the rise of evergreen browsers.
I'm not opposed to bumping TS and upping the target. I would hate to close this too. I think it's a valid PR. Maybe we can revisit this as a team at some point soon. Maybe there's a better package that we could use with a smaller filesize.
Who knew getting the correct character count would be so difficult?
I went down a lot of different paths only to hit dead ends with every option.
- Use Intl.Segmenter like you have in f1e54b48e7f803cbd2583b3b05e701c8c89874fd: Not supported in Firefox.
- Use Intl.Segmenter with a polyfill: no small polyfill package that I could find.
- Use graphemer: I am particular concerned with this https://github.com/flmnt/graphemer/issues/7. I haven't benchmarked this, but I suspect long strings would have performance impacts.
- Find a pure JS solution using regexs or other string functions: everything I see on StackOverflow seems to limited.
- Find a different package: I can't find one that I would want to include due to size, lack of maintenance or other concerns.
I think we need to let this one pass for now and revisit again in the future—maybe Firefox will support the Segmenter API some day.
Thanks for the PR, I learned a lot on this topic!
I made a Unicode library that is much smaller and faster than graphemer. Check it out: https://github.com/cometkim/unicode-segmenter?tab=readme-ov-file#unicode-segmentergrapheme-vs
It ensures compliance with the latest Unicode data by performing tests and fuzzing against the Intl.Segmenter API.