Handle unicode strings when parsing hashtags, links, mentions
Description
When the user posts text that contains non-ASCII characters (süch às 💭), the simple ASCII-based range finders fail. The issue I created provides examples of what happened when a German user posted with my app. 😬
This change updates the hashtag regex to use broader matchers for Unicode letters, and uses String.Index values, which account for multi-byte characters.
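As a rough illustration of the approach (a minimal sketch, not the code in this PR): `HashtagMatch` and `findHashtags(in:)` are hypothetical names, and the pattern shown is an assumption about what a Unicode-aware hashtag matcher could look like. It matches letters and digits from any script, converts each match to `String.Index` values, and derives UTF-8 byte offsets from those indices.

```swift
import Foundation

/// Hypothetical type for illustration only; not part of ATProtoKit.
struct HashtagMatch: Equatable {
    let tag: String
    let byteStart: Int
    let byteEnd: Int
}

/// Hypothetical helper: finds hashtags and reports their UTF-8 byte offsets.
func findHashtags(in text: String) -> [HashtagMatch] {
    // \p{L} and \p{N} match letters and digits in any script, not just ASCII.
    guard let regex = try? NSRegularExpression(pattern: "#[\\p{L}\\p{N}_]+") else {
        return []
    }
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)
    return regex.matches(in: text, range: fullRange).compactMap { match -> HashtagMatch? in
        // Convert the UTF-16 based NSRange into String.Index values,
        // skipping the match if the conversion fails.
        guard let range = Range(match.range, in: text) else { return nil }
        let utf8 = text.utf8
        let start = utf8.distance(from: utf8.startIndex, to: range.lowerBound)
        let end = utf8.distance(from: utf8.startIndex, to: range.upperBound)
        return HashtagMatch(tag: String(text[range]), byteStart: start, byteEnd: end)
    }
}
```

Because "ü" and "ß" each take two bytes in UTF-8 and "💭" takes four, the byte offsets returned here differ from plain character counts as soon as the text contains anything outside ASCII.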
Linked Issues
#35
Type of Change
- [x] Bug Fix
- [ ] New Feature
- [ ] Documentation
Checklist:
- [ ] My code follows the ATProtoKit API Design Guidelines as well as the Swift API Design Guidelines.
- [ ] I have performed a self-review of my own code and commented it, particularly in hard-to-understand areas.
- [ ] I have made corresponding changes to the documentation.
- [ ] My changes generate no new warnings or errors in the compiler or runtime.
- [ ] My code is able to build and run on my machine.
Screenshots (if applicable)
Attach any screenshots or GIFs showcasing the effect of the changes.
Additional Notes
Add any other notes about the Pull Request here.
Credits
If you want to be credited in the CONTRIBUTORS file, you can fill out the form below. Please don't remove the square brackets.
- Name: Aaron Vegh
- GitHub: aaronvegh
Thanks. I'll look into that this week.
I took a look at it but forgot to make a comment. Sorry about that.
I'm okay with it for the most part, but I explicitly don't allow force-unwrapping, even where there shouldn't be any possible way for the value to be nil. This is to prevent a situation where we use it constantly and end up putting it in places where a nil value can actually occur. I understand if you feel this is excessive, but this is how I've wanted the code to look overall: as safe as possible.
If you could tweak it a bit so that the optionals are safely unwrapped, then I'll approve the changes and they'll be in the next applicable hotfix update.
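For reference, a minimal sketch of the kind of change being asked for. `substring(of:at:)` is a hypothetical helper, not a function from this PR or from ATProtoKit; it only shows a `guard let` replacing a force unwrap when converting an `NSRange`.

```swift
import Foundation

// Hypothetical helper for illustration; the name is not from ATProtoKit.
func substring(of text: String, at nsRange: NSRange) -> String? {
    // Force-unwrapped version (the pattern to avoid):
    //     let range = Range(nsRange, in: text)!
    // Safe version: return nil if the range cannot be converted.
    guard let range = Range(nsRange, in: text) else {
        return nil
    }
    return String(text[range])
}
```

With this shape, a range that falls outside the string or is `NSNotFound` produces `nil` for the caller to handle instead of a runtime crash.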
@MasterJ93 No worries; I've updated to remove the force unwraps.