Support byUTF for ubyte[] arguments
For the reasons outlined in the discussion of that pull request, we concluded that we need to be able to call byUTF on the argument of type ubyte[]. This PR implements exactly that.
As a reminder, this is necessary:
- to support converting a chunk extracted from a file;
- to eliminate the need to validate a string of chars twice, once when it is created and again when it is converted by byUTF (this simplifies programming and improves efficiency).
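The file-chunk motivation can be sketched as follows. Today, bytes read from a file must first be cast to a char slice before byUTF can decode them, which silently asserts the data is valid UTF-8; with this PR's ubyte[] overload the cast would go away. The commented line is the PR's intended usage and will only compile against a Phobos build that includes this change.

```d
import std.algorithm.comparison : equal;
import std.utf : byUTF;

void main()
{
    // A raw chunk as it might come from File.byChunk: "hellö" in UTF-8.
    ubyte[] chunk = [0x68, 0x65, 0x6C, 0x6C, 0xC3, 0xB6];

    // Current Phobos: a cast is needed, which asserts validity up front.
    assert((cast(const(char)[]) chunk).byUTF!dchar.equal("hellö"d));

    // With this PR, byUTF accepts the ubyte[] directly and validates
    // lazily during iteration (uncomment when built against the PR):
    // assert(chunk.byUTF!dchar.equal("hellö"d));
}
```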
Thanks for your pull request and interest in making D better, @vporton! We are looking forward to reviewing it, and you should be hearing from a maintainer soon. Please verify that your PR follows this checklist:
- My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
- My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
- I have provided a detailed rationale explaining my changes
- New or modified functions have Ddoc comments (with `Params:` and `Returns:`)
Please see CONTRIBUTING.md for more information.
If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.
Bugzilla references
Your PR doesn't reference any Bugzilla issue.
If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.
Testing this PR locally
If you don't have a local development environment setup, you can use Digger to test this PR:
dub fetch digger
dub run digger -- build "master + phobos#7249"
What happens for a range of ubytes that aren't valid UTF-8? Is this covered by a test?
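For context on this question: on existing char ranges, byUTF does not throw on invalid sequences by default; it substitutes the Unicode replacement character U+FFFD (this is the `Yes.useReplacementDchar` default in std.utf). Presumably the ubyte[] overload should behave the same way, and a test like the sketch below would cover it; 0xFF is used here because it can never start a valid UTF-8 sequence.

```d
import std.algorithm.comparison : equal;
import std.utf : byUTF;

void main()
{
    // 'h', an invalid byte, 'i' -- 0xFF never appears in well-formed UTF-8.
    auto invalid = cast(const(char)[]) [0x68, 0xFF, 0x69];

    // By default byUTF replaces the invalid sequence with U+FFFD
    // instead of throwing a UTFException.
    assert(invalid.byUTF!dchar.equal("h\uFFFDi"d));
}
```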
I read the discussion on the other PR and AIUI it called for something that took ubyte[] and lazily produced validated char[]. This doesn't seem to be it. Could you please explain what you're trying to accomplish? Thanks.
The added unittest explains it:
assert((cast(ubyte[]) [0x68, 0x65, 0x6c, 0x6c, 0xC3, 0xB6]).byUTF!char().equal(['h', 'e', 'l', 'l', 0xC3, 0xB6]));
You pass in a range of ubyte and get a range of char. Handy, as no separate step to deal with autodecoding is needed.
Perhaps this should also accept ushort (assumed to be UTF-16) and uint (UTF-32)?
Of course, this can be implemented later on just as well.
ping @vporton