dlang.org Fix 23186 - wchar/dchar do not have their endianess defined


Open dkorpel opened this issue 2 years ago • 8 comments

@rikkimax

Jun 16 '22 09:06 dkorpel

Thanks for your pull request and interest in making D better, @dkorpel! We are looking forward to reviewing it, and you should be hearing from a maintainer soon. Please verify that your PR follows this checklist:

My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
I have provided a detailed rationale explaining my changes
New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.

If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Auto-close	Bugzilla	Severity	Description
✓	23186	enhancement	wchar/dchar do not have their endianess defined

Jun 16 '22 09:06 dlang-bot

I'm being pedantic, but I think leaving this detail for abi.dd is enough.

I think so too, but @rikkimax says "this isn't an ABI thing, it's about encodings" https://issues.dlang.org/show_bug.cgi?id=23186#c2

Jun 16 '22 14:06 dkorpel

I'm being pedantic, but I think leaving this detail for abi.dd is enough.

I think so too, but @rikkimax says "this isn't an ABI thing, it's about encodings" https://issues.dlang.org/show_bug.cgi?id=23186#c2

Unicode has three encoding schemes for each code unit size: big-endian, little-endian, and unspecified endianness.

Basically, you can't read from a file, pipe, or socket and cast the bytes directly to the respective string type.
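To make the point concrete, here is a minimal sketch in C of decoding 16-bit code units with an explicitly stated byte order instead of casting the raw buffer. The names `byte_order`, `read_u16`, and `decode_utf16_units` are hypothetical and exist only for this illustration; surrogate pairs are not interpreted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag for the byte order the incoming data was written in. */
typedef enum { ORDER_LITTLE, ORDER_BIG } byte_order;

/* Assemble one 16-bit code unit from two bytes in the stated order.
   The result is the same on any host, unlike a pointer cast. */
static uint16_t read_u16(const unsigned char *p, byte_order order)
{
    return order == ORDER_LITTLE
        ? (uint16_t)(p[0] | (p[1] << 8))
        : (uint16_t)((p[0] << 8) | p[1]);
}

/* Decode up to `cap` code units from `nbytes` bytes of input.
   Returns the number of code units written to `out`.
   A trailing odd byte is ignored. */
static size_t decode_utf16_units(const unsigned char *bytes, size_t nbytes,
                                 byte_order order,
                                 uint16_t *out, size_t cap)
{
    size_t n = nbytes / 2;
    if (n > cap)
        n = cap;
    for (size_t i = 0; i < n; i++)
        out[i] = read_u16(bytes + 2 * i, order);
    return n;
}
```

Casting the buffer to a 16-bit pointer would only give the same values when the data's byte order happens to match the host's.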

But now that you mention it @ibuclaw, we should be defining float/double in terms of binary vs. decimal, because the two are not interchangeable and both can be 32/64-bit.

Jun 16 '22 18:06 rikkimax

I'm being pedantic, but I think leaving this detail for abi.dd is enough.

I think so too, but @rikkimax says "this isn't an ABI thing, it's about encodings" https://issues.dlang.org/show_bug.cgi?id=23186#c2

Unicode has three encoding schemes for each code unit size: big-endian, little-endian, and unspecified endianness.

Basically, you can't read from a file, pipe, or socket and cast the bytes directly to the respective string type.

But if I understand you correctly, these are determined by the BOM at the beginning of the file/stream, not by the ABI of the target. So is there really any point in mentioning endianness?

But now that you mention it @ibuclaw, we should be defining float/double in terms of binary vs. decimal, because the two are not interchangeable and both can be 32/64-bit.

They will always be binary as we don't have any specialization of floating point types.

Jun 20 '22 07:06 ibuclaw

But if I understand you correctly, these are determined by the BOM at the beginning of the file/stream, not by the ABI of the target. So is there really any point in mentioning endianness?

Yes, because then we have at least documented that you can cast when the endianness matches; otherwise you have to do the conversion.
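As a rough illustration of "cast when it matches, convert otherwise", the following C sketch checks a UTF-16 BOM against the host byte order. All names (`utf16_bom`, `detect_utf16_bom`, `host_is_little_endian`, `bom_matches_host`) are hypothetical, and the check is simplified: the bytes FF FE are also the start of a UTF-32LE BOM, which this sketch does not try to distinguish.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result type for the BOM check. */
typedef enum { BOM_NONE, BOM_UTF16_LE, BOM_UTF16_BE } utf16_bom;

/* Inspect the first two bytes of a stream for a UTF-16 byte order mark
   (U+FEFF serialized as FF FE or FE FF). */
static utf16_bom detect_utf16_bom(const unsigned char *p, size_t n)
{
    if (n >= 2) {
        if (p[0] == 0xFF && p[1] == 0xFE)
            return BOM_UTF16_LE;
        if (p[0] == 0xFE && p[1] == 0xFF)
            return BOM_UTF16_BE;
    }
    return BOM_NONE; /* byte order must be known by other means */
}

/* Returns 1 when the host stores the low-order byte first. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Returns 1 only when a BOM is present and agrees with the host,
   i.e. when reinterpreting the bytes as 16-bit units would be valid. */
static int bom_matches_host(utf16_bom bom)
{
    if (bom == BOM_NONE)
        return 0;
    return (bom == BOM_UTF16_LE) == host_is_little_endian();
}
```

When `bom_matches_host` returns 0, the data needs a per-unit byte swap (or an out-of-band decision about its byte order) before it can be treated as native code units.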

It also means that string literals have their endianness defined, since you can't save the bytes of a string literal to a file etc., read them back on another system, and assume they will be correct.

But now that you mention it @ibuclaw, we should be defining float/double in terms of binary vs. decimal, because the two are not interchangeable and both can be 32/64-bit.

They will always be binary as we don't have any specialization of floating point types.

Which could be a bit of a problem moving forward if C does indeed get decimal floats ;)

Jun 20 '22 07:06 rikkimax

But if I understand you correctly, these are determined by the BOM at the beginning of the file/stream, not by the ABI of the target. So is there really any point in mentioning endianness?

Yes, because then we have at least documented that you can cast when the endianness matches; otherwise you have to do the conversion.

It also means that string literals have their endianness defined, since you can't save the bytes of a string literal to a file etc., read them back on another system, and assume they will be correct.

String literals are 1-byte, so there is no endianness distinction there. Wide literals follow native endianness, which matches the underlying integer types.

Conversion to/from native is a library matter.
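For reference, the kind of conversion a library would provide amounts to a byte swap per code unit. A minimal C sketch follows, with hypothetical helper names `swap16` and `swap32`; the 16-bit and 32-bit widths are chosen to correspond to the sizes of wchar and dchar.

```c
#include <stdint.h>

/* Reverse the two bytes of a 16-bit code unit. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Reverse the four bytes of a 32-bit code unit. */
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24)
         | ((v & 0x0000FF00u) << 8)
         | ((v & 0x00FF0000u) >> 8)
         | ((v & 0xFF000000u) >> 24);
}
```

Applying the swap twice gives back the original value, so the same helper serves both directions (native to foreign and foreign to native).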

But now that you mention it @ibuclaw, we should be defining float/double in terms of binary vs. decimal, because the two are not interchangeable and both can be 32/64-bit.

They will always be binary as we don't have any specialization of floating point types.

Which could be a bit of a problem moving forward if C does indeed get decimal floats ;)

They technically have had them, as well as fixed point, for decades now.

Jun 20 '22 08:06 ibuclaw

Anyway, just the mention in spec/abi is enough here. Maybe spec/abi/endianess could be expanded further, but I don't have any specific rewording to offer off the top of my head. The Phobos documentation should also be reviewed wherever users need to be mindful of the endianness of wide input/output streams.

Jun 20 '22 08:06 ibuclaw

Conversion to/from native is a library matter.

Indeed, the best we can do is to acknowledge when it needs to happen.

They technically have had them, as well as fixed point, for decades now.

At least with compiler-specific extensions, you can kinda ignore them when it comes to describing D's support for C.

If the things that look like they are going into C23 do go in, we won't be able to describe D's support for C the same way we do now. There is too much that we should be able to represent 1:1 but won't be able to.

Jun 20 '22 08:06 rikkimax
