dlang.org

Fix 23186 - wchar/dchar do not have their endianess defined

Open dkorpel opened this issue 2 years ago • 8 comments

@rikkimax

dkorpel avatar Jun 16 '22 09:06 dkorpel

Thanks for your pull request and interest in making D better, @dkorpel! We are looking forward to reviewing it, and you should be hearing from a maintainer soon. Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Auto-close Bugzilla Severity Description
23186 enhancement wchar/dchar do not have their endianess defined

dlang-bot avatar Jun 16 '22 09:06 dlang-bot

I'm being pedantic, but I think leaving this detail for abi.dd is enough.

I think so too, but @rikkimax says "this isn't an ABI thing, it's about encodings" https://issues.dlang.org/show_bug.cgi?id=23186#c2

dkorpel avatar Jun 16 '22 14:06 dkorpel

I'm being pedantic, but I think leaving this detail for abi.dd is enough.

I think so too, but @rikkimax says "this isn't an ABI thing, it's about encodings" https://issues.dlang.org/show_bug.cgi?id=23186#c2

With Unicode, each multi-byte code unit size has three encoding schemes associated with it: big-endian, little-endian, and unknown endianness.

Basically, you can't read from a file, pipe, or socket and just cast the bytes to the respective string type.
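
As a rough illustration of the point about casting (a sketch, not code from the PR): the hypothetical helper `decodeUtf16` below assumes the caller already knows the byte order of the incoming UTF-16 data, for example from a BOM or a protocol definition. It relies on `swapEndian` from `std.bitmanip` and `endian` from `std.system`; the function name and its parameters are made up for this example.

```d
import std.bitmanip : swapEndian;
import std.exception : enforce;
import std.system : Endian, endian;

/// Hypothetical helper: turns raw UTF-16 bytes of a known byte order
/// into a native-endian wstring.
wstring decodeUtf16(const(ubyte)[] bytes, Endian source)
{
    enforce(bytes.length % 2 == 0, "UTF-16 data must have an even byte count");

    // Reinterpret the bytes as 16-bit code units and copy them. Without the
    // swap below, this is only correct when `source` matches the host.
    auto units = (cast(const(wchar)[]) bytes).dup;

    if (source != endian)
    {
        // Byte order differs from the host: swap every code unit.
        foreach (ref u; units)
            u = swapEndian(u);
    }

    // `units` is a fresh copy with no other references, so the cast is safe.
    return cast(wstring) units;
}
```

With this sketch, the bytes `00 48 00 69` decoded as `Endian.bigEndian` and the bytes `48 00 69 00` decoded as `Endian.littleEndian` both yield `"Hi"w`, whichever host runs the code. A plain cast alone would give that result for only one of the two inputs.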

But now that you mention it, @ibuclaw, we should be defining float/double in terms of binary vs. decimal, because the two are not interchangeable and both can be 32/64-bit.

rikkimax avatar Jun 16 '22 18:06 rikkimax

I'm being pedantic, but I think leaving this detail for abi.dd is enough.

I think so too, but @rikkimax says "this isn't an ABI thing, it's about encodings" https://issues.dlang.org/show_bug.cgi?id=23186#c2

With Unicode, each multi-byte code unit size has three encoding schemes associated with it: big-endian, little-endian, and unknown endianness.

Basically, you can't read from a file, pipe, or socket and just cast the bytes to the respective string type.

But if I understand you correctly, these are determined by the BOM at the beginning of the file/stream, not by the ABI of the target. So is there really any point in giving endianness a mention?

But now that you mention it, @ibuclaw, we should be defining float/double in terms of binary vs. decimal, because the two are not interchangeable and both can be 32/64-bit.

They will always be binary, as we don't have any specialization of floating-point types.

ibuclaw avatar Jun 20 '22 07:06 ibuclaw

But if I understand you correctly, these are determined by the BOM at the beginning of the file/stream, not by the ABI of the target. So is there really any point in giving endianness a mention?

Yes, because then we have at least documented that you can cast when the endianness matches; otherwise you have to do the conversion.

It also means that string literals have their endianness defined, since you can't save the output of a string literal to a file etc., read it on another machine, and assume it will be correct.

But now that you mention it, @ibuclaw, we should be defining float/double in terms of binary vs. decimal, because the two are not interchangeable and both can be 32/64-bit.

They will always be binary, as we don't have any specialization of floating-point types.

Which could be a bit of a problem moving forward if C does indeed get decimal floats ;)

rikkimax avatar Jun 20 '22 07:06 rikkimax

But if I understand you correctly, these are determined by the BOM at the beginning of the file/stream, not by the ABI of the target. So is there really any point in giving endianness a mention?

Yes, because then we have at least documented that you can cast when the endianness matches; otherwise you have to do the conversion.

It also means that string literals have their endianness defined, since you can't save the output of a string literal to a file etc., read it on another machine, and assume it will be correct.

String literals use 1-byte code units, so endianness makes no difference there. Wide literals follow native endianness, which matches the underlying integer types.

Conversion to/from native is a library matter.
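
A minimal sketch of what that looks like in practice (not code from the PR): `rawBytes` and `toUtf16LE` are hypothetical helpers invented for this example. The first exposes the in-memory bytes of a wide literal, which follow the host's byte order. The second uses `nativeToLittleEndian` from Phobos' `std.bitmanip` to produce a fixed byte order.

```d
import std.bitmanip : nativeToLittleEndian;
import std.system : Endian, endian;

/// Hypothetical helper: the bytes of `s` exactly as they sit in memory,
/// i.e. what a raw write to a file or socket would emit.
const(ubyte)[] rawBytes(wstring s)
{
    return cast(const(ubyte)[]) s;
}

/// Hypothetical helper: serializes `s` as UTF-16LE regardless of the host.
ubyte[] toUtf16LE(wstring s)
{
    ubyte[] result;
    result.reserve(s.length * 2);
    foreach (wchar unit; s)
    {
        // Library conversion from native order to little-endian bytes.
        ubyte[2] le = nativeToLittleEndian(unit);
        result ~= le[];
    }
    return result;
}
```

For `"A\u20AC"w`, `toUtf16LE` returns the bytes `41 00 AC 20` on any host, whereas `rawBytes` returns that same sequence only on a little-endian machine.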

But now that you mention it, @ibuclaw, we should be defining float/double in terms of binary vs. decimal, because the two are not interchangeable and both can be 32/64-bit.

They will always be binary, as we don't have any specialization of floating-point types.

Which could be a bit of a problem moving forward if C does indeed get decimal floats ;)

They technically have had them, as well as fixed point, for decades now.

ibuclaw avatar Jun 20 '22 08:06 ibuclaw

Anyway, just the mention in spec/abi is enough here. Maybe the spec/abi/endianess could be expanded further, but I don't have any specific rewording to offer off the top of my head. The Phobos documentation should be looked at as well, wherever users need to be mindful of the endianness of input/output wide streams.

ibuclaw avatar Jun 20 '22 08:06 ibuclaw

Conversion to/from native is a library matter.

Indeed, the best we can do is to acknowledge when it needs to happen.

They technically have had them, as well as fixed point, for decades now.

At least with compiler-specific extensions, you can kinda ignore them when it comes to describing D's support for C.

If the things that look like they are going into C23 do go in, we won't be able to describe D's support for C the same way we do now. There would be too much that we should be able to represent 1:1 but can't.

rikkimax avatar Jun 20 '22 08:06 rikkimax