dlpack icon indicating copy to clipboard operation
dlpack copied to clipboard

Endianness support is missing

Open leofang opened this issue 2 years ago • 9 comments

Complaints on endianness has been something I've recurrently seen (ex: CuPy https://github.com/cupy/cupy/issues/3652 and mpi4py https://github.com/mpi4py/mpi4py/issues/177), and I anticipate at some point we'd start receiving bug reports on this. Apparently there is at least a few communities out there (astropy and hdf5) that prefer (or could work with) non-native (that is, big) endianness data. This causes problems if two libraries exchange but do not communicate the endianness for how to interpret the data.

I suggest two possible solutions:

  1. Add an endianness enum (big, little, native, etc) and include it in DLDataType as a new struct member:
    • Cons: This will be an API/ABI incompatible change, unfortunately.
  2. Alternatively, we could apply a bitwise mask to DLDataType::code to make it carry this information:
    • The mask should be designed such that when not applied it refers to the little endianness, the de facto standard used by all projects so far

cc: @tqchen @rgommers @tirthasheshpatel

leofang avatar Feb 28 '22 05:02 leofang

FITS as an astrophysics-specific big-endian format is annoying indeed. Your mpi4py link also seems to come from that community. I can't think of many other fields where this comes up, but it's hard to be sure.

A few thoughts:

  • Ignoring it completely yields correctness issues; transparently converting to native endianness in I/O routines as CuPy just did makes sense.
  • Right now, DLPack implementations should check for this and raise an exception on the exporter side. Not doing so looks like a pretty severe bug to me.
  • Solution 1 above: there's a longer list of info we'd like to see captures (like readonly), when that is done this could be taken along - it's not an extra/separate ABI break.
  • Solution 2 above: not a fan of such an obscure solution.
  • It's not clear to me that either solution will be generally helpful. If the vast majority of libraries supporting DLPack do not support non-native byteorder, then an exception will still be raised on the importer side once endianness info is present. Hence not much change compared to the current state where the exporter must raise. So it's probably okay to add if it can be taken along in a larger set of changes, but is not worth a separate (breaking or convoluted) change.

rgommers avatar Feb 28 '22 09:02 rgommers

  • It's not clear to me that either solution will be generally helpful. If the vast majority of libraries supporting DLPack do not support non-native byteorder, then an exception will still be raised on the importer side once endianness info is present. Hence not much change compared to the current state where the exporter must raise. So it's probably okay to add if it can be taken along in a larger set of changes, but is not worth a separate (breaking or convoluted) change

I agree with this. If and when we get to adding something like readonly, we can also add this along with it.

tirthasheshpatel avatar Feb 28 '22 09:02 tirthasheshpatel

While non-native byteorder is useful for serialization and less for in-memory exchange.

As of now the implicit assumption seems to be that the data should always follow the native-byteorder in the system. So perhaps one possible actionable item

  • A2: confirm this fact(that data field should follow the machine native byteorder) and document it as part of the standard. Note that given almost all libraries operate under the native-order for computation.

We do need to acknowledge that for serialization/networking having a fixed(non-machine native) endian is useful. I personally would consider that to be outside the scope(as the main purpose is fast in-memory exchange for computing). We can always run explicit byteswaping(which is needed anyway as most frameworks need that) in the scenario of serialization.

tqchen avatar Feb 28 '22 14:02 tqchen

We do need to acknowledge that for serialization/networking having a fixed(non-machine native) endian is useful. I personally would consider that to be outside the scope(as the main purpose is fast in-memory exchange for computing). We can always run explicit byteswaping(which is needed anyway as most frameworks need that) in the scenario of serialization.

I think the question here is whether to automatically byteswap or not. If users get FITS or HDF5 file with non-native byteorder and byteswapping is not done upon loading that data into memory (as is the case now), then there are two options:

  1. Byteswapping is done automatically upon exporting a DLPack capsule,
  2. The exporter raises an informative error and asks the user to byteswap.

I think I have a slight preference for (2).

rgommers avatar Mar 03 '22 14:03 rgommers

I also prefer (2) as it simpler

tqchen avatar Mar 03 '22 17:03 tqchen

I think I have a slight preference for (2)

That's what NumPy does right now:

>>> import numpy as np
>>> x = np.array([1,2,3], dtype='>i2')
>>> np.from_dlpack(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: DLPack only supports native byte swapping.

And what others should also do IMO. I don't think it is unreasonable/unintuitive to ask the user to first convert the data to native byte ordering before exporting. So, +1 for (2)

tirthasheshpatel avatar Mar 03 '22 17:03 tirthasheshpatel

Thanks for the discussion and the consensus during my absence. I agree with you all that

  • The endianness info can be piggybacked in a larger change
  • For now the exporter should raise if the tensor/array is not in little endianness to ask users to swap themselves

leofang avatar Mar 18 '22 15:03 leofang

cc: @seberg (in case you have other thoughts since you're playing with endianness 🙂)

I think this issue can be closed by a simple doc update, since no change is required on the DLPack side. If no one else is interested in submitting a PR before the end of this month, I'll give it a shot.

leofang avatar Jan 17 '23 03:01 leofang

No opinion, to me it seems like probably a valid but low-prio addition. And it is simple in that if you flag it on the dtype you get the "unsupported dtype" error for free.

seberg avatar Jan 17 '23 08:01 seberg