Correct the buffer protocol format type for longs

Open pelson opened this issue 2 years ago • 11 comments

The docs on this are https://docs.python.org/3/c-api/buffer.html#c.Py_buffer.format and https://docs.python.org/3/library/struct.html#format-characters.

Current behaviour claims that the format is "long long", which as I understand on some platforms differs to int64 (I didn't verify this). Even so, before this change:

>>> np.array(jp.JArray(jp.JLong)([1, 2])).dtype.type
<class 'numpy.longlong'>

Whereas numpy does:

>>> np.array([1, 2], dtype=np.long).dtype.type
<class 'numpy.int64'>

Mar 18 '22 09:03 pelson

Codecov Report

Merging #1039 (1dec798) into master (0bacb9e) will decrease coverage by 1.25%. The diff coverage is 35.68%.

:exclamation: Current head 1dec798 differs from pull request most recent head 47bc0cf. Consider uploading reports for the commit 47bc0cf to get more accurate results

@@            Coverage Diff             @@
##           master    #1039      +/-   ##
==========================================
- Coverage   89.05%   87.80%   -1.26%     
==========================================
  Files         112      112              
  Lines       10088    10242     +154     
  Branches     3992     4022      +30     
==========================================
+ Hits         8984     8993       +9     
- Misses        663      801     +138     
- Partials      441      448       +7

Impacted Files	Coverage Δ
jpype/_jarray.py	`100.00% <ø> (ø)`
jpype/nio.py	`100.00% <ø> (ø)`
native/common/jp_convert.cpp	`38.12% <33.57%> (-35.49%)`	:arrow_down:
native/common/jp_doubletype.cpp	`95.83% <100.00%> (ø)`
native/common/jp_floattype.cpp	`95.62% <100.00%> (ø)`
native/common/jp_inttype.cpp	`96.34% <100.00%> (ø)`
native/common/jp_longtype.cpp	`95.06% <100.00%> (ø)`
native/common/jp_method.cpp	`98.84% <0.00%> (+1.15%)`	:arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 0bacb9e...47bc0cf. Read the comment docs.

Mar 18 '22 09:03 codecov[bot]

Looks like I have introduced a bug on Windows:

test\jpypetest\test_buffer.py:273: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <test_buffer.BufferTestCase testMethod=testLongToNP3D>
jtype = <java class 'JLong'>
limits = [-9223372036854775808, 9223372036854775807], size = (11, 10, 9)
dtype = <class 'numpy.int64'>, code = 'l'

    def executeIntTest(self, jtype, limits, size, dtype, code):
        data = np.random.randint(limits[0], limits[1], size=size, dtype=dtype)
        a = JArray(jtype, data.ndim)(data.tolist())
>       u = np.array(a)
E       RuntimeError: Item size 8 for PEP 3118 buffer format string l does not match the dtype l item size 4.

Update: Correction, I have exposed a bug by increasing the testing. The problem, I think is that l is the platform dependent form. As I understand it, this is actually incorrect as Java has platform independent types (?).

Mar 18 '22 10:03 pelson

I have no objections to this, but it does need to be consistent across all architectures. I will merge it once you get it working.

Mar 18 '22 15:03 Thrameos

Any progress on this one? I was hoping to include it in the next release, but it looks like it is stalled.

Apr 07 '22 15:04 Thrameos

I'm basically blocked by the CI failing. I tried to fix it in #1040 but have unleashed a set of previously untested cases where the tests were already failing (precision issues).

I don't think there is a huge amount of work left here once the CI is fixed (which it will need to be for a release anyway, right?)

Apr 07 '22 19:04 pelson

Got it. I will take a look at the CI.

Apr 07 '22 19:04 Thrameos

@pelson It really appears to me that the problem you are trying to address is a bug in numpy not jpype. The specification that jpype is giving is correct, but the one given by numpy is wrong. At least as far as I read the documentation, https://docs.python.org/3/library/struct.html#format-characters

Consider....

print(memoryview(jpype.JLong[:]([1,2])).format)
print(memoryview(np.array([1, 2])).format)

JPype is reporting "q" which according to the docs is a guaranteed size 8. Numpy is reporting "l" which according to the docs is size 4. Of course when you call itemsize you get 8 because numpy has incorrectly passing long64 as the "l" type.

So numpy is the one using platform dependent encoding on their memory buffers, not jpype. We could do like numpy is doing and report "l" size 8 when we are preforming a memoryview of Java long, but that would certainly not be correct behavior as per the docs.

Apr 12 '22 00:04 Thrameos

The docs state:

The ‘Standard size’ column refers to the size of the packed value in bytes when using standard size; that is, when the format string starts with one of '<', '>', '!' or '='

When using native size, the size of the packed value is platform-dependent.

Therefore, the string ``q``` is platform-dependent, by my interpretation.

JPype is reporting "q" which according to the docs is a guaranteed size 8.

I therefore disagree with this statement. It is q proceeded by one of ['<', '>', '!', '='] which is guaranteed 8-bytes.

From my limited understanding of the JNI headers, I believe that longs are not platform dependent in size - they are fixed at 8-byte (from my understanding they are host-endian, though I found conflicting (non-official) sources which suggested they were always big-endian (e.g. https://stackoverflow.com/a/362430/741316)).

It is for this reason that I chose to make the buffers "standard-size" rather than "native-size" (as per the terms in those docs you referenced)

Apr 13 '22 07:04 pelson

OK, this is good to go as far as I'm concerned. Would be happy to have another review please @Thrameos.

Apr 13 '22 07:04 pelson

Your reading of the docs is backwards on two fronts. In C long is platform dependent where as long long is gauranteed as at least 8 by standard. Longs are variable. If you use long you must use itemsize. As least that is how a C++ coder who has worked in the field sonce 1990 reads it. So I have worked on 16, 32, and 64 bit compilers. Long is generally considered 4 in all but 64 bit machines. If we choose to make it report long then shouldnt the patch be 1 line because it is in the reported type not the converter code itself. We currently dont test on a 32 bit maxhine so there is no way to verify this PR.

Further byte orders in JVM is defined as network order. BUT JNI is defined as native order. You don't see endian converters anywhere when integer types are passed. If you are operation from the assumption that JNI is anything but native order then this PR is wrong.

@marsher your opinion?

Apr 13 '22 14:04 Thrameos

If you are operation from the assumption that JNI is anything but native order then this PR is wrong.

I'm not. All of the type definitions use =, which means "host endian". This is consistent with what you say about the JNI.

In C long is platform dependent where as long long is gauranteed as at least 8 by standard. Longs are variable.

The documentation clearly states at https://docs.oracle.com/javase/6/docs/technotes/guides/jni/spec/types.html#wp428 that a Long in Java is a signed 64bit integer. There is actually no reference to the C long definition in those docs.

The logic for a JLong, which I've mapped to =q in this MR, goes:

= because JNI gives us the data in host-endian form
- This triggers "standard size" behaviour
q because in the docs at https://docs.python.org/3/library/struct.html#format-characters, the standard size of q is 8. The size is not platform-dependent.

Your implementation in #1048 looks wrong to me. You are proposing that JLong gets mapped to l instead of q, but this is platform-dependent, as you say. It is not guaranteed to be 64 bit, which is the length that I believe a JLong is always (based on the docs).

I totally respect your years of C++ dev, and can completely believe that I am being naive in believing exactly what the docs say (this may be a hard-earned lesson with the JNI), but I don't see that choosing a platform-dependent C type is correct for the type that the JNI document describes.

Apr 13 '22 19:04 pelson

Closing this out, since there isn't really much value following the improvements in #1048. There were a few typos fixed etc., which I may try to get in later on.

FWIW, I'm not fully convinced of the datatypes that were chosen in #1048, but we can come back to that discussion if needs-be later on (mostly related to endianness, which doesn't concern me too much). For now, thanks for getting the original issue fixed :+1:

May 09 '23 07:05 pelson

jpype jpype copied to clipboard

Correct the buffer protocol format type for longs

Codecov Report

jpype
jpype copied to clipboard