jpype
jpype copied to clipboard
Correct the buffer protocol format type for longs
The docs on this are https://docs.python.org/3/c-api/buffer.html#c.Py_buffer.format and https://docs.python.org/3/library/struct.html#format-characters.
Current behaviour claims that the format is "long long", which as I understand on some platforms differs to int64 (I didn't verify this). Even so, before this change:
>>> np.array(jp.JArray(jp.JLong)([1, 2])).dtype.type
<class 'numpy.longlong'>
Whereas numpy does:
>>> np.array([1, 2], dtype=np.long).dtype.type
<class 'numpy.int64'>
Codecov Report
Merging #1039 (1dec798) into master (0bacb9e) will decrease coverage by
1.25%
. The diff coverage is35.68%
.
:exclamation: Current head 1dec798 differs from pull request most recent head 47bc0cf. Consider uploading reports for the commit 47bc0cf to get more accurate results
@@ Coverage Diff @@
## master #1039 +/- ##
==========================================
- Coverage 89.05% 87.80% -1.26%
==========================================
Files 112 112
Lines 10088 10242 +154
Branches 3992 4022 +30
==========================================
+ Hits 8984 8993 +9
- Misses 663 801 +138
- Partials 441 448 +7
Impacted Files | Coverage Δ | |
---|---|---|
jpype/_jarray.py | 100.00% <ø> (ø) |
|
jpype/nio.py | 100.00% <ø> (ø) |
|
native/common/jp_convert.cpp | 38.12% <33.57%> (-35.49%) |
:arrow_down: |
native/common/jp_doubletype.cpp | 95.83% <100.00%> (ø) |
|
native/common/jp_floattype.cpp | 95.62% <100.00%> (ø) |
|
native/common/jp_inttype.cpp | 96.34% <100.00%> (ø) |
|
native/common/jp_longtype.cpp | 95.06% <100.00%> (ø) |
|
native/common/jp_method.cpp | 98.84% <0.00%> (+1.15%) |
:arrow_up: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 0bacb9e...47bc0cf. Read the comment docs.
Looks like I have introduced a bug on Windows:
test\jpypetest\test_buffer.py:273:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <test_buffer.BufferTestCase testMethod=testLongToNP3D>
jtype = <java class 'JLong'>
limits = [-9223372036854775808, 9223372036854775807], size = (11, 10, 9)
dtype = <class 'numpy.int64'>, code = 'l'
def executeIntTest(self, jtype, limits, size, dtype, code):
data = np.random.randint(limits[0], limits[1], size=size, dtype=dtype)
a = JArray(jtype, data.ndim)(data.tolist())
> u = np.array(a)
E RuntimeError: Item size 8 for PEP 3118 buffer format string l does not match the dtype l item size 4.
Update: Correction, I have exposed a bug by increasing the testing. The problem, I think is that l
is the platform dependent form. As I understand it, this is actually incorrect as Java has platform independent types (?).
I have no objections to this, but it does need to be consistent across all architectures. I will merge it once you get it working.
Any progress on this one? I was hoping to include it in the next release, but it looks like it is stalled.
I'm basically blocked by the CI failing. I tried to fix it in #1040 but have unleashed a set of previously untested cases where the tests were already failing (precision issues).
I don't think there is a huge amount of work left here once the CI is fixed (which it will need to be for a release anyway, right?)
Got it. I will take a look at the CI.
@pelson It really appears to me that the problem you are trying to address is a bug in numpy not jpype. The specification that jpype is giving is correct, but the one given by numpy is wrong. At least as far as I read the documentation, https://docs.python.org/3/library/struct.html#format-characters
Consider....
print(memoryview(jpype.JLong[:]([1,2])).format)
print(memoryview(np.array([1, 2])).format)
JPype is reporting "q" which according to the docs is a guaranteed size 8. Numpy is reporting "l" which according to the docs is size 4. Of course when you call itemsize you get 8 because numpy has incorrectly passing long64 as the "l" type.
So numpy is the one using platform dependent encoding on their memory buffers, not jpype. We could do like numpy is doing and report "l" size 8 when we are preforming a memoryview of Java long, but that would certainly not be correct behavior as per the docs.
The docs state:
The ‘Standard size’ column refers to the size of the packed value in bytes when using standard size; that is, when the format string starts with one of '<', '>', '!' or '='
When using native size, the size of the packed value is platform-dependent.
Therefore, the string ``q``` is platform-dependent, by my interpretation.
JPype is reporting "q" which according to the docs is a guaranteed size 8.
I therefore disagree with this statement. It is q
proceeded by one of ['<', '>', '!', '=']
which is guaranteed 8-bytes.
From my limited understanding of the JNI headers, I believe that longs are not platform dependent in size - they are fixed at 8-byte (from my understanding they are host-endian, though I found conflicting (non-official) sources which suggested they were always big-endian (e.g. https://stackoverflow.com/a/362430/741316)).
It is for this reason that I chose to make the buffers "standard-size" rather than "native-size" (as per the terms in those docs you referenced)
OK, this is good to go as far as I'm concerned. Would be happy to have another review please @Thrameos.
Your reading of the docs is backwards on two fronts. In C long is platform dependent where as long long is gauranteed as at least 8 by standard. Longs are variable. If you use long you must use itemsize. As least that is how a C++ coder who has worked in the field sonce 1990 reads it. So I have worked on 16, 32, and 64 bit compilers. Long is generally considered 4 in all but 64 bit machines. If we choose to make it report long then shouldnt the patch be 1 line because it is in the reported type not the converter code itself. We currently dont test on a 32 bit maxhine so there is no way to verify this PR.
Further byte orders in JVM is defined as network order. BUT JNI is defined as native order. You don't see endian converters anywhere when integer types are passed. If you are operation from the assumption that JNI is anything but native order then this PR is wrong.
@marsher your opinion?
If you are operation from the assumption that JNI is anything but native order then this PR is wrong.
I'm not. All of the type definitions use =
, which means "host endian". This is consistent with what you say about the JNI.
In C long is platform dependent where as long long is gauranteed as at least 8 by standard. Longs are variable.
The documentation clearly states at https://docs.oracle.com/javase/6/docs/technotes/guides/jni/spec/types.html#wp428 that a Long in Java is a signed 64bit integer. There is actually no reference to the C long definition in those docs.
The logic for a JLong
, which I've mapped to =q
in this MR, goes:
-
=
because JNI gives us the data in host-endian form- This triggers "standard size" behaviour
-
q
because in the docs at https://docs.python.org/3/library/struct.html#format-characters, the standard size ofq
is 8. The size is not platform-dependent.
Your implementation in #1048 looks wrong to me. You are proposing that JLong
gets mapped to l
instead of q
, but this is platform-dependent, as you say. It is not guaranteed to be 64 bit, which is the length that I believe a JLong is always (based on the docs).
I totally respect your years of C++ dev, and can completely believe that I am being naive in believing exactly what the docs say (this may be a hard-earned lesson with the JNI), but I don't see that choosing a platform-dependent C type is correct for the type that the JNI document describes.
Closing this out, since there isn't really much value following the improvements in #1048. There were a few typos fixed etc., which I may try to get in later on.
FWIW, I'm not fully convinced of the datatypes that were chosen in #1048, but we can come back to that discussion if needs-be later on (mostly related to endianness, which doesn't concern me too much). For now, thanks for getting the original issue fixed :+1: