PyFile support
I have explored supporting matplotlib font loading, which require PyFile support.
I have commented in
FILE *
PyFile_AsFile(PyObject *f)
{
printf("Accessing file.\n");
if (f == NULL || !PyFile_Check(f))
return NULL;
else
return ((PyFileObject *)f)->f_fp;
}
Which is called and immediately SIG_ABORTS. I get a dump file from the JVM:
Register to memory mapping:
RAX=0x00007fb7cbdd8280: PyFile_Type+0 in /home/christian/Development/JyNI/build/libJyNI.so at 0x00007fb7cbaa9000
RBX=0x00007fb7f8070958 is an unknown value
RCX=0x00007fb7c055d390 is an unknown value
RDX=0x0000000000000001 is an unknown value
RSP=0x00007fb8278190e8 is pointing into the stack for thread: 0x00007fb82000a000
RBP=0x00007fb7f8032e30 is an unknown value
RSI=0x00007fb7c055d390 is an unknown value
RDI=0x00007fb7f8032e30 is an unknown value
R8 =0x0000000000000000 is an unknown value
R9 =0x0000000000000004 is an unknown value
R10=0x0000000000000319 is an unknown value
R11=0x00007fb7cbb08b65: PyErr_SetString+0 in /home/christian/Development/JyNI/build/libJyNI.so at 0x00007fb7cbaa9000
R12=0x0000000000000000 is an unknown value
R13=0x0000000000000000 is an unknown value
R14=0x00007fb827819130 is pointing into the stack for thread: 0x00007fb82000a000
R15=0x0000000000000001 is an unknown value
Stack: [0x00007fb82771f000,0x00007fb827820000], sp=0x00007fb8278190e8, free space=1000k
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J 3935 JyNI.JyNI.callPyCPeer(JLorg/python/core/PyObject;Lorg/python/core/PyObject;J)Lorg/python/core/PyObject; (0 bytes) @ 0x00007fb81183612c [0x00007fb8118360c0+0x6c]
j JyNI.PyCPeerType.__call__([Lorg/python/core/PyObject;[Ljava/lang/String;)Lorg/python/core/PyObject;+36
J 2911 C1 org.python.core.PyObject.__call__(Lorg/python/core/PyObject;)Lorg/python/core/PyObject; (16 bytes) @ 0x00007fb811ae0fec [0x00007fb811ae0e80+0x16c]
J 5454 C2 org.python.core.PyObject.__call__(Lorg/python/core/ThreadState;Lorg/python/core/PyObject;)Lorg/python/core/PyObject; (6 bytes) @ 0x00007fb8125b22e0 [0x00007fb8125b22a0+0x40]
j matplotlib.font_manager$py.createFontList$16(Lorg/python/core/PyFrame;Lorg/python/core/ThreadState;)Lorg/python/core/PyObject;+673
j matplotlib.font_manager$py.call_function(ILorg/python/core/PyFrame;Lorg/python/core/ThreadState;)Lorg/python/core/PyObject;+368
Just returning NULL is not sufficient as a quick hack and I guess that I need to wrap a Jython class for PyFile instead of accessing the pointer directly. You said it would be easy to fix, but in the comment you say:
//Providing the following method via JyNI would be very hard or impossible.
//However, we could create a FILE-Pointer to the same file and also adjust
//io mode and seek-position to the values of the given PyFile.
//But this may lead to conflicts in systems not supporting multiple file
//access (i.e. windows) and would not stand equality-checks via pointer-wise "==".
I had some frustrations with filesystem APIs lately for my kv-store implementation, so I would also be careful with double access. What are your suggestions to proceed? Also do you have some chat channel? This might be easier to get started.
First of all, note that PyFile_WriteString is already implemented and can serve as a pattern.
I remember I once decided it would be best to keep PyFile-stuff on Jython-side and call into that stuff on C-side. Such decisions are mostly made by carefully reviewing the header file. Especially if there are no macros accessing an object's internals (other than type, length, size and refcounter), the best way is to implement a Jython-centric solution like shown in PyFile_WriteString.
That said, look at https://github.com/Stewori/JyNI/blob/master/JyNI-C/include/JyNI_JNI.h#L508 and https://github.com/Stewori/JyNI/blob/master/JyNI-C/src/JyNI_JNI.c#L449 and https://github.com/Stewori/JyNI/blob/master/JyNI-C/src/JyNI_JNI.c#L1100. At these three spots you must declare functions in Jython for C-access. https://github.com/Stewori/JyNI/blob/master/JyNI-C/include/JNI_util.h declares a bunch of magic and macros to simplify JNI-based declarations. Originally it was even more painful. See the macros https://github.com/Stewori/JyNI/blob/master/JyNI-C/include/JNI_util.h#L133 and below. These are meant for public access:
JNI_METH_CLASS(classID, name, ret, ...)
JNI_METH_INTERFACE(classID, name, ret, ...)
JNI_METH_CLASS2(classID, name, cName, ret, ...)
JNI_METH_STATIC(classID, name, ret, ...)
JNI_METH_STATIC2(classID, name, cName, ret, ...)
JNI_FIELD(classID, name, tp)
JNI_FIELD_STATIC(classID, name, tp)
JNI_CONSTRUCTOR(classID, cName, ...)
JNI_SINGLETON(classID, name, tp)
JNI_SINGLETON2(classID, name, cName, tp)
Yes, I have to document these (please help with that if you can!) Look at https://github.com/Stewori/JyNI/blob/master/JyNI-C/src/JyNI_JNI.c for plenty usage examples. Feel free to ask further questions if you struggle with using these macros.
Note that JNI_METH(classID, name, ret) and JNI_METH_1(classID, name, ret, arg1, tp1) at the very bottom are still unfinished/experimental and not used by JyNI. The plan is that they would also add convenient calling-code rather than just the declaration.
In http://www.kfu.com/%7Ensayer/Java/jni-filedesc.html we can see that java.io.FileDescriptor has a private int-field fd. I hope we can access that to implement PyFile_AsFile. A Jython PyFile uses org.python.core.RawIO as backend. Most likely this would be a FileIO, which contains a RandomAccessFile, which in turn contains a java.io.FileDescriptor. This will involve some struggling with private API and is thus best done directly from JNI (we might have to revisit the solution for Java 9). It might be a bit tricky to get this right; good luck!
Functions that don't access internals of the PyFileObject struct, e.g. PyObject_AsFileDescriptor can be commented in without a change. If they call other functions, we should focus on implementing these.
Ok, I have followed your advice. I have first exported the filedescriptor, which was already accessible in the Jython API (through the usual reflection hack in FileIO.__int__():
https://github.com/whilo/jython/commit/68ed56a90c945984d6005e755e6e0446ca57cc01
This was much easier than unwrapping the nested objects through the C API for me, but this is obviously only a hack to see how far I can get. I then added the necessary method declarations as you suggested:
https://github.com/whilo/JyNI/commit/c51c27754f80c0b807e1640b6c1c3a6b0f168383
I can import matplotlib.pyplot now (pylab still crashes):
https://github.com/whilo/JyNI/commit/c51c27754f80c0b807e1640b6c1c3a6b0f168383#diff-753c9fd4810cb43fc681a2eae65157cc
I can create a plot object, but after the tkinter window pops up the process immediately crashes.
christian@lacan ~/D/JyNI> java -cp jython.jar:build/JyNI.jar org.python.util.jython
Jython 2.7.1rc3 (, Jun 19 2017, 12:45:06)
[OpenJDK 64-Bit Server VM (Oracle Corporation)] on java1.8.0_131
Type "help", "copyright", "credits" or "license" for more information.
>>> import setup_path
>>> import matplotlib.pyplot as plt
>>> plt.plot([1,2,3])
[<matplotlib.lines.Line2D object at 0x2a3>]
>>> plt.show()
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f93d1a0bdc8, pid=16675, tid=0x00007f93fa375700
#
# JRE version: OpenJDK Runtime Environment (8.0_131-b11) (build 1.8.0_131-8u131-b11-0ubuntu1.17.04.1-b11)
# Java VM: OpenJDK 64-Bit Server VM (25.131-b11 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C [multiarray.x86_64-linux-gnu.so+0x5adc8]
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/christian/Development/JyNI/hs_err_pid16675.log
Compiled method (c1) 48634 4793 1 JyNI.gc.DefaultTraversableGCHead::setLinks (6 bytes)
total in heap [0x00007f93e17affd0,0x00007f93e17b0288] = 696
relocation [0x00007f93e17b00f8,0x00007f93e17b0120] = 40
main code [0x00007f93e17b0120,0x00007f93e17b01c0] = 160
stub code [0x00007f93e17b01c0,0x00007f93e17b0250] = 144
oops [0x00007f93e17b0250,0x00007f93e17b0258] = 8
scopes data [0x00007f93e17b0258,0x00007f93e17b0260] = 8
scopes pcs [0x00007f93e17b0260,0x00007f93e17b0280] = 32
dependencies [0x00007f93e17b0280,0x00007f93e17b0288] = 8
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
fish: 'java -cp jython.jar:build/JyNI.…' terminated by signal SIGABRT (Abbruch)
I can read the error log, but I guess I should study the dumps? It points to multiarray, so I guess it is not related to further PyFile methods. I would tackle them, but first I would like to get matplotlib basically running. What are your suggestions?
Is the issue reproducible? Can you isolate the Python code in matplotlib that provokes the crash? First thing to do is to create a minimal Python code sample (ideally independent from matplotlib) that reproduces the crash. Then post this sample as a new issue, so this thread can focus on PyFile support.
Issue is here now: https://github.com/Stewori/JyNI/issues/12
Whilo, could you build a PR from what you achieved so far? That would make it easier for others (including myself) to reproduce the subsequent issue you describe.
Please adjust your approach such that it works without modifications in Jython. IMO this would be not too complicated (one additional native field access). On the other hand, having in mind that Jython is currently in late RC-phase for 2.7.1 release, such a change would only make it into Jython 2.7.2 and I wouldn't make a guess when this release will happen (hopefully the huge delay of 2.7.1 was an exception, but let's better not rely on that).
Sure, I can do that. The current approach is a hack to get something working and start from there. I don't think I need a patched Jython version.
Hey whilo. Just posting a polite reminder to file a PR for your work on PyFile. It would be a pity to loose this, even if it's just a starting point so far...
Thanks for the reminder, it is high on my todo list. :)
So, my work on porting JyNI to Windows is almost done. This means, it is a good time for a next release. Do you think we can get some PyFile support into JyNI alpha 5?
Thanks for insisting. I will sit down and work on it this evening. Will you be online on IRC?
I can go online on IRC later. Note that it's not due exactly today. I'd like to make the release somewhat end of the week or next weekend. Also, it's not totally crucial to get PyFile into this release, but it would be nice.
So, I am observing IRC log frequently, but I cannot guarantee that I can be online at a particular time. Ithink it would be best to resolve questions here in an async manner.
Ok, I have been there late in the evening, but I guess it was too late :). I have tried to do the JVM PyFile API calls in the C context, but somehow printing (printf) does not work anymore. I am not sure why.
printf prints async regarding JVM console output. In many consoles printf and puts output will only show up on JVM shutdown or at random-like time points when C output is flushed. If JyNI.h is included, use jputs for text or jputsLong for integers; their output will be in sync with JVM output. I didn't yet write an actual jprintf, feel free to add one if you need it.
Do you need printf for anything else than debugging?
Right, the C routine PyFile_AsFile(PyObject *f) is not called anymore when importing matplotlib it seems. Printing works. Is there an easy way to test it without depending on matplotlib? I am not sure what other libraries need native pyfile access. I can have a look later.
Maybe there is some test code in CPython. Otherwise we should maybe write our own test(s). DemoExtension is intended to serve as a collection for native test code.
Ok, deleting the fontcache of matplotlib did the trick for now. But now I need to cast and access the FileIO object and I have no clue how to do that with JNI.
I will try to figure it out, but having a general chat medium with offline capability, code formatting and mail notifications would really be helpful for this, especially to have some period of time where I can just ask you dumb questions. From the open-source projects I run and support I can really recommend setting up gitter. It is open-source, is integrated with github and takes maybe five minutes to setup.
Maybe you can add PyFile.fd as a static method to JyNI.java, e.g. like static int JyNI_PyFile_fd(PyFile file). Then it would be a single easy JNI call you can do the same way like you called PyFile.fd.
But then we'd keep changes to JyNI and needn't modify Jython.
FILE *
PyFile_AsFile(PyObject *f)
{
printf("Accessing file.\n");
env(-1);
if (f == NULL)
puts("PyFile_AsFile with NULL-pointer");
jobject f2 = JyNI_JythonPyObject_FromPyObject(f);
//(*env)->CallVoidMethod(env, ((JyObject*) f)->jy, pyFileWrite, (*env)->NewStringUTF(env, s));
jobject fileno = (*env)->CallObjectMethod(env, f2, pyFile_fileno);
printf("fileno: %i \n", fileno);
jobject fd_pyint = (*env)->CallObjectMethod(env, fileno, FileIO___int__);
jint fd_int = (*env)->CallObjectMethod(env, fd_pyint, pyInt_getValue);
printf("fd_int: %i \n", fd_int); // prints 0 ?!?
jint fd = (*env)->CallObjectMethod(env, f2, pyFile_fd);
printf("FD: %i \n", fd);
// TODO get mode from PyFile
return fdopen(fd, "r");
//todo: JNI Exception handling
}
The fd_int printf line prints 0 and not the correct fd value. I guess that casting happens implicitly and if I try to call a non-existing method I get an exception from the JVM.
So I ignored the casting issue, but the result makes no sense so far to me. I guess PyInteger is 0 by default due to JVM Integer being 0 by default, but I am at least getting a proper instance of it, which is weird. If I had done something wrong on the way to access it, it should have thrown or caused a segmentation fault.
I have implemented the same path to fd as in PyFile.fd() now, so no idea why it fails.
If my current approach in C is actually worse in your opinion than adding a static method to JyNI (which is also somewhat odd by providing an external method for PyFile), then I will take that way. I thought it would be right to do it from C, without implementing custom helpers.
jint fd_int = (*env)->CallObjectMethod(env, fd_pyint, pyInt_getValue);
should rather be
jint fd_int = (*env)->CallIntMethod(env, fd_pyint, pyInt_getValue);
That should fix it.
(in jint fd = (*env)->CallObjectMethod(env, f2, pyFile_fd); respectively)
If my current approach in C is actually worse in your opinion than adding a static method to JyNI (which is also somewhat odd by providing an external method for PyFile)
Putting a sequence of Java calls to Java side is faster AFAIK, because it reduces calls from C into JVM which are JNI's slowest facet. That said, there is plenty of such stuff in JyNI. I guess I became a little painless about this. We can maybe add a proper method to Jython, but until that finds its way into a release, an external implementation in JyNI is a good migration path.
I've started working on finishing implementing the PyFile API. It's on a separate branch/fork and I hope to have decent tests for the API before merging it with master. But if a working PyFile API is needed for anything then I can submit a pull request earlier.
Question:
I am unsure as to what to do about the memory management functions in PyFile, I feel like there will be some details about JyNI GC that will come into play here. The functions I am most unsure about are:
PyFile_DecUseCount, PyFile_IncUseCount, tp_dealloc and tp_weaklistoffset.
There are generic object functions for tp_alloc and tp_free, I haven't tested them yet so they may not work as they may also have quirks.
General Progress Report:
My current progress is on my Implementing-PyFile branch. Before submitting a pull request I will fix the git history and clean things up(I'm not used to memory management, variables aren't always declared at the start of blocks, etc). I just thought it was worth having the progress somewhere other than just my local eclipse workspace. Most of PyFile is done and passes tests (33/46).
Other than the memory management things in the question, the remaining functions that don't have tests that they pass are: tp_iternext, tp_iter, tp_init and PyFile_WriteObject - I'm finding these tricky but I hope to finish them on Monday file_exit, file_enter and file_truncate - It wasn't instantly obvious what these do and I haven't looked into them yet, hopefully they are easy enough and I can also do them on Monday.
Tests: I've put the tests in a separate file, I think ideally there would be a C file and python file for each section of the API. I had looked into the CPython test suite, they do have tests for the Capi, but it seemed quite difficult to get them to run as they wouldn't import without all of the Capi being available/declared. I tried just uncommenting a lot of code in the hope of fooling the tests enough to import (of course they would still fail) but this didn't work very well. I also tried to cut out just the tests I wanted from their test module, at which point I realised I couldn't actually find any that tested the PyFile Capi specifically. At this point I gave up on trying to use the CPython tests and just wrote my own.
Re UseCount:
We need to figure out what fobj->unlocked_count actually means and - more importantly - how Jython implements that functionality. It looks like kind of an additional ref counter that counts how many objects/threads/? are currently using the file. Usually I would assume the ref counter could take care of this, but there must have been a reason why they added another counter. Maybe because the file handle int is not necessarily the same thing as the file object. I.e. the handle can be used by more participants than the file object. In that case it would make sense that fobj->unlocked_count is kind of a reference counter for the file handle while the ordinary ref counter manages the file object.
@CalumFreeman That's just a guess from looking at the code. Is there something in CPython docs?
It looks like in Jython the role of the handle is played by a TextIOBase object, no obvious use counter so far. Maybe it's just using Java memory management w.r.t. TextIOBase to provide use count equivalent guarantees. In general that seems not so much an issue in Jython as there is not really public API to get the handle. I'd suggest we handle it in JyNI by maintaining the field fobj->unlocked_count in native PyFile objects and regart the Java PyFile object as constantly holding one use count. So it the counter would be initialized to one if a Java PyFile object is handled to native side. A purely natively created file object gets the usual initialization and if it is passed to Java side use count it incremented. Remains the question how we tidy up. Can we somehow hook into PyFile.close or so? E.g. by injecting a custom PyFile.Closer the wraps the original closer but additionally informs JyNI.
Okay, I took a closer look. In CPython there is this code snippet:
static PyObject *
close_the_file(PyFileObject *f)
{
int sts = 0;
int (*local_close)(FILE *);
FILE *local_fp = f->f_fp;
char *local_setbuf = f->f_setbuf;
if (local_fp != NULL) {
local_close = f->f_close;
if (local_close != NULL && f->unlocked_count > 0) {
if (f->ob_refcnt > 0) {
PyErr_SetString(PyExc_IOError,
"close() called during concurrent "
"operation on the same file object.");
...
Regarding the error case close() called during concurrent operation on the same file object. the counterpart in Jython is not so obvious. I think it relies on the JVM throwing an IOException at FileIO:
public void close() {
if (closed()) {
return;
}
try {
fileChannel.close();
} catch (IOException ioe) {
throw Py.IOError(ioe);
}
super.close();
}
I suggest to write a new subclass of FileIO, e.g. JyNIFileIO. For simplicity it wraps the original FileIO and delgates everything to its backend. From JNI we can inject this into PyFile.file. It would implement close such that a maybe native counterpart is checked for fobj->unlocked_count > 0. I guess the best idea would be to call native close_the_file to let it perform that check.
In this scope I guess it's not necessary - unlike I suggested earlier - to track an fobj->unlocked_count count for the Java PyFile. Simply initialize it to zero, maintain the counter like C code currently does. We only have to incorporate this into the check in FileIO.close.
Regarding the constructor I think we should implement it like other objects fully delegated to Jython. E.g. look at set.
- set
tp_freetoPyObject_Free(as opposed to formerlyPyObject_Del) - The following destructor should be okay:
static void
file_dealloc(PyFileObject *f)
{
JyNIDebugOp(JY_NATIVE_FINALIZE, f, -1);
// We will revisit that:
//if (f->weakreflist != NULL)
// PyObject_ClearWeakRefs((PyObject *) f);
ret = close_the_file(f);
if (!ret) {
PySys_WriteStderr("close failed in file object destructor:\n");
PyErr_Print();
}
else {
Py_DECREF(ret);
}
Py_TYPE(so)->tp_free(so);
}
Did you already implement close_the_file? It should only perform the check for use count and then delegate to Jython's FileIO.close in some way. Make sure there won't be a delegation cycle. See if you can figure that out. Otherwise we can follow up on that part here later.
Hope this helps! I admit that makes PyFile implementation a bit more complicated than I originally expected.