segfault in kmeans.phylanx.py

Open ct-clmsn opened this issue 7 years ago • 11 comments

I'm getting a segmentation fault on line 36 of kmeans.phylanx.py. Any suggestions on how to provide better debugging information?

ct-clmsn avatar Jun 24 '18 18:06 ct-clmsn

@parsa would you mind having a look?

hkaiser avatar Jun 24 '18 19:06 hkaiser

It just works for me. @ct-clmsn do you have any idea how to reproduce it?

parsa avatar Jun 24 '18 22:06 parsa

@parsa I'll give it another shot this evening.

ct-clmsn avatar Jun 24 '18 22:06 ct-clmsn

@parsa, ok, I started to narrow in on the issue. When empty debugging print statements are placed after every line in closest_centroid, the algorithm runs to completion without a segmentation fault.

When those alternating print statements are removed, the script segfaults. Maybe this is an edge case of the GIL/stdout issue?

After the Phylanx call, the script passes the result from Phylanx directly into a Python-side print statement (line 97); could that be a GIL/stdout edge case?

ct-clmsn avatar Jun 25 '18 14:06 ct-clmsn

@ct-clmsn I'd really like to be able to see this error. What do I do to reproduce it?

I don't know what you mean by the edge case issue for line 97. That kmeans function call returns a centroids×2 NumPy array, which is then printed.

parsa avatar Jun 25 '18 15:06 parsa

@parsa, to reproduce the error, I just run the script, and it segfaults (when run without the print statements). If it's working for you, I'm not sure how to reproduce the problem other than to suggest running it on a couple of machines (I suspect you already have several in the testing system). If you have advice on generating a stack trace or other debugging output, please pass it along. I can try to produce some output that might help us sort this out.

For line 97, the edge case might be that the result (the NumPy array returned from centroids) is returned from the Phylanx annotated function into a Python print statement. Maybe there's an issue getting the data from the Phylanx backend into the Python front-end for this particular use case?

The placement of print statements that "fixed" the issue on my systems gives me the suspicion that a possible GIL/stdout bug is happening during the return from that call into the Python front-end's print statement.

ct-clmsn avatar Jun 25 '18 16:06 ct-clmsn

@ct-clmsn, try setting ulimit -c unlimited before executing, then when it crashes, you should get a core file you can load and examine with GDB.
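A rough Python-side equivalent, in case it helps: the sketch below raises the core-file limit from inside the script (similar in effect to `ulimit -c unlimited` when the hard limit allows it) and turns on the standard library's `faulthandler`, which prints the Python traceback of every thread if the process receives SIGSEGV. The helper name `enable_crash_diagnostics` is made up for illustration, and the `resource` module is only available on Unix-like systems.

```python
import faulthandler
import resource


def enable_crash_diagnostics():
    """Hypothetical helper: call once at the top of the script under test."""
    # Raise the soft core-file size limit up to the hard limit.
    # A process is always allowed to raise its soft limit this far.
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))

    # On SIGSEGV (and a few other fatal signals), dump the Python
    # traceback of all threads to stderr before the process dies.
    faulthandler.enable(all_threads=True)

    # Return the limits now in effect so the caller can log them.
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    print("core limits (soft, hard):", enable_crash_diagnostics())
```

Where the core file ends up, and what it is called, depends on the system's core-pattern configuration. Once you have it, something like `gdb python3 <core file>` followed by `bt` should show the native backtrace.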

khuck avatar Jun 25 '18 17:06 khuck

@khuck thanks! I'll give that a shot.

ct-clmsn avatar Jun 26 '18 18:06 ct-clmsn

@ct-clmsn I think @parsa is now able to reproduce this issue. We're working on a fix as we speak...

hkaiser avatar Jun 26 '18 21:06 hkaiser

@parsa @hkaiser my apologies for the lack of a reasonably clear bug report!

ct-clmsn avatar Jun 26 '18 22:06 ct-clmsn

Does this issue still exist?

parsa avatar Dec 10 '18 17:12 parsa