StackOverflowError in Structure serialization
Hi guys, this might be an issue with the code on my end, but although my BasePairParameters class serializes its own data correctly, it still doesn't correctly serialize the Structure object. If I serialize and save, then reload into an empty BasePairParameters object, it correctly restores all the old data and prints 17 step parameters for a PDB structure (1P71, in my TestBasePairParameters class). However, if I then call my analyze() method, which redoes all the work on the Structure object, I get "no data".
So it seems that something was lost in translation, and if someone could check this independently on another Structure, it would help. I'm pretty sure I synced to the latest changes, because my PR now passes all the tests. I will try to see what's going on, but I don't want to mess with the code too much because I did so many things to touch it up.
Ok, actually I was wrong: it still throws an error (I forgot to recompile! haha). So serialization of the Structure class doesn't work. The error is very similar to what I got when I tried to go in myself and mark all the related Structure classes as serializable. I am certain this is real though, because I re-cloned the repository from your main branch, put my code folders back into it, and compiled it again.
java.lang.StackOverflowError
at java.io.ObjectStreamClass$FieldReflector.getPrimFieldValues(ObjectStreamClass.java:2002)
at java.io.ObjectStreamClass.getPrimFieldValues(ObjectStreamClass.java:1277)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1533)
at java.io.ObjectOutputStream.defaultWriteObject(ObjectOutputStream.java:441)
at java.util.ArrayList.writeObject(ArrayList.java:755)
at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
.... it keeps going for about 50 times as many lines
Here's the code so it can be reproduced (NOT using my code at all, just serializing a Structure):
https://gist.github.com/lukeczapla/7b87e70947e8ed5732f04bd07d113629
I already implemented a test for the Structure serialization and it is working. You might want to take a look at it (https://github.com/biojava/biojava/blob/master/biojava-structure/src/test/java/org/biojava/nbio/structure/TestStructureSerialization.java).
Note that your test class is not correct: the order of the two tests is not enforced, so the deserialization may run before the serialization. You have to put both parts in the same test. However, I don't think this is the cause of the problem you are seeing.
You are getting a StackOverflowError, and I am not sure it is related to serialization; I suspect it has to do with the file operations you do. As far as I remember, serialization exceptions are easy to spot. You might want to try the same test with a class you know is really serializable. Otherwise, take a look at my test and add anything you think is missing.
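As a minimal sketch of "both parts in the same test": the helper below writes an object to an in-memory buffer and reads it straight back inside one method, so no ordering between test methods is needed. The class and method names (RoundTripCheck, roundTrip) are made up for illustration, and an ArrayList stands in for the Structure; the body of main could equally be the body of a single test method.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;

public class RoundTripCheck {

    /** Serializes obj into memory and immediately deserializes it again. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            // Wrapped so callers (and tests) do not need checked-exception handling.
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in payload; a real test would use the object under test instead.
        ArrayList<String> original = new ArrayList<>(Arrays.asList("A", "C", "G", "T"));
        ArrayList<String> copy = roundTrip(original);

        // Write and read are checked together, so their order cannot be swapped.
        if (!original.equals(copy)) {
            throw new AssertionError("deserialized copy differs from original");
        }
        System.out.println("round trip ok: " + copy);
    }
}
```

Using an in-memory buffer also keeps file operations out of the picture, which helps separate file-handling problems from serialization problems.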
Related to #673, #676 and #693
Oh yes, my original test was with the class I knew was serializable (BasePairParameters class with the Structure object marked transient) with the same methodology, and it worked. I understand the order is not enforced but this exact same test works on several known serializable objects.
I'll try to use your tests later in the day. But the StackOverflowError is specific to this class and does not occur with other objects I've serialized with the same methodology, such as a Deeplearning4j trained model, a DNA simulation I wrote using Nd4j, BasePairParameters, etc.
[I marked second test with @After to enforce order in gist]
Ok @lukeczapla, you are right! It seems that the StackOverflowError occurs when the Structure is parsed from an MMCIF or MMTF file, but not from a PDB file (which is the format used in my test). I will work out a solution and let you know.
Thanks so much, I appreciate you looking into it for me. I had used StructureIO.getStructure(), and it seems to work with the RCSB and to choose mmCIF by default. I personally prefer PDB, but things seem to be moving to mmCIF due to the size limitations of the PDB format [and I've managed to build systems so big that I had to switch to coordinates with only 2 digits after the decimal point, using %8.2f, to trick the PDB format]
Yes, you should try to use mmCif where possible; the PDB format is now legacy and should be avoided.
The default in BioJava is MMTF, and apparently that is the one that has problems when serialized. PDB and mmCif work fine. Will continue looking into it, thanks for reporting!
For the moment, to avoid the problem you can set the parsing file format as mmCif (not MMTF) and you should be able to serialize the Structure objects without problem. Instead of:
Structure s = StructureIO.getStructure("pdbid");
Try the following:
AtomCache cache = new AtomCache();
cache.setUseMmCif(true);
Structure s = cache.getStructure("pdbid");
I have been looking into what might be causing the StackOverflowError when calling the writeObject() method, and the most probable cause seems to be objects referring to one another (recursively).
This suggests that some pointers are corrupted (or misplaced) during the MMTF parsing, since this does not occur for Structures that come from PDB or MMCIF file formats. A way to debug this would be to plot the dependency graph of the objects in the Structure. Maybe we need a new test for that, since this serialization issue has exposed another possible problem in BioJava Structures that is difficult to detect.
These threads are useful:
- https://stackoverflow.com/questions/438875/stackoverflowerror-when-serializing-an-object-in-java
- https://coderanch.com/t/277328/java/java-lang-StackOverflowError-Serialization
@pwrose do you have any idea or hint to the origin of the problem in the MMTF parser?
I'm not sure why this problem only shows up with mmtf files. As you say, objects referring to each other is a typical problem.
I believe mmtf creates bonds by default. mmCIF may not create bonds by default, so I think this is one area to look at.
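To illustrate the "objects referring to each other" point with plain Java (no BioJava involved): default serialization keeps track of objects it has already written, so a small reference cycle serializes on its own. What drives the recursion depth is the length of the chain of objects that have not been written yet, which is the kind of graph that atoms reaching other atoms through bonds could plausibly form. The Node class and the method names below are hypothetical, and whether the long chain overflows depends on the JVM's stack size.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class DeepGraphDemo {

    // Hypothetical node type: each node only knows the next one.
    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        Node next;
    }

    /** Builds a singly linked chain with the given number of nodes. */
    static Node chain(int length) {
        Node head = new Node();
        Node current = head;
        for (int i = 1; i < length; i++) {
            current.next = new Node();
            current = current.next;
        }
        return head;
    }

    /** Builds two nodes that point at each other. */
    static Node cycle() {
        Node a = new Node();
        Node b = new Node();
        a.next = b;
        b.next = a;
        return a;
    }

    /** Returns true if default serialization completes, false on StackOverflowError. */
    static boolean serializes(Node head) {
        try {
            new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(head);
            return true;
        } catch (StackOverflowError e) {
            // writeObject recursed once per node still to be written.
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("two-node cycle serializes: " + serializes(cycle()));
        System.out.println("100-node chain serializes: " + serializes(chain(100)));
        // With default stack sizes this is expected to overflow, but that is
        // an assumption about the JVM settings, not a guarantee.
        System.out.println("1,000,000-node chain serializes: " + serializes(chain(1_000_000)));
    }
}
```

The repeating writeObject0 / writeOrdinaryObject / writeSerialData / defaultWriteFields frames in the stack trace above have the same shape as the recursion this sketch triggers.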
@pwrose you are right, the bonds are causing the StackOverflowError. If I enable bond parsing for the MMCIF or PDB formats, the error appears, just as it does for MMTF.
I think we need to re-implement the writeObject() and readObject() methods if we want to allow Structure serialization. Another option could be to mark the bond information as transient, but I am not sure what side effects this could have.
Since the MMTF format is actually an efficient serialization of a Structure, I was thinking that we could replace the write and read object methods of the Structure class with encoding to and decoding from the MMTF representation, respectively.
I think it should work fine and it would be a nice application for the format as well. Do you think this would be possible @pwrose?
I think we need to look at the readObject/writeObject methods. The adding of bonds uses a strange design pattern where bonds add themselves to the structure. Perhaps we need to look at a different way to add the bonds.
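One possible shape for custom writeObject()/readObject() methods, sketched with hypothetical stand-in classes (Atom, Bond and Molecule here are not the BioJava types): the per-atom bond lists are transient, the container writes its atoms with default serialization, and the bonds are written as a flat list of atom index pairs and rebuilt on read. This is only a sketch of the idea under those assumptions, not a proposal for the actual Structure implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class FlatBondSketch {

    // Hypothetical stand-in for an atom.
    static class Atom implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        // transient: default serialization never walks atom -> bond -> atom
        transient List<Bond> bonds = new ArrayList<>();

        Atom(String name) { this.name = name; }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            bonds = new ArrayList<>(); // field initializers do not run on deserialization
        }
    }

    // Hypothetical bond that adds itself to both atoms when constructed.
    static class Bond {
        final Atom a, b;
        Bond(Atom a, Atom b) { this.a = a; this.b = b; a.bonds.add(this); b.bonds.add(this); }
    }

    static class Molecule implements Serializable {
        private static final long serialVersionUID = 1L;
        final List<Atom> atoms = new ArrayList<>();
        transient List<Bond> bonds = new ArrayList<>();

        void connect(int i, int j) { bonds.add(new Bond(atoms.get(i), atoms.get(j))); }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject(); // writes the atom list only
            Map<Atom, Integer> index = new IdentityHashMap<>();
            for (int i = 0; i < atoms.size(); i++) index.put(atoms.get(i), i);
            out.writeInt(bonds.size());
            for (Bond bond : bonds) { // flat loop, no recursion through bonds
                out.writeInt(index.get(bond.a));
                out.writeInt(index.get(bond.b));
            }
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            bonds = new ArrayList<>();
            int count = in.readInt();
            for (int k = 0; k < count; k++) {
                int i = in.readInt();
                int j = in.readInt();
                connect(i, j); // rebuilds the bond and re-registers it on both atoms
            }
        }
    }

    /** Builds n atoms bonded in a line: 0-1, 1-2, ... */
    static Molecule chain(int n) {
        Molecule m = new Molecule();
        for (int i = 0; i < n; i++) m.atoms.add(new Atom("A" + i));
        for (int i = 0; i + 1 < n; i++) m.connect(i, i + 1);
        return m;
    }

    static Molecule roundTrip(Molecule m) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) { out.writeObject(m); }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Molecule) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Molecule copy = roundTrip(chain(100_000));
        System.out.println(copy.atoms.size() + " atoms, " + copy.bonds.size() + " bonds");
    }
}
```

Because the bonds are written in a loop rather than reached recursively from each atom, the call depth during serialization no longer grows with the number of bonded atoms. The trade-off is that anything else hanging off a bond would need to be written explicitly as well.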