
DataCorruption error when loading previously parsed data

Open so-susan opened this issue 2 years ago • 6 comments

Hello, I'm hoping someone can help me figure out what I'm doing wrong here. I appear to be repeatedly getting an issue with loading metadata when attempting to open a set of previously parsed data. The terminal reports a checksum mismatch, though I have not touched the files.

For greater context: I parsed some data via CLI and then when I try to open it via GUI (File>Open Experiment), the load screen hangs indefinitely after loading the last cycle. The log file doesn't indicate anything different than what I'm seeing on that load screen or in the terminal, apart from the last line, which says it's reading the metadata file. Snippet from the log file:

[12:19:15 | INFO | main]: lib.aptamer.datastructures.MapDBSelectionCycle
Processing selection cycle Pool-R7 

[12:19:15 | CONFIG | main]: lib.aptamer.datastructures.MapDBSelectionCycle
Reading from file 'C:\Users\Administrator\Documents\AptamerSeqInfo\NeoVentTAG\cycledata\7_Pool-R7.mapdb' for selection cycle Pool-R7. 

[12:19:19 | CONFIG | main]: lib.aptamer.datastructures.Metadata
Reading metadata file from 'C:\Users\Administrator\Documents\AptamerSeqInfo\NeoVentTAG\metadata.mapdb 

So then I tried running an analysis via CLI (in this case java -jar aptasuite-0.9.8-SNAPSHOT.jar -predict structure -config /path/to/config/file)

This error prints on screen (but is not saved to the log file log_2022-05-12_12-16-43.txt, which looks identical to the one from opening via GUI at this step):

Processing selection cycle Pool-R7
Exception in thread "main" org.mapdb.DBException$DataCorruption: Header checksum broken. Store was not closed correctly and might be corrupted. Use `DBMaker.checksumHeaderBypass()` to recover your data. Use clean shutdown or enable transactions to protect the store in the future.
        at org.mapdb.StoreDirectAbstract.fileHeaderCheck(StoreDirectAbstract.kt:113)
        at org.mapdb.StoreDirect.<init>(StoreDirect.kt:114)
        at org.mapdb.StoreDirect$Companion.make(StoreDirect.kt:57)
        at org.mapdb.StoreDirect$Companion.make$default(StoreDirect.kt:56)
        at org.mapdb.DBMaker$Maker.make(DBMaker.kt:450)
        at lib.aptamer.datastructures.Metadata.loadDataFromFile(Metadata.java:148)
        at lib.aptamer.datastructures.Metadata.<init>(Metadata.java:101)
        at lib.aptamer.datastructures.Experiment.<init>(Experiment.java:252)
        at aptasuite.CLI.exportData(CLI.java:990)
        at aptasuite.CLI.<init>(CLI.java:280)
        at aptasuite.Aptasuite.main(Aptasuite.java:70)

Looking at the log file generated when parsing data via CLI (log_2022-05-10_16-26-58.txt), I see no errors. It looks like it created the metadata as expected:

[4:29:39 | CONFIG | main]: lib.aptamer.datastructures.Metadata
Creating new Metadata instance. 

[4:29:39 | INFO | main]: lib.aptamer.datastructures.Experiment
Loading took 161081 milliseconds 

[4:29:39 | INFO | main]: aptasuite.CLI
Initializing Experiment 

[4:29:39 | INFO | main]: aptasuite.CLI
Experiment Setup

and after parsing it exported the pool data as expected:

[2:23:17 | INFO | main]: aptasuite.CLI
Using existing sequencing data 

[2:23:17 | INFO | main]: aptasuite.CLI
Starting Data Export 

[2:23:17 | INFO | main]: aptasuite.CLI
The export path does not exist on the file system. Creating folder C:\Users\Administrator\Documents\AptamerSeqInfo\NeoVentTAG\export 

[2:23:17 | INFO | main]: aptasuite.CLI
Exporting pool data to file C:\Users\Administrator\Documents\AptamerSeqInfo\NeoVentTAG\export\pool.txt.gz 

[2:23:17 | CONFIG | main]: lib.export.CompressedExportWriter
Created compressed file C:\Users\Administrator\Documents\AptamerSeqInfo\NeoVentTAG\export\pool.txt.gz 

[10:09:41 | CONFIG | main]: lib.export.CompressedExportWriter
Closing file C:\Users\Administrator\Documents\AptamerSeqInfo\NeoVentTAG\export\pool.txt.gz 

[10:09:41 | INFO | main]: aptasuite.Aptasuite
Exiting. 

Further, I have only once been able to open a previously parsed dataset. I have tried comparing the current config file to that of the set that loads properly to see if I have made an error there, but they are the same apart from some specifics of the datasets. And looking back through the old log files of other, smaller experiments that I parsed in the GUI and that would not re-open via the GUI, they also appear to have stopped at this "reading metadata" step.
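For anyone repeating the config comparison described above, a minimal sketch follows. It assumes the config files are plain `key = value` text with `#` marking comments, which may not match every AptaSuite config. The function names and the sample values are hypothetical; only `Experiment.projectPath` comes from this thread.

```python
def parse_config(text):
    """Parse 'key = value' lines, skipping blanks and '#' comments (assumed format)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def config_differences(working_text, failing_text):
    """Return {key: (working_value, failing_value)} for every key whose value differs."""
    working = parse_config(working_text)
    failing = parse_config(failing_text)
    differences = {}
    for key in sorted(set(working) | set(failing)):
        if working.get(key) != failing.get(key):
            differences[key] = (working.get(key), failing.get(key))
    return differences


if __name__ == "__main__":
    # Hypothetical sample contents; real files would be read from disk instead.
    working = "Experiment.name = A\nExperiment.projectPath = /data/a\n"
    failing = "Experiment.name = A\n#Experiment.projectPath = /data/a\n"
    for key, (good, bad) in config_differences(working, failing).items():
        print(f"{key}: working={good!r} failing={bad!r}")
```

A key that is present in one file and missing or commented out in the other shows up with `None` on the missing side, which makes dataset-specific differences easier to separate from accidental ones.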

Finally, I don't think that it matters, but I have been using Windows Server 2016 with Cygwin, macOS Monterey (12.0.1) and macOS Catalina (10.15.7).

I am really stumped as to why this might be happening! Though I assume at this point that it is a user error on my part, given the existence of the one parsed set I have that loads fine for analysis.

Does anyone have suggestions or possible things to try?

Thank you!!

so-susan avatar May 12 '22 20:05 so-susan

Hi,

Thank you for reporting this. This is the first time I am hearing about such an issue. Are you by any chance storing your projects on a network drive?

drivenbyentropy avatar May 12 '22 21:05 drivenbyentropy

No, they are stored locally. (I had read previously that using a network location wouldn't work.)

Thanks for helping me get to the bottom of this! :)

so-susan avatar May 12 '22 22:05 so-susan

Interesting.

Could you please provide me with the Java version you are using, as well as the AptaSuite version for which this issue occurs?

drivenbyentropy avatar May 16 '22 17:05 drivenbyentropy

I am currently using Liberica's open JDK with fx (managed through sdkman):

$ sdk current java

Using java version 17.0.2fx-librca

The AptaSuite version is aptasuite-0.9.8-SNAPSHOT (the 0.9.8 version update came out while parsing the dataset; for consistency, I have not updated to the latest version yet).

My previous setup on a laptop with less compute power (which also got stuck while opening an experiment via GUI in the same way) is jdk-17.0.1.jdk with javafx.version=17.0.1 (javafx.runtime.version=17.0.1+1, javafx.runtime.build=1). However, I also had to compile AptaSuite (same version 0.9.8-SNAPSHOT) from source with Maven to get JavaFX to work with that setup, which is why I switched to sdkman and found other options when installing on a more powerful computer. (And the Liberica JDK did work to open an experiment for one parsed data set.)

so-susan avatar May 17 '22 16:05 so-susan

I parsed the data again (this time using aptasuite-0.9.8 for simplicity), and I finally noticed that the error appears to happen at the end of parsing.

Here's the screen output from that:

106030528               106028495               0                       0                       0                       0                    
Exception in thread "AptaPlex Main" org.mapdb.DBException$DataCorruption: Header checksum broken. Store was not closed correctly and might be corrupted. Use `DBMaker.checksumHeaderBypass()` to recover your data. Use clean shutdown or enable transactions to protect the store in the future.
        at org.mapdb.StoreDirectAbstract.fileHeaderCheck(StoreDirectAbstract.kt:113)
        at org.mapdb.StoreDirect.<init>(StoreDirect.kt:114)
        at org.mapdb.StoreDirect$Companion.make(StoreDirect.kt:57)
        at org.mapdb.StoreDirect$Companion.make$default(StoreDirect.kt:56)
        at org.mapdb.DBMaker$Maker.make(DBMaker.kt:450)
        at lib.aptamer.datastructures.Metadata.saveDataToFile(Metadata.java:186)
        at lib.parser.aptaplex.AptaPlexParser.parsingCompleted(AptaPlexParser.java:140)
        at lib.parser.aptaplex.AptaPlexParser.run(AptaPlexParser.java:155)
        at java.base/java.lang.Thread.run(Thread.java:833)
106030528               106028495               0                       0                       0                       0                       0                       0
Parsing Completed in 78925.35 seconds.

Selection Cycle Statistics
... 
** names of cycles removed for brevity **
...
Using existing sequencing data
Starting Data Export
Exporting pool data to file C:\Users\Administrator\Documents\AptamerSeqInfo\NeoVentTAG\export\pool.txt.gz
Progress 11935504/11935504
Exiting.

Because the list of cycles is relatively long, and it exported the pool data okay, I hadn't noticed that error before. I'm not sure, but I thought that this error message and/or seeing when it occurs might be helpful.
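Since the exception was easy to miss in the long console output, here is a small sketch of one way to scan saved output for uncaught Java exceptions. It relies only on the `Exception in thread "..."` line format visible in the traces above; the function name is hypothetical and this is not part of AptaSuite.

```python
import re

# Matches lines such as:
#   Exception in thread "AptaPlex Main" org.mapdb.DBException$DataCorruption: Header checksum broken. ...
EXCEPTION_LINE = re.compile(r'^Exception in thread "([^"]+)" ([\w.$]+)')


def find_exceptions(console_text):
    """Return (line_number, thread_name, exception_class) for each uncaught exception line."""
    hits = []
    for number, line in enumerate(console_text.splitlines(), start=1):
        match = EXCEPTION_LINE.match(line.strip())
        if match:
            hits.append((number, match.group(1), match.group(2)))
    return hits


if __name__ == "__main__":
    sample = (
        "Processing selection cycle Pool-R7\n"
        'Exception in thread "AptaPlex Main" org.mapdb.DBException$DataCorruption: '
        "Header checksum broken.\n"
        "        at org.mapdb.StoreDirectAbstract.fileHeaderCheck(StoreDirectAbstract.kt:113)\n"
        "Parsing Completed in 78925.35 seconds.\n"
    )
    for number, thread, cls in find_exceptions(sample):
        print(f"line {number}: thread {thread!r} raised {cls}")
```

Running this over the full saved output of a parsing run would flag the `DataCorruption` line even when it is buried between progress counters and cycle statistics.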

I also recalled reading in one or more other issues that Experiment.projectPath could cause trouble in those situations, so I am parsing the data again without that specified (commented out in the config file) to see if that makes a difference. It's the only idea I've got. I will also save everything that gets printed to the screen during this run, in case more of the on-screen output can shed some light on what's happening. I'll let you know how it goes!
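One possible way to save everything printed to the screen is sketched below. Java writes uncaught exception traces to standard error, so the sketch merges stderr into stdout before writing to a file. The function name and the output file name are hypothetical; the example command in the comment is the one quoted earlier in this thread.

```python
import subprocess
import sys


def run_and_capture(command, output_path):
    """Run a command, echo its merged stdout/stderr to the screen, and save it to a file."""
    with open(output_path, "w", encoding="utf-8") as sink:
        process = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # fold stderr into stdout so stack traces are kept
            text=True,
        )
        for line in process.stdout:
            sys.stdout.write(line)  # still visible live in the terminal
            sink.write(line)        # and preserved on disk
        return process.wait()       # exit code of the command


if __name__ == "__main__":
    # Example with the command from this thread (paths are placeholders):
    #   run_and_capture(
    #       ["java", "-jar", "aptasuite-0.9.8-SNAPSHOT.jar",
    #        "-predict", "structure", "-config", "/path/to/config/file"],
    #       "console_output.txt",
    #   )
    exit_code = run_and_capture([sys.executable, "-c", "print('hello')"], "console_output.txt")
    print("exit code:", exit_code)
```

In a Unix-like shell such as Cygwin, appending `2>&1 | tee console_output.txt` to the original command achieves much the same thing without a script.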

so-susan avatar Jun 01 '22 17:06 so-susan

So how did you deal with this issue? I am running into the same thing now.

panxiaoguang avatar May 01 '24 08:05 panxiaoguang