FileStore stream is closed: java.io.IOException: Stream Closed
Describe the bug
If the FileStore stream is closed for any unexpected reason, the reset fails indefinitely, even after the original cause is fixed (or was only a transient error). The reset keeps trying to flush/close the failed stream instead of truly resetting and opening a new one.
In our case the error was triggered by the disk being full (space was then freed up), but any reason for closing the stream could leave the store stuck in this unrecoverable state. Perhaps the store reset mechanism should be made more robust, so that it really tries to reset even in this unexpected state rather than requiring a manual restart of the application.
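To illustrate the kind of robustness meant here, below is a minimal sketch of a reset that treats closing the old stream as best-effort. It is not QuickFIX/J's actual FileStore code: the class `ResettableStore`, its methods, and the `simulateBrokenStream` hook are hypothetical names made up for this example, and the only assumption taken from the stack trace is that a reset closes the stream, deletes the file, and reopens it.

```java
import java.io.BufferedOutputStream;
import java.io.Closeable;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical stand-in for a file-backed store; NOT QuickFIX/J's FileStore.
class ResettableStore {
    private final File file;
    private FileOutputStream raw;   // kept separately so the demo can break it
    private DataOutputStream out;

    ResettableStore(File file) throws IOException {
        this.file = file;
        open();
    }

    private void open() throws IOException {
        raw = new FileOutputStream(file, true);
        out = new DataOutputStream(new BufferedOutputStream(raw));
    }

    // Writes into the buffer; data reaches the file only on flush() or close().
    void write(String s) throws IOException {
        out.writeBytes(s);
    }

    void flush() throws IOException {
        out.flush();
    }

    // Best-effort close: a failure to flush or close the OLD stream is
    // swallowed so that it cannot prevent a fresh stream from being opened.
    private static void closeQuietly(Closeable c) {
        if (c == null) {
            return;
        }
        try {
            c.close();
        } catch (IOException ignored) {
            // The old stream is being discarded anyway.
        }
    }

    // Discards the old stream and file, then starts over with a new file.
    void reset() throws IOException {
        closeQuietly(out);
        closeQuietly(raw);
        out = null;
        raw = null;
        Files.deleteIfExists(file.toPath());
        open();
    }

    // Demo hook (hypothetical): simulates the stream being closed unexpectedly.
    void simulateBrokenStream() throws IOException {
        raw.close();
    }
}
```

With this shape, a reset after the underlying stream has died still ends with a usable store, because the "Stream Closed" exception raised while flushing the stale buffer is confined to `closeQuietly`. Whether swallowing that exception is acceptable (as opposed to logging it) is a design choice for the maintainers; the buffered data is lost either way, since the reset deletes the file.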
To Reproduce
Expected behavior
system information:
- OS: Ubuntu server
- Java version JDK8
- QFJ Version 2.1.1
Additional context
13:35:17.885 QFJ Timer ERROR quickfix.SocketInitiator.run:356 - Error during timer processing
quickfix.RuntimeError: java.io.IOException: Stream Closed
    at quickfix.SessionState.reset(SessionState.java:384)
    at quickfix.Session.resetState(Session.java:2624)
    at quickfix.Session.generateLogon(Session.java:2003)
    at quickfix.Session.next(Session.java:1918)
    at quickfix.mina.SessionConnector$SessionTimerTask.run(SessionConnector.java:350)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Stream Closed
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:326)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
    at java.io.DataOutputStream.flush(DataOutputStream.java:123)
    at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
    at quickfix.FileStore.close(FileStore.java:218)
    at quickfix.FileStore.close(FileStore.java:209)
    at quickfix.FileStore.closeAndDeleteFiles(FileStore.java:223)
    at quickfix.FileStore.initialize(FileStore.java:101)
    at quickfix.FileStore.reset(FileStore.java:405)
    at quickfix.SessionState.reset(SessionState.java:382)
    ... 11 common frames omitted
    Suppressed: java.io.IOException: Stream Closed
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:326)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
        ... 17 common frames omitted
@szaitsev86 Did you consider using the JdbcStore instead?
IMHO, using a FileStore is not recommended for production because the session state could be easily corrupted.
I would advise using a database for production purposes.
@chrjohn , what do you think w.r.t. corruption due to incompletely written files?
@JThoennes I don't know how the error will be different with a JdbcStore when the disk is full on the database. :)
Regarding production usage: I am not aware of any official recommendations against using the FileStore.
I agree with the issue creator that the mechanism should be more robust, though.
@chrjohn Agreed regarding a full disk on the database server :-) But I would expect that incomplete records can be written if the disk is full, and that this can corrupt the store. A database transaction, on the other hand, is atomic, i.e., either written completely or not at all. Does this expectation match the actual operation of the message stores in QF/J, or am I missing something?
Sorry for the delay.
Although storing a message to the messages table might be atomic, there is another transaction that persists the next seqnum to the sessions table, and it runs separately from the first transaction. There is also #357, which might be a problem (I have not had time to analyze it yet).
I have not had time to check what happens if there is a mismatch between the messages and sessions tables w.r.t. the seqnums.