java.lang.UnsupportedOperationException: Unsupported WAL txn type: -1
Describe the bug
I see the following log when pushing DB entries via the InfluxDB line protocol:
2023-06-30T14:22:53.364191Z I i.q.c.p.WriterPool >> [table=`shelly_plugs_temperature1~24`, thread=26]
2023-06-30T14:22:53.364909Z I i.q.c.p.WriterPool << [table=`shelly_plugs_temperature1~24`, thread=26]
2023-06-30T14:22:53.364913Z I i.q.c.p.WriterPool >> [table=`shelly_plugs_meter1~23`, thread=25]
2023-06-30T14:22:53.364921Z C server-main unhandled error [job=io.questdb.cairo.wal.ApplyWal2TableJob@4196c360, ex=
java.lang.UnsupportedOperationException: Unsupported WAL txn type: -1
at io.questdb.cairo.wal.ApplyWal2TableJob.processWalCommit(ApplyWal2TableJob.java:446)
at io.questdb.cairo.wal.ApplyWal2TableJob.applyOutstandingWalTransactions(ApplyWal2TableJob.java:334)
at io.questdb.cairo.wal.ApplyWal2TableJob.applyWAL(ApplyWal2TableJob.java:524)
at io.questdb.cairo.wal.ApplyWal2TableJob.doRun(ApplyWal2TableJob.java:576)
at io.questdb.mp.AbstractQueueConsumerJob.run(AbstractQueueConsumerJob.java:41)
at io.questdb.mp.Worker.run(Worker.java:118)
]
2023-06-30T14:22:53.365738Z I i.q.c.p.WriterPool << [table=`shelly_plugs_meter1~23`, thread=25]
2023-06-30T14:22:53.365753Z C server-main unhandled error [job=io.questdb.cairo.wal.ApplyWal2TableJob@2141a12, ex=
java.lang.UnsupportedOperationException: Unsupported WAL txn type: -1
at io.questdb.cairo.wal.ApplyWal2TableJob.processWalCommit(ApplyWal2TableJob.java:446)
at io.questdb.cairo.wal.ApplyWal2TableJob.applyOutstandingWalTransactions(ApplyWal2TableJob.java:334)
at io.questdb.cairo.wal.ApplyWal2TableJob.applyWAL(ApplyWal2TableJob.java:524)
at io.questdb.cairo.wal.ApplyWal2TableJob.doRun(ApplyWal2TableJob.java:576)
at io.questdb.mp.AbstractQueueConsumerJob.run(AbstractQueueConsumerJob.java:41)
at io.questdb.mp.Worker.run(Worker.java:118)
]
Everything was working properly until around 24 June 2023. It's possible that the server experienced some kind of unexpected shutdown around that time.
To reproduce
No response
Expected Behavior
No response
Environment
- **QuestDB version**: 7.2
- **OS**: Docker image questdb/questdb:7.2 from Docker Hub
Additional context
No response
Fix:
ALTER TABLE shelly_plugs_meter1 SET TYPE BYPASS WAL;
ALTER TABLE shelly_plugs_temperature1 SET TYPE BYPASS WAL;
then restart QuestDB, then:
ALTER TABLE shelly_plugs_meter1 SET TYPE WAL;
ALTER TABLE shelly_plugs_temperature1 SET TYPE WAL;
and restart QuestDB again.
Cause of the issue: no idea.
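Before and after running the statements above, it may help to confirm which tables are actually stuck. I believe wal_tables() is available in 7.x for this (column names are from memory, so treat this as a sketch):
-- list WAL tables and their state; affected tables should show as suspended
SELECT * FROM wal_tables();
-- or filter directly, assuming the flag is called 'suspended'
SELECT name FROM wal_tables() WHERE suspended = true;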
Just adding that I've seen this same error, also after an unexpected shutdown and on the same QuestDB version. However, the suggested fix did work for me. Is there any more information we can provide to help resolve this?
2023-07-07T16:35:30.565294Z C server-main unhandled error [job=io.questdb.cairo.wal.ApplyWal2TableJob@48d61b48, ex=
java.lang.UnsupportedOperationException: Unsupported WAL txn type: -1
at io.questdb.cairo.wal.ApplyWal2TableJob.processWalCommit(ApplyWal2TableJob.java:446)
at io.questdb.cairo.wal.ApplyWal2TableJob.applyOutstandingWalTransactions(ApplyWal2TableJob.java:334)
at io.questdb.cairo.wal.ApplyWal2TableJob.applyWAL(ApplyWal2TableJob.java:524)
at io.questdb.cairo.wal.ApplyWal2TableJob.doRun(ApplyWal2TableJob.java:576)
at io.questdb.mp.AbstractQueueConsumerJob.run(AbstractQueueConsumerJob.java:41)
at io.questdb.mp.Worker.run(Worker.java:118)
]
Guys, if the database runs on servers where an unstable power supply or spurious restarts are likely, please make sure SYNC commit mode is selected:
cairo.commit.mode=SYNC
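For reference (please double-check against the docs, I'm going from memory), the setting goes in server.conf; for the official Docker image I believe it can also be passed as an environment variable using the usual QDB_ prefix with dots replaced by underscores:
# conf/server.conf
cairo.commit.mode=SYNC
# or, for the Docker image, as an environment variable
QDB_CAIRO_COMMIT_MODE=SYNC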
I've had an incident where, even with a cloud provider, occasional migrations or other issues caused the kind of random hard restart that happens maybe once a year or less. It triggered this same problem, still on version 7.3.3. I'm aware that the SYNC Cairo commit mode is the right choice for unreliable power situations, but I believe there is still a need for an in-between solution for those not in that exact scenario. Could an alternative solution be looked into?
I have a couple of ideas for possible solutions:
- When this failure occurs, it is currently necessary to manually run the BYPASS WAL query, restart, set the table back to WAL, and then restart again. Maybe it would be possible to provide a simpler query that fixes the problem in one statement and doesn't require a restart; that way we could attempt to automate an easier recovery on our own.
OR
- It would be even better if we could add a configuration option to auto-recover in real time whenever this occurs (see the sketch below). I would argue that, for a very large part of QuestDB's user base, it matters much more that the database is up and available at all times than that a table is suspended because of one corrupted entry. Maybe even ignore corrupted entries altogether and just keep a count of them in some sys table. This kind of change would also let many users regain ingestion rates without relying on the SYNC Cairo commit mode, accepting instead that in-flight data could be lost during a power outage. I believe the large majority of users in this scenario would be willing to make that compromise.
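To make the second idea concrete, here is the kind of thing I have in mind. None of these keys exist today as far as I know; they are purely hypothetical and only illustrate the request:
# hypothetical configuration, not current QuestDB settings
# skip WAL transactions that cannot be applied instead of suspending the table
cairo.wal.apply.on.error=SKIP
# record how many transactions were skipped per table in a system table
cairo.wal.apply.skipped.stats.enabled=true
With something like this, ingestion would keep running after a hard restart, and the skipped-transaction count would tell us how much data was lost.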