
java.lang.UnsupportedOperationException: Unsupported WAL txn type: -1

Open · MichalKoziorowski-TomTom opened this issue 1 year ago · 2 comments

Describe the bug

I see the following log when pushing a DB entry over the InfluxDB line protocol:

2023-06-30T14:22:53.364191Z I i.q.c.p.WriterPool >> [table=`shelly_plugs_temperature1~24`, thread=26]
2023-06-30T14:22:53.364909Z I i.q.c.p.WriterPool << [table=`shelly_plugs_temperature1~24`, thread=26]
2023-06-30T14:22:53.364913Z I i.q.c.p.WriterPool >> [table=`shelly_plugs_meter1~23`, thread=25]
2023-06-30T14:22:53.364921Z C server-main unhandled error [job=io.questdb.cairo.wal.ApplyWal2TableJob@4196c360, ex=
java.lang.UnsupportedOperationException: Unsupported WAL txn type: -1
        at io.questdb.cairo.wal.ApplyWal2TableJob.processWalCommit(ApplyWal2TableJob.java:446)
        at io.questdb.cairo.wal.ApplyWal2TableJob.applyOutstandingWalTransactions(ApplyWal2TableJob.java:334)
        at io.questdb.cairo.wal.ApplyWal2TableJob.applyWAL(ApplyWal2TableJob.java:524)
        at io.questdb.cairo.wal.ApplyWal2TableJob.doRun(ApplyWal2TableJob.java:576)
        at io.questdb.mp.AbstractQueueConsumerJob.run(AbstractQueueConsumerJob.java:41)
        at io.questdb.mp.Worker.run(Worker.java:118)
]
2023-06-30T14:22:53.365738Z I i.q.c.p.WriterPool << [table=`shelly_plugs_meter1~23`, thread=25]
2023-06-30T14:22:53.365753Z C server-main unhandled error [job=io.questdb.cairo.wal.ApplyWal2TableJob@2141a12, ex=
java.lang.UnsupportedOperationException: Unsupported WAL txn type: -1
        at io.questdb.cairo.wal.ApplyWal2TableJob.processWalCommit(ApplyWal2TableJob.java:446)
        at io.questdb.cairo.wal.ApplyWal2TableJob.applyOutstandingWalTransactions(ApplyWal2TableJob.java:334)
        at io.questdb.cairo.wal.ApplyWal2TableJob.applyWAL(ApplyWal2TableJob.java:524)
        at io.questdb.cairo.wal.ApplyWal2TableJob.doRun(ApplyWal2TableJob.java:576)
        at io.questdb.mp.AbstractQueueConsumerJob.run(AbstractQueueConsumerJob.java:41)
        at io.questdb.mp.Worker.run(Worker.java:118)

Everything was working properly up until 24 June 2023. It's possible that the server experienced some kind of unexpected shutdown around that time.

To reproduce

No response

Expected Behavior

No response

Environment

- **QuestDB version**: 7.2
- **OS**: Docker Hub image questdb/questdb:7.2

Additional context

No response

Fix:

ALTER TABLE shelly_plugs_meter1 SET TYPE BYPASS WAL
ALTER TABLE shelly_plugs_temperature1 SET TYPE BYPASS WAL

followed by a QuestDB restart, then

ALTER TABLE shelly_plugs_meter1 SET TYPE WAL
ALTER TABLE shelly_plugs_temperature1 SET TYPE WAL

followed by another QuestDB restart.
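For anyone who wants to script this workaround rather than type it into the console, here is a minimal sketch (not from this thread) that issues the two BYPASS WAL statements through QuestDB's REST /exec endpoint. It assumes the default HTTP port 9000 and the table names from this issue; the two restarts still have to happen out of band, and the follow-up SET TYPE WAL statements are run the same way afterwards.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Sketch: run the BYPASS WAL statements against QuestDB's /exec endpoint
// (default port 9000). Table names are taken from this issue and are an
// assumption for anyone else's setup.
public class BypassWalFix {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        String[] statements = {
                "ALTER TABLE shelly_plugs_meter1 SET TYPE BYPASS WAL",
                "ALTER TABLE shelly_plugs_temperature1 SET TYPE BYPASS WAL"
        };
        for (String sql : statements) {
            String url = "http://localhost:9000/exec?query="
                    + URLEncoder.encode(sql, StandardCharsets.UTF_8);
            HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(sql + " -> HTTP " + response.statusCode());
        }
        // After this, restart QuestDB, run the same loop with "SET TYPE WAL",
        // then restart once more, as described above.
    }
}
```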

Cause of the issue: no idea.

Just adding that I've seen this same error, also after an unexpected shutdown and also on the same QuestDB version. The suggested fix did work for me. Is there any more information we can provide to help resolve this?

2023-07-07T16:35:30.565294Z C server-main unhandled error [job=io.questdb.cairo.wal.ApplyWal2TableJob@48d61b48, ex=
java.lang.UnsupportedOperationException: Unsupported WAL txn type: -1
        at io.questdb.cairo.wal.ApplyWal2TableJob.processWalCommit(ApplyWal2TableJob.java:446)
        at io.questdb.cairo.wal.ApplyWal2TableJob.applyOutstandingWalTransactions(ApplyWal2TableJob.java:334)
        at io.questdb.cairo.wal.ApplyWal2TableJob.applyWAL(ApplyWal2TableJob.java:524)
        at io.questdb.cairo.wal.ApplyWal2TableJob.doRun(ApplyWal2TableJob.java:576)
        at io.questdb.mp.AbstractQueueConsumerJob.run(AbstractQueueConsumerJob.java:41)
        at io.questdb.mp.Worker.run(Worker.java:118)
]

nicholas-a-guerra · Jul 07 '23

Guys, if the database runs on servers where an unstable power supply or spurious restarts are likely, please make sure SYNC mode is selected:

cairo.commit.mode=SYNC
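For the Docker image used in this issue, a sketch of where that setting goes. The environment-variable mapping (QDB_ prefix, dots replaced by underscores, upper case) is my understanding of how the image picks up configuration and is worth verifying against the QuestDB configuration docs for your version.

```
# conf/server.conf on a stock install
cairo.commit.mode=SYNC

# or, for the Docker image, the same key as an environment variable
docker run -e QDB_CAIRO_COMMIT_MODE=SYNC -p 9000:9000 -p 9009:9009 questdb/questdb:7.2
```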

bluestreak01 · Aug 15 '23

I've had an incident where even with a cloud provider, occasional migrations or other issues can cause once in a year or longer random hard restarts. This has caused this same problem even still on version 7.3.3. I am already aware that Cairo commit mode of sync is the most optimal for unreliable power situations. However, there is still I believe a need for some in between solution for those not in that exact scenario. Is it possible that an alternate solution be looked into for this?

I have a couple of ideas for possible solutions:

  1. When this failure occurs, it currently requires manually running the BYPASS WAL query, restarting, setting the type back to WAL, and restarting again. Maybe a simpler query could be provided that fixes the problem in one step and doesn't require a restart; that way we could automate an easier recovery on our own (see the sketch after this list).

OR

  2. It would be even better if a configuration option could auto-recover whenever this occurs at runtime. I would argue that, for a very large part of the QuestDB user base, keeping the database up and available at all times matters much more than suspending a table because of one corrupted entry. Maybe corrupted entries could even be ignored altogether, with a count kept in some sys table. This kind of architecture change would also let many users regain ingestion rates by not relying on Cairo commit mode SYNC, accepting instead that in-flight data may be lost during a power outage. I believe the large majority of users in this scenario would be willing to make that compromise.
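As a stopgap for idea 1, a monitoring sketch like the one below (my own assumption, not something from this thread) could poll for suspended WAL tables over the same /exec endpoint and alert an operator, or trigger the bypass/restart procedure described earlier. The wal_tables() function and its "suspended" column are assumptions to check against the QuestDB docs for your version.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical watcher: periodically ask QuestDB for suspended WAL tables via
// the REST /exec endpoint (default port 9000) and log the raw JSON response.
public class SuspendedWalWatcher {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        String sql = "SELECT name FROM wal_tables() WHERE suspended = true";
        String url = "http://localhost:9000/exec?query="
                + URLEncoder.encode(sql, StandardCharsets.UTF_8);
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        while (true) {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            // An empty "dataset" in the JSON body means no suspended tables;
            // anything else is a candidate for alerting or automated recovery.
            System.out.println(response.body());
            Thread.sleep(60_000); // check once a minute
        }
    }
}
```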

nicholas-a-guerra · Oct 18 '23