
SQLite KV Database Incompatibility Between eKuiper 1.x and 2.x

Open LeeShan87 opened this issue 2 months ago • 2 comments

eKuiper 2.x cannot read streams and rules created by eKuiper 1.x due to a breaking change in the SQLite storage format. After upgrading, every access fails with an `error unmarshall` message and the database is effectively corrupted.

Environment

  • eKuiper 1.x version: 1.1.1
  • eKuiper 2.x version: 2.1.2
  • Database: sqliteKV.db

Reproduction Steps

  1. Start eKuiper 1.1.1
  2. Create streams and rules via REST API
  3. Stop eKuiper and upgrade to version 2.1.2
  4. Start eKuiper 2.1.2
  5. Attempt to describe or use existing streams

Observed Behavior

time="2025-10-15T13:37:18Z" level=error msg="describe stream error: Describe stream fails, error unmarshall <stream_name>, the data in db may be corrupted."

All existing streams become unusable and cannot be deleted via REST API.

Root Cause Analysis

The SQLite storage format changed between versions:

Database Schema Changes

eKuiper 1.1.1 - 3 tables:

sqlite3 /kuiper/data/sqliteKV.db ".tables"
rule    sink    stream

eKuiper 2.1.2 - 20+ tables:

sqlite3 /kuiper/data/sqliteKV.db ".tables"
asyncManager              portablePluginStatus      services
confKVStorage             rule                      sinkConfigStatus
connectionConfigStatus    ruleStatus                sourceConfigStatus
eKuiperMeta_bump_version  schema                    stream
nativePlugin              schemaStatus              streamStatus
nativePluginStatus        serviceFuncs              tableStatus
pluginFuncs               serviceInstall            uploads
portablePlugin            serviceInstallStatus      uploadsStatusDb

Storage Format Changes

Streams - Changed from plain SQL to JSON wrapper:

# eKuiper 1.1.1
sqlite3 /kuiper/data/sqliteKV.db "SELECT hex(val) FROM stream LIMIT 1;" | xxd -r -p
create stream my_stream () WITH ( DATASOURCE = "topic", FORMAT = "JSON", TYPE="mqtt")

# eKuiper 2.1.2
sqlite3 /kuiper/data/sqliteKV.db "SELECT hex(val) FROM stream LIMIT 1;" | xxd -r -p
{"streamType":0,"streamKind":"","statement":"create stream my_stream () WITH ( DATASOURCE = \"topic\", FORMAT = \"JSON\", TYPE=\"mqtt\")"}

Rules - JSON structure changed (removed triggered field, reordered fields):

# eKuiper 1.1.1
sqlite3 /kuiper/data/sqliteKV.db "SELECT hex(val) FROM rule LIMIT 1;" | xxd -r -p
{"triggered":true,"id":"rule1","sql":"SELECT * FROM stream1","actions":[...],"options":{...}}

# eKuiper 2.1.2
sqlite3 /kuiper/data/sqliteKV.db "SELECT hex(val) FROM rule LIMIT 1;" | xxd -r -p
{"id":"rule1","sql":"SELECT * FROM stream1","actions":[...]}

REST API Limitation

Once the database is in this incompatible state, the REST API cannot be used to fix it:

GET Request (fails)

GET /streams/my_stream

HTTP/1.1 400 Bad Request
{
  "error": 3000,
  "message": "describe stream error: Describe stream fails, error unmarshall my_stream, the data in db may be corrupted."
}

DELETE Request (also fails)

DELETE /streams/my_stream

HTTP/1.1 400 Bad Request
{
  "error": 3000,
  "message": "delete stream error: error unmarshall my_stream, the data in db may be corrupted"
}

Workarounds

Option 1: Clean Installation (Recommended for upgrades)

# Stop eKuiper
docker stop ekuiper

# Remove old database
rm -rf /kuiper/data/sqliteKV.db

# Start eKuiper 2.x (creates fresh database)
docker start ekuiper

# Re-create all streams and rules

Option 2: Use Separate Database File

Modify etc/kuiper.yaml before upgrading:

store:
  sqlite:
    name: sqliteKV-v2.db  # Use different filename for v2.x

This preserves the old database and allows rollback if needed.

Option 3: Direct Database Manipulation (if already corrupted)

# Remove specific corrupted stream
docker exec ekuiper sqlite3 /kuiper/data/sqliteKV.db \
  "DELETE FROM stream WHERE key = 'my_stream';"

# Restart eKuiper
docker restart ekuiper

Notes

  • This issue occurs specifically when upgrading an existing deployment with a populated database
  • Fresh installations of eKuiper 2.x are not affected
  • The database format change appears to have been introduced between version 1.x and 2.x series
  • In our testing environment, this was not caught initially because integration tests always start with a fresh database

LeeShan87 · Oct 16 '25 16:10

Thank you for the detailed summary. We confirm that V2 introduces breaking changes; this is the rationale behind the major version increment. We won't be implementing full backward compatibility. Instead, we are prioritizing the creation of a migration guide to assist users in upgrading, which will include these notes. To ensure accuracy, could you provide any specific insights from your recent migration experience?

ngjaying · Oct 31 '25 01:10

Hi @ngjaying,

Thanks for your response.

How was my migration experience? Painful :D But I hope I'm nearing the end of the tunnel.

I know I can't expect backward compatibility. I'm not expecting it at all. I'm just reporting what I find.

That's why I created this issue: I couldn't find anything about it, or anything similar. What on earth is this "the data in db may be corrupted" error, if we've been passing integration tests all along?

(Disclaimer: This is my experience. I've already created separate issues for every pain point. :D)

Here are some details from the migration that I hope I can share. My current company needed a custom NATS stream and sink solution, and we also wrote quite a few custom function plugins for it. However, version 1.2.x already contained breaking changes. That's why we stayed on 1.1.1 for years. But now, due to security compliance, we are forced to upgrade.

So, for us, the main focus is the proper functioning of our custom plugins. The plugin conversion itself seemed problematic at first, but that part was basically just about using the new API. If I recall correctly, for functions it was almost just a regex find-and-replace. For streams and sinks, however, I really had to think about what changed and into what, and I had to dig through the git history a lot. Before I forget to mention: we are still using the Go plugin system. The portable plugin solution was tempting, but after I managed to process the first NATS message, I didn't want to mess with that part any further.

The fun only started after that.

  • Why aren't the streams and sinks compiling? Oh, right, they overwrite individual package versions in go.mod, and you have to explicitly pin the specific versions.
  • I wrote a bunch of integration tests, of course, I only ran the eKuiper-related ones locally. But when the full test suite ran... fatal error. Oh, right, another service is handling eKuiper's responses incorrectly. Wait a minute. The REST API response code changed from 404 to 400. What else has changed?
  • After I fixed the response code changes with a reverse proxy, I was surprised that the integration tests could only run once. During the tests, a backup of the KV database is made, and during the restore, the SQLite connection is lost. This makes the database read-only.
  • Then came the requirement. Let's run a Trivy scan on all images. Hey, a CVE. No problem, let's upgrade. Oh, so the bug isn't on my end. (BTW, there are still outdated dependencies in the latest release. So I still have to figure something out for this. And not put pressure on the maintainers.)

Then came the Panel tests (device tests). Error after error. But it seems this current issue was the cause (and poor logging on our part :D).

But I think the inability to delete corrupted streams is a bug, especially since deletion doesn't require reading the stream value; deleting by key alone should be enough. However, I've only observed this behavior in version 2.1.2.

Thanks for your work on open-source. I hope I was able to help a little with it.

LeeShan87 · Oct 31 '25 10:10