InfluxDB loses all data when moving from `latest` tag to `2.1`
Hi, I deployed an InfluxDB instance with the `latest` tag on 2nd March 2022. Some days later (11th March), I reconsidered using `latest` and switched to the tag `2.1`. In my understanding of SemVer, this should have included `2.1.1`. Then I ran `docker-compose pull` and `docker-compose up -d`. At this point, I expected nothing to really change, as `latest` should (as I thought) be equal to `2.1.1` (the last release). To my surprise, I was greeted with the onboarding screen in the web interface, which asked me to set InfluxDB up. At that point, it seems the user storage of InfluxDB was overwritten, and I was not able to get my data back, as it was partially overwritten by the onboarding.
What I would have expected
I would have expected InfluxDB to be able to use the data generated by `latest` inside the last tagged release `2.1(.1)`.
As a fallback, I would have expected InfluxDB to warn me that there is existing data that it can't read, and quit, rather than corrupt the whole database.
I also expect a "downgrade" from `latest` to the last official release to work without errors; otherwise, changing from `latest` to any other version would not be possible.
Possible solution
InfluxDB could check whether there is existing data and, if there is, check whether it can read it before making any writes. Then it could validate the integrity of the data and only continue if the checks pass. If the data cannot be read, it could fail with an error message explaining that it detected data it cannot use. This could be a failsafe for any version change (up-/downgrade), without the possibility of losing data. The onboarding process should only start if it's not overwriting existing data.
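The check described above could look roughly like the following Python sketch. This is not how InfluxDB works internally; `check_before_onboarding`, `ExistingDataError`, and the `can_read` callback are hypothetical names, and the real readability and integrity checks would depend on the storage format.

```python
from pathlib import Path
from typing import Callable


class ExistingDataError(RuntimeError):
    """Raised when existing data is found but cannot be read."""


def check_before_onboarding(data_dir: str, can_read: Callable[[Path], bool]) -> str:
    """Decide what to do at startup without writing anything.

    Returns "onboard" if no data exists, "use_existing" if all existing
    files are readable, and raises ExistingDataError otherwise.
    """
    root = Path(data_dir)
    # Collect every regular file below the data directory (read-only scan).
    existing = sorted(p for p in root.rglob("*") if p.is_file()) if root.is_dir() else []

    if not existing:
        # Nothing to overwrite, so onboarding is safe.
        return "onboard"

    unreadable = [p for p in existing if not can_read(p)]
    if unreadable:
        # Fail loudly instead of starting onboarding on top of existing data.
        raise ExistingDataError(
            f"Found {len(unreadable)} existing file(s) that this version cannot read; "
            "refusing to start to avoid overwriting them."
        )

    return "use_existing"
```

The important design choice is that the function only reads: onboarding (the only step that writes) is reachable solely when the data directory is empty or missing.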
Main Problem
Changing the tag from `latest` to the last release number is something a user could understand as "pinning the current image version" to disable automatic upgrades on `docker pull`. In my case, I lost ~30 million entries in the database and all configured dashboards / alerts / users / tokens. This is not too bad, as they were mostly system metrics over a week, but it's reasonable to say that other users could run into far worse problems with such a small change to the docker-compose file.
I hope this helps! Let me know if you have questions.
Because I restored a full server backup of that container, I was not able to retrieve any logs.
Edit: I just saw the Twitter thread which answered this question for me.
~@MarvinJWendt Are you persisting the necessary volumes on the host machine?~
I think we'll need steps to reproduce this. This isn't a problem I've seen so there may be something you're doing differently than others.