Importing data from InfluxDB with vmctl causes unnecessary data duplication on series that add tags

Open dpedu opened this issue 1 year ago • 0 comments

Describe the bug

TL;DR: adding a new tag to a time series in InfluxDB (thus creating a new series) causes vmctl to duplicate all subsequently added data when later importing into VictoriaMetrics.

I have, in InfluxDB, a measurement called rank that tracks the rank of users in the context of some game. Executing SHOW SERIES ON <db> in InfluxDB shows me the following series:

rank,username=blah
rank,alias=blah,username=blah

We'll call the first one series "A", and the second series "B".

At some point in the past, only series A existed, as I didn't need the alias tag. Later a need for it arose, so I added it, creating series B.

These are two separate series. However, at a certain point in time, which we'll call the cutoff point, I stopped adding data to series A and cut over to adding data only to series B.

When importing with vmctl, however, series B's data was copied into series A, while series B itself was also imported. In other words, vmctl completely duplicated the data in series B.

To elaborate: in my original InfluxDB data, these two series do not overlap in time at all. Series A ends at an exact moment, and series B begins. But vmctl has copied all of series B into series A, so VictoriaMetrics' copy of series A spans the length of A+B.

I suspect this happens because vmctl does not account for how InfluxDB queries work. InfluxDB behaves more like a SQL database, where SELECTing FROM rank WHERE username=xyz returns every point that matches the filter, regardless of any other tags the point carries, as opposed to the series grouping VictoriaMetrics/Prometheus use.
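The suspected mismatch can be illustrated with a small in-memory model. This is only a sketch: `POINTS`, `select_where`, and `select_exact` are hypothetical names, the model stands in for InfluxDB rather than calling it, and it says nothing about how vmctl actually builds its queries. It assumes a tag predicate matches any point that carries that tag value, whatever other tags the point has.

```python
# Hypothetical in-memory stand-in for the two InfluxDB series in this report.
# Each point is (tags, timestamp, value).
POINTS = [
    ({"username": "blah"}, 1, 10),                   # series A (before cutoff)
    ({"username": "blah"}, 2, 11),                   # series A
    ({"username": "blah", "alias": "blah"}, 3, 12),  # series B (after cutoff)
    ({"username": "blah", "alias": "blah"}, 4, 13),  # series B
]


def select_where(points, **tag_filter):
    """SQL-like filter: keep points whose tags include every filter pair."""
    return [p for p in points
            if all(p[0].get(k) == v for k, v in tag_filter.items())]


def select_exact(points, **tag_set):
    """Series-style match: keep points whose tag set equals tag_set exactly."""
    return [p for p in points if p[0] == tag_set]


# Filtering on username alone also returns series B's points.
print(len(select_where(POINTS, username="blah")))  # 4
# Matching the full tag set returns only series A's points.
print(len(select_exact(POINTS, username="blah")))  # 2
```

If the import fetches series A with a filter like the first one, series B's points come back under A's name, which would match the duplication described above.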

vmctl should notice that time series returned from InfluxDB may have different tags than the series vmctl asked for, and should omit that data during import instead of erroneously duplicating it.
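A minimal sketch of that check, written in Python for illustration only (vmctl itself is not implemented this way as far as this report knows). `keep_matching_series` and the `returned` structure are hypothetical. The sketch assumes the response lists each series together with its tags, and that a tag missing from a series is reported as an empty string.

```python
def keep_matching_series(requested_tags, returned_series):
    """Keep only returned series whose tag set equals the requested one.

    Empty-string tag values are treated as "tag absent" (an assumption about
    how a grouped response reports missing tags).
    """
    def normalize(tags):
        return {k: v for k, v in tags.items() if v != ""}

    wanted = normalize(requested_tags)
    return [s for s in returned_series
            if normalize(s.get("tags", {})) == wanted]


# Hypothetical response shape: one entry per series, each with its tags.
returned = [
    {"name": "rank", "tags": {"username": "blah", "alias": ""},
     "values": [[1, 10], [2, 11]]},
    {"name": "rank", "tags": {"username": "blah", "alias": "blah"},
     "values": [[3, 12], [4, 13]]},
]

# Asking for series A keeps A's points and drops B's.
kept = keep_matching_series({"username": "blah"}, returned)
print([s["values"] for s in kept])  # [[[1, 10], [2, 11]]]
```

With a filter like this, the points of series B would be imported only once, when series B itself is requested.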

To Reproduce

  • Create 2 series as above in InfluxDB
  • Attempt to migrate the data to VictoriaMetrics with vmctl
  • See that data is duplicated
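For the first step, test data can be generated as InfluxDB line protocol. The snippet below is a sketch under assumptions: the field name `value`, the `CUTOFF` timestamp, the one-minute spacing, and the helper `to_line` are all made up for illustration, and the helper does no escaping, so it only suits simple names like the ones here. Series A gets points strictly before the cutoff and series B gets points from the cutoff onward, so the two never overlap.

```python
def to_line(measurement, tags, value, ts_ns):
    """Format one point as InfluxDB line protocol (no escaping)."""
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    return f"{measurement}{tag_part} value={value} {ts_ns}"


CUTOFF = 1_700_000_000_000_000_000  # hypothetical cutoff, in nanoseconds
STEP = 60_000_000_000               # one minute, in nanoseconds

lines = []
for i in range(3):  # series A: strictly before the cutoff
    lines.append(to_line("rank", {"username": "blah"},
                         10 + i, CUTOFF - (3 - i) * STEP))
for i in range(3):  # series B: at or after the cutoff
    lines.append(to_line("rank", {"username": "blah", "alias": "blah"},
                         20 + i, CUTOFF + i * STEP))

print("\n".join(lines))
```

The printed lines can be written to the source database with any line-protocol client before running the import.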

Version

vmctl version vmctl-20241002-113624-tags-v1.104.0-0-g0d4f4b8f7d

Logs

No response

Screenshots

No response

Used command-line flags

vmctl import command:

docker run -it --rm victoriametrics/vmctl:v1.104.0 influx --influx-database mydatabase --influx-addr http://influx01:8086 --vm-addr http://vmetrics:8480 --verbose --vm-account-id 0 --vm-concurrency 8

Additional information

No response

dpedu • Oct 18 '24 16:10