[bug] drop/delete terribly slow
Bug report
Influx version: v1.5.1 OS version: Ubuntu Linux 17.04
Steps to reproduce:
- Have a big database with > 20000 series
- Try to execute
drop series from /{{regexp_pattern}}/. - or try to execute
drop measurement /{{regexp_pattern}}/. - or try to execute
delete from /{{regexp_pattern}}/.
It will take > 30 minutes.
Expected behavior:
- Well, deleting data should take milliseconds?
- Or there should be some special command modifier that makes it fast (I think the slowness comes from some InfluxDB cleanup work). There should then be another command to run this "long" part manually, so that I can, for example, delete a ton of series quickly and run the cleanup process (to reclaim space) later from a cron job or something like that.
It definitely shouldn't be taking that long.
Can you gather some profiles while the slow delete is in progress?
curl -o profiles.tar.gz "http://localhost:8086/debug/pprof/all?cpu=true"
curl -o vars.txt "http://localhost:8086/debug/vars"
iostat -xd 1 30 > iostat.txt
The workaround I would try is to break out the regular expression, like running SHOW MEASUREMENTS WITH MEASUREMENT =~ /pattern/, and then make individual DROP calls in one statement, like
DROP MEASUREMENT match1; DROP MEASUREMENT match2; DROP MEASUREMENT match3....
If you run the DROP calls as separate statements, you may trigger a compaction on each run.
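For example, the whole workaround can be run from the influx CLI (a sketch; the database name and measurement names are placeholders mirroring the statement above):
$ influx -database mydb -execute 'SHOW MEASUREMENTS WITH MEASUREMENT =~ /pattern/'
$ influx -database mydb -execute 'DROP MEASUREMENT "match1"; DROP MEASUREMENT "match2"; DROP MEASUREMENT "match3"'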
I will try to provide the details you need tomorrow. Dropping series one by one is actually a lot faster than a single drop with a regexp, but dropping any single series still takes tens of seconds.
Can I somehow turn compaction off?
My actual/current solution (implemented today) is a code reorganization such that drop is no longer needed: I'm using the time value as a key, so there is no need to drop. Adding a point will either add a new point or overwrite an existing one.
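This works because a write with the same measurement, tag set, and timestamp as an existing point replaces its field values instead of creating a new point. A minimal illustration with curl (the database, measurement, and tag names here are made up):
$ curl -XPOST 'http://localhost:8086/write?db=mydb' --data-binary 'events,repo=kubernetes count=10 1522540800000000000'
$ curl -XPOST 'http://localhost:8086/write?db=mydb' --data-binary 'events,repo=kubernetes count=12 1522540800000000000'
The second write overwrites the count field of the first point, so the series never needs to be dropped.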
I also have a problem with an InfluxDB copy program that I wrote myself months ago. It now takes about 3-10x longer than it did on v1.3.x.
I'll be investigating this tomorrow; for now, the 3x-10x slowdown is due to multiple "timeout error" responses that I'm receiving and retrying (my copy program handles this).
I suspect the problem is due to the number of threads - I'm using 48 threads and http-timeout = 300s, and after about 300s of data copying I start receiving "timeout error". Maybe the influxd server is not allowing 48 connections at once? I cannot check this now, because a very time-consuming process is running and it will take hours to finish.
Is there a way to tell influxd to allow an unlimited number of connections, or at least say 100? What is the default?
This is the InfluxDB backup tool. It can copy one Influx database to another, local or remote, etc.: https://github.com/cncf/devstats/blob/master/cmd/idb_backup/idb_backup.go
I need such a tool because when I'm doing a full data regeneration, I need to save the results to a temp database first (it takes hours) so as not to disturb the original database used by Grafana. Once this completes successfully, I copy the temp database over the one used by Grafana (which is safer and gives less downtime) and finally drop the temp database.
The better option would be to generate the temp DB and then just rename it, but unfortunately InfluxDB does not support this either :-(
Uploaded requested profiles: profiles.tar.gz iostat.txt vars.txt
While running delete from /^open_(issues|prs)_sigs_milestones/ - it has already been running for 4+ minutes.
Delete still running (already 19 minutes); it was at least 40x faster in v1.3.x. Requested stats after about 17 minutes of the delete running: iostat.txt profiles.tar.gz vars.txt
Can I somehow turn compaction off?
I don't believe that is supported. Maybe setting max-concurrent-compactions=1 would suffice here. https://docs.influxdata.com/influxdb/v1.5/administration/config/#max-concurrent-compactions-0
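For reference, that setting lives in the [data] section of influxdb.conf (a sketch; adjust the path for your install):
# /etc/influxdb/influxdb.conf
[data]
  # default is 0 (chosen automatically); 1 allows only one compaction at a time
  max-concurrent-compactions = 1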
Adding a point will either add a new point or overwrite an existing one.
Overwriting existing points is expensive and should generally be avoided; when it can't be avoided, it should happen in bulk.
I also have a problem with an InfluxDB copy program that I wrote myself months ago.
I suspect you'll get much better performance using the native backup and restore tools.
Is there a way to tell influxd to allow an unlimited number of connections, or at least say 100? What is the default?
Default should be unlimited: https://docs.influxdata.com/influxdb/v1.5/administration/config/#max-connection-limit-0
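For reference, the corresponding setting is in the [http] section of influxdb.conf (a sketch):
# /etc/influxdb/influxdb.conf
[http]
  # 0 (the default) means no connection limit; set a positive value to cap it
  max-connection-limit = 0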
Uploaded requested profiles:
Thanks, this will be very helpful in identifying the performance bottleneck.
@rbetts care to prioritize/assign this?
@lukaszgryglicki Thanks for the very complete report. We'll triage this in an upcoming grooming session.
BTW: I've killed the query after 40 minutes and implemented an alternate solution:
- A program that zero-fills all series matching a given regexp:
https://github.com/cncf/devstats/blob/master/cmd/z2influx/z2influx.go
I suspect you'll get much better performance using the native backup and restore tools:
BTW: is there a native program to copy one influx DB into another, or rename?
There wasn't such a program 3 months ago, I'm quite sure, because I spent more than a week researching this some time ago.
Overwriting existing points is expensive and should generally be avoided; when it can't be avoided, it should happen in bulk:
All my programs use bulk writing.
BTW: is there a native program to copy one influx DB into another, or rename?
https://docs.influxdata.com/influxdb/v1.5/administration/backup_and_restore/
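For example, with the online ("portable") backup and restore described there, copying one database into another under a new name looks roughly like this (a sketch; the database names and backup path are placeholders):
$ influxd backup -portable -database src_db /tmp/src_db_backup
$ influxd restore -portable -db src_db -newdb dst_db /tmp/src_db_backup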
I'll double-check tomorrow then, but a quick question: can I back up a database named A into some file and then restore it into a database named B? What I need to do:
- Assume I have to regenerate database A (the calculations are slow, take hours, and the data is generated from a complex set of Postgres SQL queries).
- I generate a new Influx database A_temp.
- On success, I want to rename A_temp to A (overwriting the original A).
- If there is an error during A_temp generation, A remains intact and the (broken) A_temp is dropped.
Copying A_temp to A should be orders of magnitude faster than regenerating A directly (to avoid downtime).
Is this feature new in v1.5? I already reported issues about database copying here a few months ago, along with my solutions.
Maybe the -newdb option of influxd restore will be useful for creating a copy of a database, but I see that I need to stop influxd to restore a database.
This is not a good solution, because influxd serves more than 20 databases, so it is not a good idea to stop it just to copy one database.
Why do I need to stop influxd when restoring into a new (not yet existing) fresh database?
My custom-written idb_backup program does a full copy & restore of an Influx database without stopping influxd; it was quite fast in v1.3.x and is now a lot slower in v1.5.0, but still usable.
What is interesting: it seems like its performance drops as the number of series increases.
And it looks like a quadratic dependency...
So when I have > 10000 series it becomes slow, but for 50000 series it is awful!
Seems like regexps are also very slow.
There is only one series name which match /^prs_labelled_d$/ and it is named "prs_labelled_d".
When I do:
select last(*) from "prs_labelled_d"
it returns data instantly. When I do:
select last(*) from /^prs_labelled_d$/
it takes 2m30s.
I am having a similar issue where I need to delete lots of measurements (malformed graphite template led to 100k measurements being created). Deleting an individual measurement takes between 20s and 2m when it finishes at all (sometimes it just hangs).
I am unable to drop by regex (I get an error: ERR: error parsing query: found /, expected identifier at line 1, char 18).
The system is reasonably loaded, using 50% of system RAM and ~70% load.
This seems entirely related to the number of measurements, not the amount of data. These operations were fast yesterday (less than 5 seconds, but not instant) when I had 50 measurements. I am not writing any data to the new bugged measurements, so my data has not grown much, but the number of measurements has. I went from ~50 measurements to ~110000. Since they are raw graphite items, their patterns are easy to regex, but if I were to delete the measurements individually it would take weeks.
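Since the broken graphite names follow a regular pattern, the individual drops could at least be scripted rather than typed by hand. A rough sketch (the database name and pattern are made up, and the CSV column layout may differ slightly between versions):
$ influx -database graphite -format csv -execute 'SHOW MEASUREMENTS WITH MEASUREMENT =~ /^bad_template\./' \
    | tail -n +2 | cut -d, -f2 \
    | while read -r m; do influx -database graphite -execute "DROP MEASUREMENT \"$m\""; done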
Requesting the debug profiles using the command listed above takes 30s to start (the curl stats show no activity for 30s, then the download happens). This feels like the kind of latency that is happening during other operations (everything just feels very slow and laggy). Due to that (and the lack of regex drop), I can't dump all three stats during the same measurement delete operation, so I just start the next one as soon as I can. As a note, the vars download is not slow.
Here is an archive of my profile data:
System Info (aws ec2 m5.large):
pol@influxdb-a-1 ~ % influx -version
InfluxDB shell version: 1.5.2
pol@influxdb-a-1 ~ % cat /etc/issue
Ubuntu 16.04.3 LTS \n \l
pol@influxdb-a-1 ~ % uname -a
Linux influxdb-a-1 4.4.0-1060-aws #69-Ubuntu SMP Sun May 20 13:42:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
We've replaced Influx with PostgreSQL and delete is instant. Postgres (which is a surprise for me) is faster not only at deletes, but at basically all operations.
This is kind of drifting off-topic, but how do you deal with the dynamic schema benefits of influxdb? Do you just not index the tags and store all measurements as strings? We have a fairly complex pile of pre-existing metrics and developing a sql schema for all of them in a reasonable way would be very difficult (unless we just crudely stored them as txt or something); the benefit of influxdb (and other storage systems like it) is that it is specific to the problem of metrics storage and querying (as opposed to general purpose databasing).
I've implemented something that dynamically creates tables/columns/indices as needed. It took more than a week, but it works great and is faster.
I can reproduce this bug, but not consistently. Created a database with 100000 measurements and about 8 million series. I have twice seen delete from /regex/ behave as though it were completely locked up.
Delete from 1 measurement:
$ time influx -database stress -execute "delete from m99999"
real 0m0.084s
user 0m0.020s
sys 0m0.022s
Delete from about 9 measurements:
$ time influx -database stress -execute "delete from /m9999./"
real 0m3.773s
user 0m0.024s
sys 0m0.022s
Delete from about 90 measurements:
$ time influx -database stress -execute "delete from /m999.*/"
real 0m3.106s
user 0m0.019s
sys 0m0.018s
Delete from about 900 measurements (gave up, ctrl-C after ~5m):
$ time influx -database stress -execute "delete from /m99.*/"
^C
Adding some debug logging to tsdb/store.go, I can see that there is a deadlock. This also causes shutdown to time out and fall back to a hard shutdown.
The deadlock is in partition.go: Partition.Wait(), which simply calls i.wg.Wait().

Opening as there is still possibly a performance issue.
Related to #10056
master
$ time influx -database stress -execute 'delete from /m9.*/'
real 0m31.381s
user 0m0.021s
sys 0m0.021s
1.6
$ time influx -database stress -execute 'delete from /m9.*/'
real 0m29.174s
user 0m0.023s
sys 0m0.033s
1.5
$ time influx -database stress -execute 'delete from /m9.*/'
real 0m22.821s
user 0m0.022s
sys 0m0.029s
1.4 (inmem)
$ time influx -database stress -execute 'delete from /m9.*/'
real 0m0.385s
user 0m0.023s
sys 0m0.030s
1.3 (inmem)
$ time influx -database stress -execute 'delete from /m8.*/'
real 0m0.413s
user 0m0.020s
sys 0m0.023s
@jacobmarble do you have an update on this investigation/bug fix?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
don't close.
I am using 1.7.7 and the issue still persists.
Deleting a single measurement with 1 million series takes forever.
iostat shows the disks are not under particularly high utilization:
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
loop0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nvme0n1 0.00 723.00 0.00 3046.50 0.00 163.00 0.00 18.40 0.00 1.62 0.07 0.00 4.21 0.05 3.60
nvme1n1 0.00 723.00 0.00 3046.50 0.00 163.00 0.00 18.40 0.00 1.84 0.10 0.00 4.21 0.07 5.20
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 761.00 0.00 3044.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Still present in 1.7.8.
With 10,000 measurements, a delete query (delete where time < '2018-01-01') took 25 seconds.
It's not just the delete that is slow; the garbage collection and/or compaction that happens after the delete is terrible too. I deleted a couple hundred measurements and the load jumped 6x for over an hour. Memory maxed out. I/O wait is 50% or so of all CPU time (cloud resources!).
If I remember correctly, we decided in 1.5 to block on delete, rather than accept the delete request and handle it asynchronously. My earlier analysis probably reflects this. So "that's a feature, not a bug" is my offhand comment for that.
@bryanspears have you tried limiting concurrent compactions? Since your bulk delete operation likely touched several shards, and those shards all require multiple levels of compaction, limiting concurrency has helped other folks in the past. Start with limiting to 1. https://docs.influxdata.com/influxdb/v1.7/administration/config/#max-concurrent-compactions-0
Possibly related to #15271