influxdb icon indicating copy to clipboard operation
influxdb copied to clipboard

shrink time window incorrectly with overlaped data block

Open foobar opened this issue 3 years ago • 3 comments

We are running influxdb 1.8 and have a single shard with 2TB TSM files. Recently some simple queries with 'ORDER BY TIME DESC' failed to return and hogged both cpu and memory resources.

I took time to investigate and found a possible bug when reading from overlapped data blocks, though following code has been there for around 6 years:

https://github.com/influxdata/influxdb/blob/d3be25b251256755d622792ec91826c5670c6106/tsdb/engine/tsm1/file_store.gen.go.tmpl#L186-L188

it's supposed to shrink the window but actually is expanding the window. If a block has a long time range and it overlaps with many blocks with small time range, the bug causes lots of blocks being merged. The merge operation is expensive and might cause the function to run for a couple of hours.

Please confirm if my understanding is correct and I can send a PR to fix it.

foobar avatar May 18 '22 12:05 foobar

Hi @foobar - seems possible with a brief initial look. We'd welcome your PR as we continue to examine it ourselves. @lesam mentioned to me that the the logic may be incorrect (reserved) for narrowing the window for the maxT case too.

philjb avatar May 24 '22 21:05 philjb

@foobar looking at this code, I think a better strategy to calculate the minT, maxT range to merge would be to just find the minimum minTime and the minimum maxTime (for ascending cursors), something like:

		for i := 0; i < len(c.current); i++ {
			cur := c.current[i]
			if cur.entry.MinTime < minT && !cur.read() {
				minT = cur.entry.MinTime
			}
			if cur.entry.MaxTime < maxT && !cur.read() {
				maxT = cur.entry.MaxTime
			}
		}

This is pretty deep stuff though, it would need a lot of careful testing before merge.

lesam avatar May 25 '22 17:05 lesam

		for i := 0; i < len(c.current); i++ {
			cur := c.current[i]
			if cur.entry.MinTime < minT && !cur.read() {
				minT = cur.entry.MinTime
			}
			if cur.entry.MaxTime < maxT && !cur.read() {
				maxT = cur.entry.MaxTime
			}
		}

this looks good but seems it would make the same result as calling OverlapsTimeRange. Anyway we need more testing and investigations.

foobar avatar Jun 05 '22 15:06 foobar