flux icon indicating copy to clipboard operation
flux copied to clipboard

Union transformation is slow

Open affo opened this issue 5 years ago • 5 comments

It seems that union has performance problems: https://github.com/influxdata/flux/pull/1142#issuecomment-538309063.

DOD:

  • verify that union has those problems via benchmarking
  • design a possible fix (could union be expressed in pure flux as a particular join?)
  • if the fix is trivial, implement it. Otherwise open up a new issue with the proposed fix.

affo avatar Oct 11 '19 14:10 affo

I have a script that consists of 3 parts:

  1. result = ...
  2. total = ...
  3. union(tables: [result, total])

I can compute and instantly yield parts 1 or 2 separately, but when I try to union them - script takes about 5 minutes to complete. I've ran it on InfluxDB 1.7.8 production instance which idles at 5-50% CPU utilization and rises up to 100-400% during those 5 minutes of querying.

Furthermore, I've tried saving result into CSV and repeating the test. In that case union has no detrimental effect on query speed. I've tested script both on the same production instanse of InfluxDB as well as locally with all releases of Flux standalone binaries from 0.54.0 (latest) to 0.36.2 (version bundled with InfluxDB 1.7.8). Results are consistently fast across all version range.

Problem arises only when I query result part for real. As if state from previous query stages is not discarded and somehow interferes with the process.

The script is below:

import "csv"

resultCSV = "#datatype,string,long,double,double,string,double
#group,false,false,false,false,false,false
#default,_result,,,,,
,result,table,expense,profitability,utm_campaign,profit
,,0,0,0,aaa,0
,,0,0,0,bbb,0
,,0,0,0,ccc,488.938656
,,0,0,0,ddd,0
,,0,2638.500865787432,-0.9851042140976252,eee,39.302544
,,0,582.8940206185567,-0.9779033588535838,fff,12.88
,,0,80.53292307692307,-0.5311234392431953,ggg,37.760000000000005
,,0,1363.8561307901905,0,hhh,0
,,0,239.46149999999997,0,iii,0
,,0,468.3076455696202,0,jjj,0
,,0,0,0,kkk,65.961696
,,0,337.52687081339707,-0.9848875795052774,lll,5.100848
,,0,0,0,mmm,432.999504
"
result = csv.from(csv: resultCSV)

total = result
|> reduce(
  identity: {profit: 0.0, expense: 0.0},
  fn: (r, accumulator) => ({
    profit: r.profit + accumulator.profit,
    expense: r.expense + accumulator.expense,
  }))
|> set(key: "utm_campaign", value: "TOTAL")
|> map(fn: (r) => ({ r with
	profitability: (r.profit - r.expense) / r.expense,
}))

// ### This one causes dead hang in real query ###
union(tables: [result, total])
// ### This one causes dead hang in real query ###
|> yield()

zarbis avatar Oct 22 '19 15:10 zarbis

@affo I've slightly updated my previous comment. Can I do some further testing to find root cause?

zarbis avatar Nov 25 '19 19:11 zarbis

@zarbis Sorry for the long delay.

@zarbis I think union performance at the moment is not something we will work on, but I absolutely don't wnat you to hang on this. So I suggest a workaround.

Do you think you would be able to use join (https://github.com/influxdata/flux/blob/master/docs/SPEC.md#join) in your specific use case?

If so, do you encounter the same performance issues?

affo avatar Nov 27 '19 17:11 affo

Figured I'd leave a comment here as I think this should have a higher priority. I have two queries. I want to use these as a pre-filter to then later join them with all_lendables again so I get all of the data but only for the filtered symbols.

all_lendables = from(bucket: "finance")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "lendable")
  |> filter(fn: (r) => r["_field"] == "fee")

elevated_rate = all_lendables
  |> last()
  |> filter(fn: (r) => r._value >= 100.)
  |> keep(columns: ["symbol"])
  |> group()

change = all_lendables
  |> aggregateWindow(every: 24h, fn: mean)
  |> difference()
  |> filter(fn: (r) => r._value >= 30.)
  |> last()
  |> keep(columns: ["symbol"])
  |> group()
  
//join(tables: {elevated_rate: elevated_rate, change: change}, on: ["symbol"], method: "inner")
//union(tables: [elevated_rate, change])

elevated_rate on its own takes 4.56s and produces 195 rows. change on its own takes 4.62s and produces 11 rows.

The union of both takes 67.32s and produces the expected 206 rows. The join of both takes 66.47s and produces the expected 206 rows.

I measured these by simply putting elevated_rate or change at the end of the script. I truly hope Flux ignores the non-used queries here.

Nevertheless, join is not a valid "workaround". And neither is using union itself if it takes 6 times longer than the original queries combined.

Here's the CSV output of all_lendables so you can try it yourself. 2022-08-05_17 34_influxdb_data.tar.gz (Warning, the contained file is 100MB in size. Had to compress it.)

Byter09 avatar Aug 05 '22 15:08 Byter09

This issue has had no recent activity and will be closed soon.

github-actions[bot] avatar May 05 '24 01:05 github-actions[bot]