
Redash crashes on query that returns a lot of results

Open • spapas opened this issue 1 year ago • 6 comments

Issue Summary

When I try running a query that returns a lot of data (more than 3M rows), I get the following output on the worker:

func_name=redash.tasks.queries.execution.execute_query job.id=80709639-67b3-4738-9f01-ca63f5209a52 job=execute_query state=executing_query query_hash=d2cac4fb0af2097af53c08a4bf3947c9 type=pg ds_id=9 job_id=80709639-67b3-4738-9f01-ca63f5209a52 queue=queries query_id=124 username=xxx
[2023-11-17 10:24:16,129][PID:22595][WARNING][rq.worker] Moving job to FailedJobRegistry (work-horse terminated unexpectedly; waitpid returned 9)

After some research, it seems that the worker is running out of memory (or hitting a similar resource limit).

Is it even possible to run such queries in Redash?

Steps to Reproduce

  1. Run a query returning a lot of data from the web interface

Technical details:

  • Redash Version: I'm on commit a19b17b844063f215266286ea8bd185086e3e27a (HEAD -> master, origin/master, origin/HEAD)
  • Browser/OS: Chrome, CentOS 7
  • How did you install Redash: Bare-metal install from GitHub

spapas • Nov 17 '23 08:11
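The `waitpid returned 9` in the worker log supports the out-of-memory theory. Assuming rq logs the raw POSIX wait status of the work-horse process (which is how its worker obtains that value from `os.waitpid`/`os.wait4`), a status of 9 decodes to "terminated by signal 9", i.e. SIGKILL, the signal the Linux OOM killer sends. The small sketch below decodes such a status with Python's standard `os` helpers; `describe_wait_status` is a hypothetical helper name, not part of rq or Redash.

```python
import os
import signal


def describe_wait_status(status: int) -> str:
    """Decode a raw POSIX wait status (hypothetical helper, POSIX-only)."""
    if os.WIFSIGNALED(status):
        # Low 7 bits hold the number of the signal that killed the child.
        sig = os.WTERMSIG(status)
        return f"killed by signal {sig} ({signal.Signals(sig).name})"
    if os.WIFEXITED(status):
        # High byte holds the exit code passed to exit().
        return f"exited with code {os.WEXITSTATUS(status)}"
    return f"other status {status}"


print(describe_wait_status(9))  # killed by signal 9 (SIGKILL)
```

A SIGKILL cannot be caught or handled, which is why the job simply disappears and rq can only report that the work-horse "terminated unexpectedly".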

Why don't you aggregate the data?

trantrinhquocviet • Nov 18 '23 04:11

@trantrinhquocviet Because I need to export that data and hand it to someone who can't access my database.

spapas • Nov 18 '23 07:11

I tried this locally (an RPi with 8 GB of RAM, a 2.7M-row query, 4 columns: smallint, real, text, integer) and confirmed that it crashed my RPi.

This sounds more or less expected to me: Redash loads all of the result data into memory, so if there's too much data, we'll OOM. This is likely something we can only fix with optimizations, not with a simple one-liner.

guidopetri • Nov 18 '23 18:11
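For the export use case mentioned above, the usual way around loading everything into memory is to stream rows in batches outside Redash. The sketch below is not how Redash works; it is a minimal illustration using the standard DB-API 2.0 `fetchmany` method, with `sqlite3` standing in for the real database. `iter_rows` and `export_csv` are hypothetical names. Note that with PostgreSQL and psycopg2, a default (client-side) cursor pulls the whole result into client memory at `execute` time, so you would need a named, server-side cursor (`conn.cursor(name="...")`) for batching to actually bound memory use.

```python
import csv
import io
import sqlite3
from typing import Iterator, Sequence, TextIO


def iter_rows(cursor, batch_size: int = 10_000) -> Iterator[Sequence]:
    """Yield rows from an executed DB-API 2.0 cursor, one batch at a time."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield from batch


def export_csv(cursor, out: TextIO, batch_size: int = 10_000) -> int:
    """Write the cursor's result set to `out` as CSV; return the row count."""
    writer = csv.writer(out)
    # cursor.description has one entry per column; item 0 is the column name.
    writer.writerow([col[0] for col in cursor.description])
    count = 0
    for row in iter_rows(cursor, batch_size):
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    # In-memory SQLite stands in for the real data source in this demo.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)",
                     ((i, f"row{i}") for i in range(25_000)))
    cur = conn.execute("SELECT a, b FROM t ORDER BY a")
    buf = io.StringIO()
    print(export_csv(cur, buf, batch_size=1_000))  # 25000
```

Only one batch of rows is held by the loop at a time, so peak memory depends on `batch_size` rather than on the total row count (provided the driver itself streams, as noted above).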

Can you create a local report to import your data into, and then connect it to Redash? Redash is not a database; it is just a data visualization tool.

trantrinhquocviet • Nov 19 '23 02:11

I think a dataviz tool should be able to handle large datasets too, though. Maybe the limit is below 3M rows, but a scatter plot, for example, should be able to handle a very large number of data points and still be useful.

guidopetri • Nov 19 '23 02:11

Large query results are not yet supported.

https://github.com/getredash/redash/issues/78

noxdafox • Nov 23 '23 09:11