xeus-sql icon indicating copy to clipboard operation
xeus-sql copied to clipboard

paging for large result sets?

Open jameshowison opened this issue 3 years ago • 9 comments

I'm running into responsiveness issues with queries with many results (~100,000 order of magnitude).

If it was just me then I'd add LIMIT statements, remembering to remove them later in the query buildup. But I'm working with students and asking doesn't make it so.

Any chance that the result sets could be paged, with the interface only handling those currently being looked at? This is a common approach in things like phpmyadmin or other sql clients, but I don't know whether SOCI makes that easy or not?

jameshowison avatar Feb 18 '22 16:02 jameshowison

hey @jameshowison, thanks for opening the issue. I was reading the discussion you had on the issue linked. Even if this is not a xeus-sql issue per-se is still an interesting question.

Any chance that the result sets could be paged, with the interface only handling those currently being looked at? This is a common approach in things like phpmyadmin or other sql clients, but I don't know whether SOCI makes that easy or not?

Do you know how this is implemented under the hood? If it's being paged it means they're running the statement again with a LIMIT set by them?

I don't know how we could implement something sensitive to being watched by the user, but we could think on some kind of pagination.

marimeireles avatar Feb 23 '22 00:02 marimeireles

Yeah, I think phpmyadmin and others just use LIMIT and OFFSET. Although I'm far from sure (it's what phpmyadmin shows as you browse). I found this note about using resultset/cursor and postgres specific FETCH. Perhaps something like that is implemented with SOCI?

https://www.citusdata.com/blog/2016/03/30/five-ways-to-paginate/

Perhaps some of the impact can be resolved using something like https://pypi.org/project/jupyterlab-limit-output/

jameshowison avatar Feb 23 '22 00:02 jameshowison

Perhaps some of the impact can be resolved using something like https://pypi.org/project/jupyterlab-limit-output/

Yeah, that seems like something that should work out of the box, tbh.

marimeireles avatar Feb 23 '22 02:02 marimeireles

Thinking about this more, perhaps an across platform approach that protects against the most common error cases without the hassle of pagination would be to add LIMIT 100 to each query (unless another LIMIT was there) and include a link at the bottom to "Show All results" which removes the LIMIT 100?

I think needed to interactively look at more than 100 results would be exceedingly rare (I mean 100 is too many to look through anyway :)

jameshowison avatar Feb 23 '22 14:02 jameshowison

Just a note that I ran into this today with Jupyter becoming unresponsive with a 38MB ipynb file (resulting from large results from SELECT * FROM table queries.

jameshowison avatar Feb 25 '22 00:02 jameshowison

Just a note here (in case others find this in future) that https://github.com/kynan/nbstripout is able to resurrect files that have become unresponsive due to containing very large amounts of output.

jameshowison avatar Feb 27 '22 00:02 jameshowison

What about adding a parameter to the LOAD, something like limit_output = 100 would be a good explicit way to handle this. While adding a LIMIT 100 to the query enroute to the server would be one approach, another could be simply to truncate the results to never fill up output to a level that can make the page unresponsive. So something truncating rows around xeus_sql_interpreter.cpp:93?

jameshowison avatar Mar 03 '22 16:03 jameshowison

Thinking about this more, perhaps an across platform approach that protects against the most common error cases without the hassle of pagination would be to add LIMIT 100 to each query (unless another LIMIT was there) and include a link at the bottom to "Show All results" which removes the LIMIT 100?

I think needed to interactively look at more than 100 results would be exceedingly rare (I mean 100 is too many to look through anyway :)

This would also work.

I'm not sure I understand your last comment. How would that differ from using LIMIT?

marimeireles avatar Mar 07 '22 19:03 marimeireles

Last comment meant that I almost never manually look at results above 100 rows, I load larger results into a data analysis or visualization program. But I think having this configurable using limit_output = in the load cell is a great option.

jameshowison avatar Mar 07 '22 19:03 jameshowison