
Preview & Load Feature 2 run zq.exe

Open SoftTools59654 opened this issue 2 years ago • 1 comments

When using the data import function in the Preview & Load feature, two zq.exe processes run at the same time. Both put a lot of pressure on the system, and this pressure continues even after the number of lines and the data type are determined.

Even though no filters or changes have been applied, this pressure on the CPU continues.

This is not much of a problem with low-volume data, but it becomes a problem when the volume is high.

This happens even though the Load option has not been used and no data import has been done.

image

SoftTools59654 avatar Dec 20 '23 18:12 SoftTools59654

@SoftTools59654: I think I understand your concern. I've attached a video of a large data import on Windows with Zui v1.5.0 that I'll walk through to make sure we're both reacting to the same thing.

https://github.com/brimdata/zui/assets/5934157/14233cec-3ee0-42cc-91d7-940c7cdc16e0

As it shows, when I drag the large file into Zui and the Preview & Load screen starts, four zq.exe processes show up in the Task Manager. You spoke of two in your problem description but your screenshot showed four as well.

Based on your text I think you picked up on this connection, but each zq.exe process does indeed correspond to one of the "Loading Types..." or "Loading Count..." background queries that run while the user is looking at the initial lines of preview data and starting to work on a possible shaper script, pool settings, etc.

As the video shows, those zq.exe processes disappear at the same time those counts of Types and Rows are populated based on those queries. This leads to my first question: Do you see the same? In your description of the problem you said "this pressure continues even after the number of lines and the data type are determined" which is not what I've observed. So if you can consistently see zq.exe processes alive in Task Manager consuming CPU even after those Type & Row results have been populated, I would need your help to provide more detailed steps to reproduce the problem, as it's not something I've been able to reproduce.
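For anyone trying to confirm whether zq.exe processes outlive the Type & Row results, here is a minimal sketch of one way to capture that outside of Task Manager. It relies on the standard Windows `tasklist` command and its CSV output; the function names are hypothetical, and it reports only PIDs and memory usage, not CPU load.

```python
import csv
import io
import subprocess


def parse_tasklist_csv(output, image_name="zq.exe"):
    """Parse `tasklist /FO CSV /NH` output into a list of {pid, mem} rows."""
    rows = []
    for fields in csv.reader(io.StringIO(output)):
        # When nothing matches, tasklist prints an informational line
        # instead of CSV rows, so skip anything that is not a 5-column match.
        if len(fields) < 5 or fields[0].lower() != image_name.lower():
            continue
        rows.append({"pid": int(fields[1]), "mem": fields[4]})
    return rows


def list_zq_processes():
    """Ask Windows for the currently running zq.exe processes (Windows only)."""
    result = subprocess.run(
        ["tasklist", "/FI", "IMAGENAME eq zq.exe", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_tasklist_csv(result.stdout)
```

Calling `list_zq_processes()` every second or so while the Preview & Load screen is open, and noting when the list becomes empty relative to when the counts appear, would give a timeline that could be attached to the issue.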

It's probably unrelated to your inquiry, but as long as we're covering the life cycle of these zq.exe processes, the video goes on to show that once the Load button is pressed and the final data import proceeds, a zq.exe process reappears in Task Manager. I just wanted to point out that this is expected as well. This process plays the role of applying the shaper script to the original data, and its output is piped to the Zed lake service (i.e., the zed process) to commit the data to the lake. Even if you've not applied any shaping, this zq.exe process is not wasteful since it will also perform much of the heavy lifting of converting formats (e.g., CSV, JSON) to compact/binary ZNG, which means the zed process will have less work to do.
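The shape-then-load flow described above can be approximated from the command line. The following is a rough sketch, not how Zui itself wires things up: it assumes `zq` and `zed` are on the PATH, that a Zed lake is reachable with a pool already created, and that `zed load` accepts `-` for standard input (worth confirming with `zed load -h` for your version). The function names and the `pass` default shaper are illustrative choices.

```python
import subprocess


def build_shape_and_load(input_path, pool, shaper="pass"):
    """Return the argv lists for a `zq | zed load` pipeline."""
    # zq reads the source file, applies the shaper, and writes binary ZNG to stdout.
    zq_cmd = ["zq", "-f", "zng", shaper, input_path]
    # zed load reads that stream from stdin ("-", an assumption) and commits it to the pool.
    zed_cmd = ["zed", "load", "-use", pool, "-"]
    return zq_cmd, zed_cmd


def shape_and_load(input_path, pool, shaper="pass"):
    """Run the pipeline and return both exit codes as (zq_rc, zed_rc)."""
    zq_cmd, zed_cmd = build_shape_and_load(input_path, pool, shaper)
    zq = subprocess.Popen(zq_cmd, stdout=subprocess.PIPE)
    zed = subprocess.Popen(zed_cmd, stdin=zq.stdout)
    # Close our copy of the pipe so zq sees a broken pipe if zed exits early.
    zq.stdout.close()
    zed_rc = zed.wait()
    zq_rc = zq.wait()
    return zq_rc, zed_rc
```

Even with the no-op `pass` shaper, the `-f zng` step still does the format conversion, which mirrors the point that this zq.exe process does useful work when no shaping is applied.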

Please do let us know if what you've observed in your environment is different from what's shown/described above. Assuming I've captured it accurately, I have two possible ideas for improvement.

  1. If you find the CPU pressure from these processes is more bothersome than being able to see those summary query results with the Type & Row counts, we might look at putting in some kind of control so the user could cancel those queries. If so, perhaps we could have options to cancel them just one time as well as keep them disabled until the user has some interest in re-enabling them. Let us know if these ideas would help you.

  2. Zui has always been intended to be a "desktop scale" experience, so if you're performing heavy data imports/queries (and particularly if your desktop is not high-powered) it may be appropriate to expand beyond Zui. If you're finding the resource consumption from large data imports is a consistent problem, you might want to get more comfortable working with the Zed CLI tools and/or Zed lake API outside of the Zui app. The zed command docs and tutorial describe how to perform the same kinds of operations that are being done in Zui's Preview & Load screen. The Zui doc about Remote Zed Lakes describes how you could do these shape/load operations separate from the Zui desktop on a higher-powered server that would hold the Zed lake, then use Zui to connect to that "remote" Zed lake to perform queries against the stored data.
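To make both ideas a little more concrete, here is a hedged sketch of running row and type summaries with `zq` directly, with an optional time limit so a slow summary can be abandoned. The two Zed queries are illustrative and not necessarily the ones Zui runs behind "Loading Count..." and "Loading Types..."; the helper names are hypothetical.

```python
import subprocess

# Illustrative Zed queries; not necessarily what Zui runs internally.
SUMMARY_QUERIES = {
    "rows": "count()",
    "types": "count() by typeof(this)",
}


def build_summary_command(input_path, which):
    """Build the zq argv for one summary, with line-oriented ZSON output (-z)."""
    return ["zq", "-z", SUMMARY_QUERIES[which], input_path]


def run_summary(input_path, which, timeout_s=None):
    """Run one summary query and return its raw output as text.

    If timeout_s is set and the query runs longer, subprocess.run kills
    the zq process and raises subprocess.TimeoutExpired.
    """
    result = subprocess.run(
        build_summary_command(input_path, which),
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout_s,
    )
    return result.stdout
```

Something like `run_summary("big.csv", "rows", timeout_s=30)` would bound how long the CPU is occupied by a summary, which is roughly the effect a cancel control in the app would have.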

Let us know if you have any questions or feedback on those ideas.

philrz avatar Dec 22 '23 19:12 philrz