drill DRILL-2362: Profile Mgmt

NOTE: This PR is a revamp of the work done for DRILL-5270 (PR #1250 and #1654). Those PRs were intended to improve the profile loading time for the WebUI, but did not address the fundamental problem of having too many profiles in the profiles directory.

When Drill displays profiles stored on the file system (local or distributed), it loads the entire list of .sys.drill files in the profile directory, then sorts and deserializes them. This gets expensive, since a single CPU thread does all of the work. For example, in a directory of 120K profiles, just fetching the list of files takes about 6 seconds. After that, the time varies with the number of profiles being rendered. Deserializing a standard profile takes about 30 ms on average, which adds roughly 3 seconds when rendering the default 100 profiles.

A user-reported issue confirms just that:
DRILL-5028: Opening profiles page from web ui gets very slow when a lot of history files have been stored in HDFS or Local FS

Additional JIRAs ask for management of these profiles:
DRILL-2362: Drill should manage Query Profiling archiving
DRILL-2861: enhance drill profile file management

This PR brings the following enhancements to ensure that profiles are better managed.

1. All profiles are now written into directory partitions based on the timestamp of the query. By default, they are written to <profileDir>/yyyy/mm/dd.
2. During startup, the Drillbit checks whether any profiles are already in the root directory and moves them into their correct partitions. For the sake of performance, this is restricted to the 10000 (configurable) most recent profiles.
3. If the profile directory is on a distributed filesystem and there are un-indexed/un-partitioned profiles (see item 2 above), the Drillbits use Zookeeper to synchronize on which one does the indexing/partitioning.
4. When profiles are listed in the WebUI, the WebServer only explores and fetches profiles from the most recent partitions, rather than exploring the entire partitioned space.
5. When an individual profile is accessed by its query ID, Drill tries to extrapolate the time range in which the query was submitted. This time range is used to infer the possible partitions, which are then explored to find the profile. This essentially reverse-engineers the process of query ID generation. Reference: https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/work/user/UserWorker.java#L67
   We also append the list of parent directories. This is done because we don't repartition the profiles when the partition format is made more granular (e.g. moving from yyyy/mm/dd to yyyy/mm/dd/hh), so Drill would otherwise miss profiles in directories that were formerly leaf directories.
6. The WebUI often accesses a profile twice instead of once because, in addition to the WebServer reading the JSON file for rendering, the fragment Gantt chart's JavaScript makes an independent call to the WebServer. To mitigate this, Drill uses a Guava cache to store the 200 most recently accessed profiles. This means that when a user lists profiles and then chooses to visit one, we can take advantage of the cache to avoid making 3 calls to the filesystem.
7. A separate directory, diorama, also exists in the root of the profile directory. Its purpose is to let users drop in external profiles that can then be rendered and visualized in the WebUI. Currently, the profile needs to be copied in manually; this cannot be done via the WebUI. This could be a future enhancement.
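To make the partition layout and the query-ID lookup described above more concrete, here is a minimal sketch. It is not the code in this PR. The class and method names are hypothetical, the use of UTC is an assumption, and the ID layout (upper 32 bits of the ID's first long hold Integer.MAX_VALUE minus the start time in seconds, with a random int added below) is an assumption about the referenced UserWorker code.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, for illustration only.
final class ProfilePartitions {
  // Default layout from the description: <profileDir>/yyyy/mm/dd (UTC is an assumption).
  private static final DateTimeFormatter PARTITION =
      DateTimeFormatter.ofPattern("yyyy/MM/dd").withZone(ZoneOffset.UTC);

  /** Partition, relative to the profile directory, for a query start time. */
  static String partitionFor(long epochSeconds) {
    return PARTITION.format(Instant.ofEpochSecond(epochSeconds));
  }

  /**
   * Assumed ID layout: part1 = ((Integer.MAX_VALUE - startSeconds) << 32) + randomInt.
   * A negative randomInt borrows 1 from the upper half, so the time recovered
   * from the upper 32 bits is either exact or one second too late.
   * Returns {earliest, latest} possible start time in epoch seconds.
   */
  static long[] startTimeRange(long part1) {
    long latest = Integer.MAX_VALUE - (part1 >> 32);
    return new long[] {latest - 1, latest};
  }

  /** Leaf partitions for the inferred time range, followed by their parent directories. */
  static List<String> candidateDirs(long part1) {
    Set<String> dirs = new LinkedHashSet<>();
    for (long t : startTimeRange(part1)) {
      dirs.add(partitionFor(t)); // leaves first
    }
    for (String leaf : new ArrayList<>(dirs)) {
      // Walk up: yyyy/MM/dd -> yyyy/MM -> yyyy
      for (int slash = leaf.lastIndexOf('/'); slash > 0; slash = leaf.lastIndexOf('/', slash - 1)) {
        dirs.add(leaf.substring(0, slash));
      }
    }
    return new ArrayList<>(dirs);
  }
}
```

A lookup would then probe each directory returned by `candidateDirs` in order until it finds the profile file, which keeps the search to a handful of directories instead of the whole tree.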
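The effect of caching recently accessed profiles can be sketched as follows. This is not the Guava-based cache the description mentions: to stay self-contained it swaps in a plain `LinkedHashMap` in access order as a small LRU cache. The class name, the loader function, and the `loads` counter are hypothetical, and profiles are represented as strings only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a profile cache; the PR itself uses a Guava cache.
final class ProfileCache {
  private final Function<String, String> loader; // e.g. reads the profile from the filesystem
  private final Map<String, String> entries;
  int loads = 0; // number of times the loader (the expensive call) was invoked

  ProfileCache(final int maxEntries, Function<String, String> loader) {
    this.loader = loader;
    // accessOrder = true makes iteration order least-recently-used first.
    this.entries = new LinkedHashMap<String, String>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        return size() > maxEntries; // evict the least recently used entry
      }
    };
  }

  /** Returns the cached profile, loading it only on a miss. */
  synchronized String get(String queryId) {
    String profile = entries.get(queryId);
    if (profile == null) {
      profile = loader.apply(queryId);
      loads++;
      entries.put(queryId, profile);
    }
    return profile;
  }
}
```

With a capacity of 200, the page render and the Gantt chart's follow-up request for the same query ID would trigger only one call to the loader.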

Apr 15 '19 17:04 kkhatua

@arina-ielchiieva I'm looking for a suggestion on how to manage existing profiles. Currently, during startup, we automatically index any profiles in the root of the profiles dir into their respective location. As a default, I have set it at 1000, and the Drillbits synchronize if the profile dir is on a DFS. However, I am not sure if we should automatically index any new profile that has been copy-pasted into the root dir. For example, we might get a profile from a JIRA and would like to view it. Should we leave it there (and try to render it) or should we index it ASAP?

Apr 15 '19 17:04 kkhatua

@arina-ielchiieva could you please review this PR?

May 17 '19 21:05 kkhatua

Hi @kkhatua! Thank you for this contribution, I'd like to help to move it forward.

> However, I am not sure if we should automatically index any new profile that has been copy-pasted into the root dir. For example, we might get a profile from a JIRA and would like to view it. Should we leave it there (and try to render it) or should we index it ASAP?

I share your concern here. I think we should consider having Drill only write new profiles to partitioned directories. Any partitioning of historical profiles can be done externally by admins, in my opinion, and we can add examples of "housekeeping" scripts for doing that to the Drill documentation.
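As a rough idea of what such a housekeeping example could look like, here is a minimal sketch that moves profiles from the root of a profile directory into yyyy/MM/dd partitions. It is illustrative only: the class name is hypothetical, it works on a local filesystem path rather than a DFS, and it uses each file's last-modified time (in UTC) as a stand-in for the query time, which is an assumption.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical housekeeping helper, for illustration only.
final class PartitionOldProfiles {
  private static final DateTimeFormatter PARTITION =
      DateTimeFormatter.ofPattern("yyyy/MM/dd").withZone(ZoneOffset.UTC);

  /** Moves *.sys.drill files from the root of profileDir into date partitions; returns the count moved. */
  static int partition(Path profileDir) throws IOException {
    // Collect first, so files are not moved while the directory is being iterated.
    List<Path> profiles = new ArrayList<>();
    try (DirectoryStream<Path> files = Files.newDirectoryStream(profileDir, "*.sys.drill")) {
      for (Path file : files) {
        if (Files.isRegularFile(file)) {
          profiles.add(file);
        }
      }
    }
    for (Path file : profiles) {
      // Assumption: last-modified time approximates when the query ran.
      String partition = PARTITION.format(Files.getLastModifiedTime(file).toInstant());
      Path target = profileDir.resolve(partition);
      Files.createDirectories(target);
      Files.move(file, target.resolve(file.getFileName()));
    }
    return profiles.size();
  }
}
```

An admin could run something like this from a scheduled job, outside of Drillbit startup, so that no coordination between Drillbits is needed.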

Would you like to do any of the following?

1. Resume the work on this PR; I volunteer to be a reviewer.
2. Receive a PR from me to your fork here that's rebased and has some changes I think we want.
3. You're too busy now, so I'll pull your commits here into a new branch of my own and open a new PR.

Thanks James

Sep 23 '22 12:09 jnturton

Or, if we do want to keep this built-in ability to partition existing profiles, perhaps we should have it launched from a button on the Profiles page in the web UI instead of on Drillbit startup? That would remove the complication of which Drillbit does the work, as well as the worries about slowing down startup or partitioning profiles that nobody wanted partitioned.

Sep 23 '22 12:09 jnturton
