
Data Explorer: Use concurrent.futures background job queue to return get_column_profiles results asynchronously

Open wesm opened this issue 1 year ago • 1 comments

As discussed in #4300, comm requests have to be served in the order they are received. The data explorer has some request types, like get_column_profiles, that can grow very expensive for large datasets. While one of these "slow" requests is pending, it blocks "fast" requests like get_data_values from executing.
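To make the blocking concrete, here is a toy model of in-order serving. The costs in milliseconds are made-up numbers for illustration, not measurements, and `completion_times` is a hypothetical helper rather than anything in the kernel.

```python
def completion_times(requests):
    """Return (name, finish_ms) for requests served strictly in arrival order."""
    clock_ms = 0
    finished = []
    for name, cost_ms in requests:
        # Each request cannot start until every earlier request has finished.
        clock_ms += cost_ms
        finished.append((name, clock_ms))
    return finished


# Hypothetical costs: a slow profile request arrives just before a fast one.
queue = [("get_column_profiles", 5000), ("get_data_values", 10)]
timeline = completion_times(queue)
print(timeline)
```

In this model the fast request only needs 10 ms of work, but it cannot finish before the slow request ahead of it is done.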

This change adds a return_column_profiles frontend method that lets these requests be fulfilled in the background, so fast requests are served unimpeded. This resolves the immediate performance issue that the data explorer is facing, and the pattern can be refined over time as we need other asynchronous / background request handling in the kernels.
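A minimal sketch of the pattern, assuming a `concurrent.futures.ThreadPoolExecutor` as the background job queue. The class name `DataExplorerSketch`, the `send_event` callable, the `callback_id` parameter, and the payload shape are all hypothetical stand-ins for illustration. This is not the actual Positron implementation.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class DataExplorerSketch:
    """Hypothetical stand-in for the kernel-side data explorer comm handler."""

    def __init__(self, data, send_event):
        self.data = data
        self.send_event = send_event  # assumed: callable(name, params) -> None
        self.executor = ThreadPoolExecutor(max_workers=1)

    def get_data_values(self, start, stop):
        # "Fast" request: answered synchronously.
        return self.data[start:stop]

    def get_column_profiles(self, callback_id):
        # "Slow" request: queue the work and reply right away.
        future = self.executor.submit(self._compute_profiles)
        future.add_done_callback(
            lambda f: self.send_event(
                "return_column_profiles",
                {"callback_id": callback_id, "profiles": f.result()},
            )
        )
        return {}  # empty immediate reply; results arrive later as an event

    def _compute_profiles(self):
        time.sleep(0.2)  # stand-in for an expensive computation
        nulls = sum(value is None for value in self.data)
        return {"null_percent": 100.0 * nulls / len(self.data)}


events = []
explorer = DataExplorerSketch(
    [1, None, 3, 4], lambda name, params: events.append((name, params))
)
ack = explorer.get_column_profiles("req-1")  # returns without waiting
values = explorer.get_data_values(0, 2)      # not stuck behind the profile job
explorer.executor.shutdown(wait=True)        # wait for the background job
print(values, events)
```

The handler for the slow request returns as soon as the job is queued, so the in-order comm loop is free to serve the next request. The profile results are delivered afterwards through the event.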

Here's what the "blocked request" behavior looks like on the main branch:

https://github.com/user-attachments/assets/e614bb8b-c265-4daa-9e15-4cd93fd5e671

And here it is with this change:

https://github.com/user-attachments/assets/dce1eb8b-07d4-470a-b12c-c1f16566b8f2

The main difference is that the grid values load immediately, and then the null percentages load later when the return_column_profiles event is fired.

This change can't be merged until the corresponding changes are implemented in Ark.

wesm · Aug 12 '24 18:08

I’m not sure whether this issue affects Ark. My understanding from what @lionel- told me is that they want to serve requests in order in Ark as well, so the asynchronous API here may be the preferred approach.

@seeM could you take charge of pushing this over the finish line (with whatever changes are necessary in Ark) since I’m OOO for the next couple weeks?

wesm · Aug 13 '24 14:08

Thanks for pushing this through!! This is a big UX improvement in the data explorer

wesm · Sep 10 '24 15:09