Optimize table updates
Is your feature request related to a problem? Please describe. Updating a table with more than 64 columns and 1000 rows of data is time-consuming: it currently takes about 20-25 seconds. I improved it a bit by converting the pandas DataFrame to a numpy array before iterating. If the 2D array could be processed in parallel, there is hope that the loading time could drop to a tenth of its current level.
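For reference, the update path is essentially one widget per cell; here is a minimal, self-contained sketch of the kind of loop involved (the random DataFrame is a stand-in for my real dataset):

import dearpygui.dearpygui as dpg
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(1000, 64))  # stand-in for the real dataset
values = df.to_numpy()  # iterating a numpy array is cheaper than DataFrame indexing

dpg.create_context()
with dpg.window(label="Table test"):
    with dpg.table(header_row=False):
        for _ in range(values.shape[1]):
            dpg.add_table_column()
        for row in values:  # 1000 rows x 64 columns = 64,000 add_text calls
            with dpg.table_row():
                for v in row:
                    dpg.add_text(f"{v:.3f}")
dpg.create_viewport(title="table test", width=900, height=600)
dpg.setup_dearpygui()
dpg.show_viewport()
dpg.start_dearpygui()
dpg.destroy_context()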
Describe the solution you'd like It would be nice to be able to separate the data update from the way each cell is generated. I've used commercial components at work that take this approach, and I think it is a good one:
data = [from optimized io routine]
table.data_source = data
Or, I'd like to see examples using parallel libraries such as multiprocessing, joblib, dask, ray, etc. I've tried a few things, and here's what happened: I used the following in a function passed as an argument to joblib's Parallel(), but only the rows are generated and nothing is displayed inside the actual cells.
with table_row():
    [list comprehension]
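For comparison, a plain single-threaded version of that pattern does fill the cells (sketched below; table_id and row_values are placeholders for my actual variables):

with dpg.table_row(parent=table_id):  # table_id: an existing dpg.table
    [dpg.add_text(str(v)) for v in row_values]  # one text widget per cell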
Describe alternatives you've considered None
Additional context None
64 columns by 1000 rows is at least 64,000 widgets. DPG simply wasn't designed for such volumes of data.
While I agree that things start getting pretty slow in DPG once you reach thousands of widgets, and there's room for optimization, I'd also like to ask you a question - do your users really, really need to see 64,000 cells in the table? Won't they want to filter it somehow, or to search? Would it make sense to only show a part of that data? Maybe even load it dynamically as the user scrolls, if you really need to display 1,000 rows...
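For example, a paged view only ever creates one screenful of widgets, no matter how big the dataset is. Here's a rough sketch of that idea (page size, dataset, and tags are placeholders, not a recommended design):

import dearpygui.dearpygui as dpg
import numpy as np

PAGE = 50                        # rows visible at a time
data = np.random.rand(1000, 64)  # stand-in for the real dataset
offset = 0

def show_page():
    dpg.delete_item("table", children_only=True, slot=1)  # table rows live in slot 1
    for row in data[offset:offset + PAGE]:
        with dpg.table_row(parent="table"):
            for v in row:
                dpg.add_text(f"{v:.3f}")

def turn(sender, app_data, user_data):
    global offset
    offset = max(0, min(offset + user_data * PAGE, len(data) - PAGE))
    show_page()

dpg.create_context()
with dpg.window(label="Paged table"):
    with dpg.group(horizontal=True):
        dpg.add_button(label="Prev", callback=turn, user_data=-1)
        dpg.add_button(label="Next", callback=turn, user_data=1)
    with dpg.table(tag="table", header_row=False):
        for _ in range(data.shape[1]):
            dpg.add_table_column()
show_page()
dpg.create_viewport(title="paged table demo", width=900, height=600)
dpg.setup_dearpygui()
dpg.show_viewport()
dpg.start_dearpygui()
dpg.destroy_context()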
BTW you can't use multiprocessing (either directly or via joblib), because other processes won't have DPG's internal structures. Moreover, DPG is designed so that only one thread at a time can work with the widget tree.
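What can work is keeping multiprocessing strictly on the data side and touching DPG from a single thread only. A rough sketch of that split (prepare() and the data layout are assumptions, not your actual code):

from multiprocessing import Pool

import dearpygui.dearpygui as dpg

def prepare(row):
    # pure data work, no DPG calls - safe to run in worker processes
    return [f"{v:.3f}" for v in row]

def fill_table(table, raw_rows):
    with Pool() as pool:
        prepared = pool.map(prepare, raw_rows)  # parallel part: formatting only
    for cells in prepared:                      # widget creation stays in this thread
        with dpg.table_row(parent=table):
            for text in cells:
                dpg.add_text(text)

# call fill_table(...) from the GUI thread; on Windows, keep the program's
# entry point behind an if __name__ == "__main__": guard so Pool can spawn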
It would be nice to have an example (demo.py) that shows only a portion of the data in the table, and one that dynamically loads the earlier and later data as you scroll up and down.
I knew it wasn't thread-safe, because I randomly got a "no container to pop" error while testing the joblib parallel code. Regarding the table, it looks like something structural needs to be fixed. Pipelined data processing on top of a Python-based GUI seems to have great commercial potential.
Regarding that "no container to pop" error, let me quote the explanation I gave on Discord:
IMPORTANT: When calling DPG from multiple threads, keep in mind that certain parts of DPG keep an internal state - and this is critical to the stability of your code! When you use something like this:
with dpg.child_window() as container:
    with dpg.table(header_row=False) as table:
        dpg.add_table_column()
there's an internal container stack that the with blocks affect. If another thread takes over somewhere in the middle and starts adding widgets, it may well add them to your current container. This will break everything. Always, always enclose such pieces with "with dpg.mutex():" (or another mutex if you will). Another caveat is related to dpg.last_item(), dpg.last_container(), etc. - if you don't use a mutex, they may give you an item added from a neighbour thread.
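Put together, adding table rows from a background thread could look like the sketch below (the "table" tag and data_chunk are assumed; the mutex is taken per row so the render thread isn't starved for long):

import threading

import dearpygui.dearpygui as dpg

def add_rows_safely(table, rows):
    for row in rows:
        # hold the mutex around the whole with-block, so no other thread
        # can push/pop the internal container stack while this row is built
        with dpg.mutex():
            with dpg.table_row(parent=table):
                for v in row:
                    dpg.add_text(str(v))

# "table" must be the tag of an existing dpg.table; data_chunk is your 2D data
threading.Thread(target=add_rows_safely, args=("table", data_chunk), daemon=True).start()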