Grapher redesign

Open adamerose opened this issue 4 years ago • 11 comments

Tracking major changes to the Grapher here, will edit this OP as things change.

How the Grapher works

  • There are functions that generate Plotly figures in pandasgui.jotly
  • The Grapher widget is defined in pandasgui.grapher and imports all those Jotly functions
  • The Grapher has a list of 'schemas' including an icon, name, and the Jotly function for each type of graph you can make in the Grapher, and has the UI to switch between those and display figures.
  • The actual drag-n-drop UI inside the Grapher is a FuncUi defined in pandasgui.widgets.func_ui and it takes a list of schemas then auto-generates the UI based on the type-hints of the function provided in the schema.

Here's how FuncUi maps the Jotly function type hints to PyQt widgets:

  • ColumnName (which is just a str) -> A textbox that you can type in or drop columns onto
  • Literal -> A combobox dropdown with options for the valid values defined in the type hint
  • bool -> A checkbox
  • ColumnNameList (which is List[str]) -> A list of multiple text boxes where you can drop column names and add/remove rows.
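The mapping above can be sketched without any Qt code. This is only an illustration, not the actual FuncUi implementation: `ColumnName` and `ColumnNameList` are re-declared here as stand-ins, and `widget_for`, `describe_ui`, `scatter` and `splom` are hypothetical names invented for the example. It returns a description of each widget rather than building real PyQt widgets.

```python
from typing import List, Literal, get_args, get_origin, get_type_hints

# Stand-ins for the aliases described above (assumed, not imported from pandasgui).
ColumnName = str
ColumnNameList = List[str]


def widget_for(hint):
    """Map one type hint to a (widget kind, options) pair."""
    if hint is bool:
        return ("checkbox", ())
    if get_origin(hint) is Literal:
        # The valid values in the hint become the dropdown options.
        return ("combobox", get_args(hint))
    if get_origin(hint) is list:
        return ("textbox_list", ())
    if hint is str:
        return ("textbox", ())
    raise TypeError(f"no widget known for {hint!r}")


def describe_ui(func):
    """Describe the widgets that would be generated for func's arguments."""
    hints = get_type_hints(func)
    hints.pop("return", None)
    return {name: widget_for(hint) for name, hint in hints.items()}


# Hypothetical Jotly-style signatures, used only to exercise the sketch.
def scatter(x: ColumnName, y: ColumnName,
            trendline: Literal["none", "ols"] = "none",
            marginal: bool = False):
    ...


def splom(dimensions: ColumnNameList):
    ...


print(describe_ui(scatter))
print(describe_ui(splom))
```

Because `ColumnName` is just `str`, a sketch like this cannot tell a column name apart from any other string argument; a real implementation would need a distinct type to make that distinction.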

Todo

  • [x] Added a console to display errors generating plots
  • [x] Added a button Preview Kwargs to show your args as defined by the UI in a text box as a dict
  • [x] Automatically re-render the plot any time the arguments are changed
  • [ ] Re-add the old logic like sorting and aggregating for line/bar plots and more args to jotly
  • [x] Implement ColumnNameList for args that accept a list of column names
  • [ ] Heuristics to differentiate categorical and continuous numeric columns. Should add a color_mode arg like this, but add a third auto option that uses a heuristic.
  • [ ] Sync selections made between the Grapher and DataFrameExplorer. JMP does this and it's very cool: https://www.screencast.com/t/yKmLTaFaP9
  • [ ] Add argument facet_tab: interactive tabs you can click between
  • [ ] Add argument facet_wrap: splits into subplots like facet_row and facet_col but using a single variable and wraps them all into a square grid. This should be easy because plotly already provides facet_col_wrap, I will just make that number auto-calculated to give a grid.
  • [ ] Add argument facet_page: like facet_col but each subplot is full size and you can scroll through them. These were partly inspired by JMP. (page / wrap)
  • [ ] Allow opening Grapher to a specific state. So maybe you could type x = pandasgui.show_grapher(df, type='scatter', args={'x': 'age', 'y': 'fare', 'color': 'survived'}) and it will pop open like this
  • [ ] Make Grapher a dialog instead of a Tab so you can have multiple Graphers open at once for a single DataFrame
  • [ ] Add checkbox to disable automatic re-rendering
  • [x] Fix the GroupBox title styling
  • [x] Automatic title generation
  • [x] Default Grapher settings stored in preferences
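For the facet_wrap item in the list above, the auto-calculated wrap number could be as simple as the ceiling of the square root of the facet count. The helper name `auto_facet_col_wrap` is hypothetical; the result is what would be passed as Plotly Express's `facet_col_wrap` argument. A minimal sketch, assuming a roughly square grid is the goal:

```python
import math


def auto_facet_col_wrap(n_facets: int) -> int:
    """Number of columns that lays n_facets subplots out in a near-square grid."""
    if n_facets < 1:
        raise ValueError("need at least one facet")
    # Integer form of ceil(sqrt(n_facets)), avoiding float rounding.
    return math.isqrt(n_facets - 1) + 1


for n in (1, 4, 5, 9, 10):
    cols = auto_facet_col_wrap(n)
    rows = math.ceil(n / cols)
    print(f"{n} facets -> {rows} rows x {cols} columns")
```

With this choice the grid never has more rows than columns, so the last row is the only one that can be partly empty.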

adamerose avatar Mar 18 '21 05:03 adamerose

The auto setting doesn't detect whether you have WebGL support or not. It just switches to WebGL automatically above 1000 points. So, on corporate laptops that are locked down, or inside virtual machines, you get a WebGL error the moment you try to display more than 1000 points. Setting render_mode to svg allows rendering all plots, even those with more than 1000 points, albeit with reduced performance. The svg engine seems fine up to about 50,000 displayed points.
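To make the behaviour described here concrete, this is a rough sketch of the decision. The function name `pick_render_mode` is made up, and the 1000-point threshold is taken from the description above rather than from Plotly's source. The three values mirror the `render_mode` options Plotly Express accepts ("auto", "svg", "webgl").

```python
def pick_render_mode(n_points: int, preference: str = "auto",
                     threshold: int = 1000) -> str:
    """Return the renderer that would be used for a plot of n_points.

    "auto" only looks at the point count; it never checks whether WebGL
    actually works on the machine, which is the problem described above.
    """
    if preference not in ("auto", "svg", "webgl"):
        raise ValueError(f"unknown render mode: {preference!r}")
    if preference != "auto":
        return preference
    return "webgl" if n_points > threshold else "svg"


print(pick_render_mode(500))            # small plot under "auto"
print(pick_render_mode(50_000))         # large plot under "auto"
print(pick_render_mode(50_000, "svg"))  # forced svg on a locked-down machine
```

Forcing "svg" trades speed for a plot that renders everywhere, which is why exposing it as a user setting matters.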

As to the rest, got my head spinning with all the changes. I'll have to digest this some.

fdion avatar Mar 18 '21 20:03 fdion

@adamerose I need to know when you think things will settle down on the major refactoring so I can address the automated title/render mode if it is still a regression. I'd like to get to the point where a new PyPI release can be made so it'll have the grapher splitter change.

And more generally, when it makes sense to address some of the other things, as I was waiting for the dust to settle a bit! :)

fdion avatar Mar 27 '21 12:03 fdion

One way to handle all plotly settings would be to allow a kwarg dict in show/pandasgui and pass the kwargs all the way to jotly if they come from the initial call. Internally, you can handle that with either method you propose. Once in jotly, globally for all plots, any kwargs that start with layout_ are applied to update_layout, anything else to update_traces. And like you said, totally skip the UI.

This would be extremely flexible. New option in plotly? no problem, already supported.
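A sketch of the routing idea, under the assumption that the `layout_` prefix is stripped before the value is handed to `update_layout`. The helper name `split_plotly_kwargs` is hypothetical and nothing like it is confirmed to exist in jotly.

```python
def split_plotly_kwargs(kwargs: dict):
    """Split pass-through kwargs into (layout kwargs, trace kwargs)."""
    prefix = "layout_"
    layout, traces = {}, {}
    for key, value in kwargs.items():
        if key.startswith(prefix):
            # Drop the prefix so the key matches what update_layout expects.
            layout[key[len(prefix):]] = value
        else:
            traces[key] = value
    return layout, traces


layout_kwargs, trace_kwargs = split_plotly_kwargs(
    {"layout_title_text": "Fare by age", "marker_size": 4}
)
print(layout_kwargs)
print(trace_kwargs)

# With a Plotly figure `fig` in hand, the two dicts would then be applied as:
#   fig.update_layout(**layout_kwargs)
#   fig.update_traces(**trace_kwargs)
```

Plotly's underscore "magic" notation (such as `marker_size` or `title_text`) means nested properties can be reached without building nested dicts, so this stays a flat pass-through.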

fdion avatar Mar 28 '21 02:03 fdion

Pushed my changes so far. Things are mostly working and I put the title generation back in, but it needs to be fixed up since I renamed apply_mean to aggregation and removed apply_sort (I think it's not needed). Render mode is no longer an option inside the Grapher; it's just an option in Preferences and automatically applies to the Grapher where needed.

I deleted my previous comment since I changed my mind on each point I gave 😅

adamerose avatar Mar 29 '21 06:03 adamerose

Just had to remove apply_sort from settings and rename apply_mean to aggregation and pass 'none' instead of False, and it started up.
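Roughly, the change to existing settings amounts to the following. This is a sketch with a made-up function name and a plain dict standing in for the settings object. Mapping False to 'none' comes from the fix described above; mapping True to 'mean' is a guess based on the old argument name. The "theme" key is a hypothetical unrelated setting.

```python
def migrate_grapher_settings(old: dict) -> dict:
    """Return a copy of old settings adapted to the renamed arguments."""
    new = dict(old)
    # apply_sort was removed, so it is simply dropped.
    new.pop("apply_sort", None)
    if "apply_mean" in new:
        apply_mean = new.pop("apply_mean")
        # False -> 'none' (from the fix above); True -> 'mean' is an assumption.
        new["aggregation"] = "mean" if apply_mean else "none"
    return new


print(migrate_grapher_settings(
    {"apply_sort": True, "apply_mean": False, "theme": "light"}
))
```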

Some feedback:

  • drag and drop to plot setup boxes seems easier to do than to drop on trees in the previous dragger setup
  • haven't dropped a variable outside the designated spots (would happen randomly with the tree setup)
  • nice touch adding the marginal, cumulative and trendline as UI options
  • drag and drop multiple variables on x or y etc is definitely needed, and it is on the TODO (ColumnNameList)
  • I see you don't have to click finish before it generates a plot. For small data sets that's not a problem, but for large ones (> 200,000 data points) it slows down the process a good bit. I see you have that also in the TODO (auto render checkbox)
  • x and y are inverted in the dragger
  • stuff like text, markers, hover_name etc missing.
  • still would need apply_sort for some sequential experiments data. Another approach: if apply_sort was instead sort and would take a column name(s) and sort by that?
  • I haven't had the chance to fully test the automated title
  • code export is completely missing

fdion avatar Mar 31 '21 16:03 fdion

In case you are wondering, number 1 hurdle at this time is:

  • Implement ColumnNameList for args that accept a list of column names

fdion avatar Apr 12 '21 16:04 fdion

Yeah still haven't had time for that, will probably do it by this weekend

adamerose avatar Apr 14 '21 15:04 adamerose

Done. Only enabled it on Splom (scatter_matrix) and Word Cloud so far.

It'll probably take months to finish all the ideas in this thread - can you give a shortlist of what else you think needs to be done before putting out another release makes sense? I'm never in a hurry to put out new releases but I know you mentioned it above and can prioritize some things

adamerose avatar Apr 20 '21 06:04 adamerose

Hey that's great. Let me look at this, I'll let you know.

fdion avatar Apr 20 '21 19:04 fdion

I ended up packaging a wheel file with the pre-redesign code but with graph splitter, to buy some time, so not super urgent doing a release ATM.

I've been looking into the ColumnNameList, and not having much success getting it to work for X and Y on line. I looked at Wordcloud and SPLOM and it should just work, but I get a NoneType error on set_names (func_ui).

fdion avatar May 03 '21 20:05 fdion

Another PR for this ticket: https://github.com/adamerose/PandasGUI/pull/137

fdion avatar May 17 '21 18:05 fdion