Support adding new features (columns) using Python

Open · pipdax opened this issue 5 years ago · 6 comments

When I do EDA, I usually create new features to help me analyze the data. For example, when some columns have missing data, I create a new column with 1 or 0 to indicate whether that column has a missing value in each row. I don't want to close PandasGUI, create the new column, and then show PandasGUI again. This is cumbersome.
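For reference, the kind of missing-value indicator described above can be built in plain pandas like this. The DataFrame and the "_missing" suffix are made-up examples, not anything PandasGUI provides:

```python
import numpy as np
import pandas as pd

# Hypothetical example data: two columns with some missing values
df = pd.DataFrame({
    "age": [25, np.nan, 40, np.nan],
    "income": [50_000, 62_000, np.nan, 48_000],
})

# Find the columns that contain at least one NaN before adding anything
cols_with_nan = [c for c in df.columns if df[c].isna().any()]

# Add a 0/1 indicator column for each of them
for col in cols_with_nan:
    df[f"{col}_missing"] = df[col].isna().astype(int)

print(df)
```

Each new step like this currently means re-running show() to see the result in the GUI, which is the friction the issue is about.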

pipdax · Nov 27 '20 06:11

So if I understand correctly, the problem is that you don't like having to call show() again to reopen the GUI every time you make changes to the original DataFrame? There was some discussion of having modifications you make in IPython also apply to the DataFrame in the GUI, but I decided against that and explain why here: https://github.com/adamerose/pandasgui/issues/20#issuecomment-683378974

Do you have any specific solution in mind? All I can think of is adding a method to add or replace DataFrames in an existing GUI window: you would call gui = show(df), modify df in IPython, then call gui.show(df), and it would overwrite the DataFrame in the GUI.

adamerose · Nov 28 '20 04:11

Not sure if this is a good idea but here is a rough sketch of a proposal:

Add an option in the GUI to add a new column by specifying:

  1. the name of the new column, and
  2. an expression that evaluates to the values of the new column (on the original dataframe)

Basically do something like this:

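def add_new_column(self, new_column_name, new_column_expression):
    # Evaluate the expression with the DataFrame bound to its own name,
    # e.g. "pokemon.HP.isna().astype(int)" for a DataFrame named pokemon
    self.dataframe_original[new_column_name] = eval(new_column_expression, globals(), {self.name: self.dataframe_original})
    # Rebuild the filtered/sorted view so the new column appears in it
    self.apply_filters_and_sorting()
    # Refresh the GUI
    self.update()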
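To make the proposal above concrete outside the GUI, here is a minimal runnable sketch. DataFrameStore, its filters list, and the use of DataFrame.query for filtering are all assumptions made for illustration, not PandasGUI's actual classes:

```python
import pandas as pd


class DataFrameStore:
    """Hypothetical stand-in for the object that holds one named DataFrame in the GUI."""

    def __init__(self, name, df):
        self.name = name
        self.dataframe_original = df.copy()       # the unfiltered data
        self.filters = []                         # query strings, e.g. "HP > 100"
        self.dataframe = self.dataframe_original  # the filtered view shown in the table

    def apply_filters_and_sorting(self):
        # Always rebuild the visible view from the original data
        view = self.dataframe_original
        for expr in self.filters:
            view = view.query(expr)
        self.dataframe = view

    def add_new_column(self, new_column_name, new_column_expression):
        # Bind the DataFrame to its own name so the expression reads like normal pandas code.
        # eval() runs arbitrary Python, so this is only acceptable for trusted, local input.
        namespace = {self.name: self.dataframe_original}
        values = eval(new_column_expression, {"pd": pd}, namespace)
        self.dataframe_original[new_column_name] = values
        # The column is computed on every row, then the filters are re-applied
        self.apply_filters_and_sorting()
        # A real GUI would also refresh its table widget here (self.update() in the proposal)


store = DataFrameStore("pokemon", pd.DataFrame({"Name": ["Bulbasaur", "Snorlax"], "HP": [45, 160]}))
store.filters.append("HP > 100")
store.apply_filters_and_sorting()
store.add_new_column("high_hp", "(pokemon.HP > 100).astype(int)")
```

With this design the new column exists on all rows of the original data, and the filtered view simply shows the subset that passes the filter.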

Though I'm not sure this still makes sense when filters and sorting are applied. Personally, this behavior still makes sense to me, but maybe some people would disagree and expect the new column to contain values only in the filtered rows?
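The two possible behaviors can be compared directly in pandas. The data and the "HP > 100" filter below are illustrative assumptions; the point is that assigning a Series aligns on the index, so computing the column only on the filtered rows leaves NaN in the hidden ones:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Bulbasaur", "Snorlax", "Pikachu", "Lapras"],
                   "HP": [45, 160, 35, 130]})
visible = df[df.HP > 100]  # rows left after a hypothetical "HP > 100" filter

# Option 1: evaluate on the original DataFrame, so every row gets a value
df["double_hp_all"] = df.HP * 2

# Option 2: evaluate only on the visible rows; assignment aligns on the index,
# so rows hidden by the filter get NaN
df["double_hp_visible"] = visible.HP * 2
```

Option 1 keeps the column meaningful if the filter is later removed, while Option 2 matches what the user currently sees.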

JinchengWang · Dec 07 '20 10:12

Here's an API that might work: it just lets you modify your DataFrame as you normally would and then replace the one in the GUI with your result. I can't think of any limitations with this, and it seems easier to work with than the add_new_column proposal.

gui.replace("my_df_name", my_new_df)

So an example usage would look like this:

from pandasgui import show
from pandasgui.datasets import pokemon

gui = show(pokemon)
gui.replace('pokemon', pokemon[pokemon.HP > 100])
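A sketch of what the proposed replace() could do internally, using a hypothetical window-less MiniGui class that only keeps DataFrames in a dict keyed by name. This is an assumption for illustration; the real GUI would also have to refresh its widgets:

```python
import pandas as pd


class MiniGui:
    """Hypothetical, window-less stand-in that only tracks DataFrames by name."""

    def __init__(self, **dataframes):
        self.store = {name: df.copy() for name, df in dataframes.items()}

    def replace(self, name, df):
        # Add or overwrite the DataFrame shown under `name`.
        # Copying means later edits in IPython do not leak into the GUI silently.
        self.store[name] = df.copy()
        # A real GUI would refresh the table widget for `name` here

    def get_dataframes(self):
        # Hand back copies so edits in IPython don't mutate the GUI's data
        return {name: df.copy() for name, df in self.store.items()}


pokemon = pd.DataFrame({"Name": ["Bulbasaur", "Snorlax"], "HP": [45, 160]})
gui = MiniGui(pokemon=pokemon)
gui.replace("pokemon", pokemon[pokemon.HP > 100])
```

Copying on the way in and out keeps the two sides independent, which matches the decision not to sync changes automatically.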

Another thing I can do is use my scope-sniffing magic (the same thing that gets the DataFrame variable name into the GUI as a string) to find all DataFrames in your scope and replace the GUI instances with those; you just need to keep the name the same.

from pandasgui import show
from pandasgui.datasets import pokemon

gui = show(pokemon)
pokemon = pokemon[pokemon.HP > 100]
gui.update_all()  # This would find a variable in your scope named 'pokemon' that is a DataFrame, and then replace the one in the GUI with the same name
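One way such scope sniffing could work is to inspect the caller's stack frame with Python's inspect module. This is a hypothetical sketch, not PandasGUI's actual implementation; ScopeSyncGui is an invented name, and inspect.currentframe() can return None on Python implementations without stack frame support:

```python
import inspect

import pandas as pd


class ScopeSyncGui:
    """Hypothetical stand-in that re-reads DataFrames from the caller's scope by name."""

    def __init__(self, **dataframes):
        self.store = dict(dataframes)

    def update_all(self):
        # Look one frame up: the scope of whoever called update_all()
        frame = inspect.currentframe()
        caller = frame.f_back if frame is not None else None
        if caller is None:
            return []
        try:
            scope = {**caller.f_globals, **caller.f_locals}
        finally:
            del frame, caller  # drop frame references to avoid reference cycles
        updated = []
        for name in list(self.store):
            obj = scope.get(name)
            # Only replace entries whose name now refers to a DataFrame
            if isinstance(obj, pd.DataFrame):
                self.store[name] = obj
                updated.append(name)
        return updated


pokemon = pd.DataFrame({"Name": ["Bulbasaur", "Snorlax"], "HP": [45, 160]})
gui = ScopeSyncGui(pokemon=pokemon)
pokemon = pokemon[pokemon.HP > 100]
updated = gui.update_all()
```

Local variables take precedence over globals here, mirroring normal Python name lookup.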

adamerose · Dec 08 '20 16:12

I'm worried that if this is the recommended workflow, it would not be compatible with editing data in the GUI, since changes made in the GUI are not synced with the original DataFrame.

For example, if I run

gui = show(pokemon)

then change the HP of Bulbasaur to 200 in the GUI, I would expect Bulbasaur to show up after executing

pokemon = pokemon[pokemon.HP > 100]
gui.update_all() 

JinchengWang · Dec 09 '20 03:12

Yeah, you'll always need a method call to sync in either direction, because I don't want to automatically overwrite the original DataFrame, for the reasons in the thread I linked. So there's .get_dataframes() to get your GUI changes back into IPython, and my proposed .replace() and .update_all() to get your IPython changes back into an existing GUI.

Your example would look like this:

gui = show(pokemon)
# then change the HP of Bulbasaur to 200 in the GUI
pokemon = gui.get_dataframes()['pokemon']
pokemon = pokemon[pokemon.HP > 100]
gui.update_all() 

This is the least verbose API I can think of
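The difference between filtering the stale original and pulling the GUI's copy back first can be simulated with plain pandas. The assumption here is that the GUI edits its own copy of the DataFrame; the toy data and the direct .loc edit stand in for the real dataset and a GUI edit:

```python
import pandas as pd

# Toy stand-in for the pokemon dataset (assumption: only Name and HP columns)
pokemon = pd.DataFrame({"Name": ["Bulbasaur", "Charmander", "Snorlax"],
                        "HP": [45, 39, 160]})

# Assumption: the GUI works on its own copy, never the original object
gui_copy = pokemon.copy()
gui_copy.loc[gui_copy["Name"] == "Bulbasaur", "HP"] = 200  # simulated GUI edit

# Filtering the stale original loses the GUI edit
stale = pokemon[pokemon.HP > 100]

# Pulling the GUI copy back first (the role of get_dataframes()) keeps it
pokemon = gui_copy
synced = pokemon[pokemon.HP > 100]
```

Bulbasaur appears in synced but not in stale, which is exactly why the get_dataframes() step is needed before filtering.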

adamerose · Dec 10 '20 17:12

Just thought of another idea:

Provide an IPython magic command that wraps these steps together. For the example above, the user could instead do something like:

gui = show(pokemon)
# then change the HP of Bulbasaur to 200 in the GUI
%pdgui pokemon = pokemon[pokemon.HP > 100]

This could also make the history-tracking better, since both GUI operations and magic commands can be recorded.
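The core of such a magic could be an ordinary function: pull the GUI's data into the namespace, run the statement, push matching DataFrames back, and log the line. Everything here is hypothetical (run_and_sync, make_pdgui_magic, and a plain dict standing in for the GUI); only the final registration step uses a real IPython method:

```python
import pandas as pd


def run_and_sync(line, user_ns, store, history=None):
    """Hypothetical core of a %pdgui line magic.

    `store` is a plain {name: DataFrame} dict standing in for the GUI.
    """
    # 1. GUI -> IPython: pull the GUI's current data (including GUI edits) into the namespace
    for name, df in store.items():
        user_ns[name] = df.copy()
    # 2. Run the user's statement, e.g. "pokemon = pokemon[pokemon.HP > 100]"
    exec(line, user_ns)
    # 3. IPython -> GUI: push back every DataFrame whose name the GUI already shows
    for name in list(store):
        obj = user_ns.get(name)
        if isinstance(obj, pd.DataFrame):
            store[name] = obj
    # 4. Record the statement so GUI actions and magics can share one history
    if history is not None:
        history.append(line)


def make_pdgui_magic(user_ns, store, history=None):
    """Wrap run_and_sync as a one-argument function, the shape IPython expects for a line magic."""
    def pdgui(line):
        run_and_sync(line, user_ns, store, history)
    return pdgui


store = {"pokemon": pd.DataFrame({"Name": ["Bulbasaur", "Snorlax"], "HP": [45, 160]})}
store["pokemon"].loc[0, "HP"] = 200  # simulated edit made in the GUI
ns, history = {}, []
pdgui = make_pdgui_magic(ns, store, history)
pdgui("pokemon = pokemon[pokemon.HP > 100]")
```

Inside a real IPython session, the wrapper could be registered with get_ipython().register_magic_function(make_pdgui_magic(get_ipython().user_ns, store), magic_kind="line", magic_name="pdgui"). Note the design choice in step 1: the GUI is treated as the source of truth, so unsynced IPython-side changes to the same name are overwritten.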

JinchengWang · Dec 11 '20 02:12