Support adding new feature columns from Python
When I do EDA, I usually create some new features to help me analyze the data. For example, when some columns have missing data, I create a new column of 1s and 0s that flags whether each row is missing a value in that column. I don't want to close PandasGUI, create the new column, and then show PandasGUI again. It's cumbersome.
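For reference, the 0/1 missing-value flags described above take one line of pandas per column. The toy DataFrame and the `_missing` suffix below are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps in two of its three columns
df = pd.DataFrame({
    "HP": [45, np.nan, 80],
    "Type 2": ["Poison", None, "Poison"],
    "Attack": [49, 62, 82],
})

# Build the list first so the loop doesn't see the columns it adds
cols_with_gaps = [col for col in df.columns if df[col].isna().any()]

# One 0/1 flag column per column that has at least one missing value
for col in cols_with_gaps:
    df[f"{col}_missing"] = df[col].isna().astype(int)

print(df)
```

Columns with no missing values (here `Attack`) get no flag column.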
So if I understand correctly, the problem is that you don't like having to call show repeatedly to reopen the GUI every time you change the original DataFrame? There was some discussion about having modifications you make in iPython also apply to the DataFrame in the GUI, but I decided against that and explain why here: https://github.com/adamerose/pandasgui/issues/20#issuecomment-683378974
Do you have a specific solution in mind? All I can think of is adding a method to add or replace DataFrames in an existing GUI window, so you would do gui = show(df), then modify df in iPython, then call gui.show(df), and it would overwrite the DataFrame in the GUI.
Not sure if this is a good idea but here is a rough sketch of a proposal:
Add an option in the GUI to add a new column by specifying
- the name of the new column, and
- an expression that evaluates to the values of the new column (on the original dataframe)
Basically do something like this:
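def add_new_column(self, new_column_name, new_column_expression):
    # Evaluate the expression with the DataFrame available under its name in the GUI
    self.dataframe_original[new_column_name] = eval(new_column_expression, globals(), {self.name: self.dataframe_original})
    # Re-derive the displayed view from the original, then refresh the widgets
    self.apply_filters_and_sorting()
    self.update()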
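To see the proposed add_new_column in action outside the GUI, here is a self-contained sketch. `ToyDataFrameStore` is a hypothetical stand-in for one DataFrame tab, not PandasGUI's real class; it assumes filters are pandas query strings and passes a small `{"pd": pd}` dict to `eval` instead of `globals()`:

```python
import pandas as pd


class ToyDataFrameStore:
    """Hypothetical stand-in for one DataFrame tab in the GUI (not PandasGUI's real class)."""

    def __init__(self, name, df):
        self.name = name
        self.dataframe_original = df.copy()  # the GUI keeps its own copy
        self.filters = []                     # assumed: query strings such as "HP > 100"
        self.dataframe = self.dataframe_original

    def apply_filters_and_sorting(self):
        # Rebuild the displayed view from the original every time
        view = self.dataframe_original
        for expr in self.filters:
            view = view.query(expr)
        self.dataframe = view

    def update(self):
        pass  # the real GUI would refresh its table widgets here

    def add_new_column(self, new_column_name, new_column_expression):
        # The expression refers to the DataFrame by its tab name, e.g. "pokemon.HP * 2"
        namespace = {self.name: self.dataframe_original}
        self.dataframe_original[new_column_name] = eval(new_column_expression, {"pd": pd}, namespace)
        self.apply_filters_and_sorting()
        self.update()


pokemon = pd.DataFrame({"Name": ["Bulbasaur", "Snorlax"], "HP": [45, 160]})
store = ToyDataFrameStore("pokemon", pokemon)
store.filters.append("HP > 100")
store.add_new_column("HP_double", "pokemon.HP * 2")
print(store.dataframe)
```

Because the column is computed on the full original frame, rows hidden by the filter still get values and reappear with them if the filter is removed. Note that `eval` runs arbitrary code, which is acceptable here only because the user is typing their own expressions.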
Though I'm not sure whether this still makes sense when filters and sorts are applied. Personally, this behavior still makes sense to me, but some people might disagree and expect the new column to contain values only for the filtered rows.
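The two interpretations only diverge for expressions that aggregate across rows; row-wise expressions such as `pokemon.HP * 2` give the same values either way. A minimal sketch with made-up data, using a mean-centered column as the example:

```python
import pandas as pd

# Hypothetical toy data
pokemon = pd.DataFrame({"Name": ["Bulbasaur", "Snorlax", "Chansey"], "HP": [45, 160, 250]})
filtered = pokemon[pokemon.HP > 100]

# Interpretation A (the proposal): evaluate on the original, then keep the visible rows
on_original = (pokemon.HP - pokemon.HP.mean()).loc[filtered.index]

# Interpretation B: evaluate only on the rows that survive the filter
on_filtered = filtered.HP - filtered.HP.mean()

print(on_original.tolist())  # centered on the mean of all three rows
print(on_filtered.tolist())  # centered on the mean of the two visible rows
```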
Here's an API that might work: it just lets you modify your DataFrame as you normally would and then replace the one in the GUI with your result. I can't think of any limitations with this, and it seems easier to work with than the add_new_column proposal:
gui.replace("my_df_name", my_new_df)
So an example usage would look like this:
from pandasgui import show
from pandasgui.datasets import pokemon
gui = show(pokemon)
gui.replace('pokemon', pokemon[pokemon.HP > 100])
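Under the hood, replace could be little more than a lookup by tab name. This is a hypothetical sketch: `ToyGui` and its `stores` dict stand in for whatever show() returns, and raising `KeyError` on an unknown name is a design choice added here so that a typo fails loudly instead of silently creating a new tab:

```python
import pandas as pd


class ToyGui:
    """Hypothetical stand-in for the object returned by show(); not PandasGUI's real API."""

    def __init__(self, **dataframes):
        # One entry per tab, keyed by the DataFrame's name
        self.stores = {name: df.copy() for name, df in dataframes.items()}

    def replace(self, name, new_df):
        # Only overwrite an existing tab, so a misspelled name raises
        if name not in self.stores:
            raise KeyError(f"No DataFrame named {name!r} in the GUI")
        self.stores[name] = new_df.copy()


pokemon = pd.DataFrame({"Name": ["Bulbasaur", "Snorlax"], "HP": [45, 160]})
gui = ToyGui(pokemon=pokemon)
gui.replace("pokemon", pokemon[pokemon.HP > 100])
print(gui.stores["pokemon"])
```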
Another thing I can do is use my scope-sniffing magic (the same thing that gets the DataFrame variable name into the GUI as a string) to find all DataFrames in your scope and replace the GUI instances with them; you just need to keep the names the same.
from pandasgui import show
from pandasgui.datasets import pokemon
gui = show(pokemon)
pokemon = pokemon[pokemon.HP > 100]
gui.update_all() # This would find a variable in your scope named 'pokemon' that is a DataFrame, and then replace the one in the GUI with the same name
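One way such scope sniffing could work is to read the caller's frame with the standard `inspect` module. This is only a sketch of the idea: `ToyGui` is hypothetical, it assumes CPython (where `inspect.currentframe()` returns a frame), and it refreshes only tabs that already exist, ignoring any new DataFrames in scope:

```python
import inspect

import pandas as pd


class ToyGui:
    """Hypothetical stand-in for show()'s return value; update_all is only a proposal."""

    def __init__(self, **dataframes):
        self.stores = {name: df.copy() for name, df in dataframes.items()}

    def update_all(self):
        # Look at the variables of whoever called update_all()
        caller = inspect.currentframe().f_back
        try:
            scope = {**caller.f_globals, **caller.f_locals}
        finally:
            del caller  # drop the frame reference to avoid a reference cycle
        # Replace each tab whose name matches a DataFrame variable in that scope
        for name in self.stores:
            value = scope.get(name)
            if isinstance(value, pd.DataFrame):
                self.stores[name] = value.copy()


pokemon = pd.DataFrame({"Name": ["Bulbasaur", "Snorlax"], "HP": [45, 160]})
gui = ToyGui(pokemon=pokemon)
pokemon = pokemon[pokemon.HP > 100]
gui.update_all()
print(gui.stores["pokemon"])
```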
I'm worried that if this is a recommended workflow, it would not be compatible with editing data in the GUI, since the changes made in the GUI are not synced with the original DataFrame.
For example, if I run
gui = show(pokemon)
then change the HP of Bulbasaur to 200 in the GUI, I would expect Bulbasaur to show up after executing
pokemon = pokemon[pokemon.HP > 100]
gui.update_all()
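The concern can be reproduced with plain pandas by modelling the GUI as holding its own copy. That copy is an assumption based on the thread's description (GUI edits are not synced back), and the data is made up:

```python
import pandas as pd

pokemon = pd.DataFrame({"Name": ["Bulbasaur", "Snorlax"], "HP": [45, 160]})

# Assumption: the GUI works on its own copy, so edits made there stay there
gui_copy = pokemon.copy()
gui_copy.loc[gui_copy.Name == "Bulbasaur", "HP"] = 200

# Filtering the iPython variable only sees the original values
pokemon = pokemon[pokemon.HP > 100]
print(pokemon.Name.tolist())  # Bulbasaur is missing
```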
Yeah, you'll always need a method call to sync in either direction, because I don't want to automatically overwrite the original DataFrame, for the reasons in the thread I linked. So there's .get_dataframes() to get your GUI changes back into iPython, and my proposed .replace() and .update_all() to get your iPython changes back into an existing GUI.
Your example would look like this
gui = show(pokemon)
# then change the HP of Bulbasaur to 200 in the GUI
pokemon = gui.get_dataframes()['pokemon']
pokemon = pokemon[pokemon.HP > 100]
gui.update_all()
This is the least verbose API I can think of
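The full round trip can be simulated without the GUI. In this hypothetical sketch, `ToyGui` mimics .get_dataframes() and the proposed .replace() (used here in place of .update_all() to keep it short), and returning copies from get_dataframes is an assumption, not confirmed PandasGUI behavior:

```python
import pandas as pd


class ToyGui:
    """Hypothetical stand-in for show(); mimics get_dataframes() and the proposed replace()."""

    def __init__(self, **dataframes):
        self.stores = {name: df.copy() for name, df in dataframes.items()}

    def get_dataframes(self):
        # Assumed: hand back copies so later iPython edits don't silently change the GUI
        return {name: df.copy() for name, df in self.stores.items()}

    def replace(self, name, new_df):
        self.stores[name] = new_df.copy()


pokemon = pd.DataFrame({"Name": ["Bulbasaur", "Snorlax"], "HP": [45, 160]})
gui = ToyGui(pokemon=pokemon)

# Simulate changing Bulbasaur's HP to 200 in the GUI
gui.stores["pokemon"].loc[0, "HP"] = 200

# GUI -> iPython, filter, then iPython -> GUI
pokemon = gui.get_dataframes()["pokemon"]
pokemon = pokemon[pokemon.HP > 100]
gui.replace("pokemon", pokemon)
print(gui.stores["pokemon"])
```

This time Bulbasaur survives the filter, because the filter ran on the GUI's edited data rather than on the stale iPython variable.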
Just thought of another idea:
Provide an IPython magic command that wraps these steps together. For the example above, allow the user to instead do something like
gui = show(pokemon)
# then change the HP of Bulbasaur to 200 in the GUI
%pdgui pokemon = pokemon[pokemon.HP > 100]
This could also make the history-tracking better, since both GUI operations and magic commands can be recorded.
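The body of such a magic might look like the sketch below. Everything here is hypothetical: `pdgui` is written as a plain function taking the user namespace and a `ToyGui` stand-in, so it can run outside IPython. It pulls the GUI's DataFrames into the namespace, runs the statement, pushes matching DataFrames back, and appends the statement to a history list:

```python
import pandas as pd


class ToyGui:
    """Hypothetical stand-in for show(); keeps a history of synced statements."""

    def __init__(self, **dataframes):
        self.stores = {name: df.copy() for name, df in dataframes.items()}
        self.history = []

    def replace(self, name, new_df):
        self.stores[name] = new_df.copy()


def pdgui(line, namespace, gui):
    """Hypothetical body of a %pdgui line magic: sync GUI -> namespace, run, sync back."""
    # 1. Pull the GUI's (possibly edited) DataFrames into the user's namespace
    for name, df in gui.stores.items():
        namespace[name] = df.copy()
    # 2. Run the user's statement exactly as typed
    exec(line, namespace)
    # 3. Push the matching DataFrames back into the GUI and record the statement
    for name in gui.stores:
        if isinstance(namespace.get(name), pd.DataFrame):
            gui.replace(name, namespace[name])
    gui.history.append(line)


pokemon = pd.DataFrame({"Name": ["Bulbasaur", "Snorlax"], "HP": [45, 160]})
gui = ToyGui(pokemon=pokemon)
gui.stores["pokemon"].loc[0, "HP"] = 200  # simulate the edit made in the GUI
user_ns = {"pokemon": pokemon}
pdgui("pokemon = pokemon[pokemon.HP > 100]", user_ns, gui)
print(gui.stores["pokemon"])
```

Inside an IPython session, a wrapper around this function could be registered with `get_ipython().register_magic_function(..., magic_kind="line", magic_name="pdgui")`, passing `get_ipython().user_ns` as the namespace.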