
`FacetGrid.map_dataframe` passes disallowed keyword arguments to `pointplot`

Open desilinguist opened this issue 2 years ago • 12 comments

My code that used to work perfectly fine with 0.11.0 breaks with the new 0.12.0 release. The code creates a FacetGrid and then applies pointplot() to each of the cells as follows:

...
g = sns.FacetGrid(df_fs, row="metric", col="learner_name",
                  hue="variable", height=2.5, aspect=1,
                  margin_titles=True, despine=True, sharex=False,
                  sharey=False, legend_out=False, palette="Set1")
g = g.map_dataframe(sns.pointplot, "training_set_size", "value",
                    scale=.5, ci=None)
...

The relevant traceback is as follows:

Traceback (most recent call last):
 ...
  File "/builds/EducationalTestingService/skll/skll/experiments/output.py", line 127, in generate_learning_curve_plots
    g = g.map_dataframe(sns.pointplot, "training_set_size", "value",
  File "/root/sklldev/lib/python3.8/site-packages/seaborn/axisgrid.py", line 819, in map_dataframe
    self._facet_plot(func, ax, args, kwargs)
  File "/root/sklldev/lib/python3.8/site-packages/seaborn/axisgrid.py", line 848, in _facet_plot
    func(*plot_args, **plot_kwargs)
TypeError: pointplot() got an unexpected keyword argument 'label'

Looking at the code for FacetGrid.map_dataframe, it does indeed create a label keyword argument, which I guess causes the failure when pointplot() is called. From reading the release notes, it looks like this is because of this item:

Removed the (previously-unused) option to pass additional keyword arguments to pointplot()

That keyword argument is created whenever hue is specified, and I am not sure how to get around that, since I have multiple variables that I want to represent with different colors. If there's another way to achieve this, I'd really appreciate any guidance.
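
The failure mode described above can be modelled without seaborn. The sketch below is a toy stand-in, not seaborn's code: `strict_pointplot`, `toy_map_dataframe`, and `drop_label` are hypothetical names. It assumes, as the traceback suggests, that the grid injects a `label` keyword for each hue level and that the plotting function has no catch-all for extra keywords. The wrapper at the end shows one possible way to discard the injected keyword; it has not been checked against seaborn itself.

```python
def strict_pointplot(x, y, data=None, color=None, scale=1.0):
    """Toy stand-in for a plotting function with no **kwargs catch-all."""
    return {"x": x, "y": y, "color": color, "scale": scale, "n": len(data)}


def toy_map_dataframe(func, hue_levels, rows, *args, **kwargs):
    """Toy stand-in for a grid that calls func once per hue level."""
    results = []
    for level in hue_levels:
        call_kwargs = dict(kwargs)
        call_kwargs["data"] = [r for r in rows if r["variable"] == level]
        call_kwargs["label"] = level  # injected because hue is set on the grid
        results.append(func(*args, **call_kwargs))
    return results


def drop_label(func):
    """Wrap func so that an injected `label` keyword is discarded."""
    def wrapper(*args, **kwargs):
        kwargs.pop("label", None)
        return func(*args, **kwargs)
    return wrapper


rows = [
    {"training_set_size": 100, "value": 0.9, "variable": "train_score_mean"},
    {"training_set_size": 100, "value": 0.7, "variable": "test_score_mean"},
]
levels = ["train_score_mean", "test_score_mean"]

# Calling the strict function directly reproduces the TypeError.
try:
    toy_map_dataframe(strict_pointplot, levels, rows,
                      "training_set_size", "value")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Wrapping the function lets the same call go through.
wrapped_results = toy_map_dataframe(drop_label(strict_pointplot), levels, rows,
                                    "training_set_size", "value", scale=0.5)
```

In this toy model the first call fails with the same kind of "unexpected keyword argument 'label'" error as in the traceback, while the wrapped call runs once per hue level.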

desilinguist avatar Sep 07 '22 16:09 desilinguist

I tried modifying the code to move the hue variable from the FacetGrid() instantiation call to the map_dataframe() call instead:

...
g = sns.FacetGrid(df_fs, row="metric", col="learner_name", height=2.5,
                  aspect=1, margin_titles=True, despine=True, sharex=False,
                  sharey=False, legend_out=False, palette="Set1")
g = g.map_dataframe(sns.pointplot, x="training_set_size", y="value",
                    hue="variable", scale=.5, errorbar=None)
...

While this code does work, it does not produce accurate results. Here's the plot with the above code with v0.12.0:

[Image: plot produced with v0.12.0]

For comparison, here's the plot as produced by the original code with v0.11.2:

[Image: plot produced with v0.11.2]

desilinguist avatar Sep 07 '22 18:09 desilinguist

I figured out how to make this work by following the recommendations that:

  • hue levels and keywords should be handled by the plotting function and not FacetGrid
  • we need to make sure that the variable in the data frame that maps to hue levels is categorical
  • it is now recommended to explicitly assign palette colors to hue levels

Here's the new code:

...
df_fs["variable"] = df_fs["variable"].astype("category")
g = sns.FacetGrid(df_fs, row="metric", col="learner_name",
                  height=2.5, aspect=1, margin_titles=True,
                  despine=True, sharex=False,
                  sharey=False, legend_out=False)
g = g.map_dataframe(sns.pointplot, x="training_set_size",
                    y="value", hue="variable", scale=.5,
                    errorbar=None,
                    palette={"train_score_mean": train_color,
                             "test_score_mean": test_color})
...

This code now produces the same (correct) plot as with v0.11.2.
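
A plausible reading of why the fixed category order and the explicit palette matter (an assumption, not something confirmed in this thread): when the plotting function is called separately for each facet, colors derived from the order in which hue levels happen to appear can differ between facets, whereas an explicit level-to-color mapping cannot. The following is a toy illustration in plain Python, not seaborn's implementation; `colors_by_appearance`, `colors_by_mapping`, and the color names are hypothetical.

```python
DEFAULT_CYCLE = ["blue", "orange"]  # hypothetical default color cycle


def colors_by_appearance(levels_in_facet):
    """Assign colors in the order levels first appear in this facet."""
    seen = list(dict.fromkeys(levels_in_facet))  # unique, order-preserving
    return {level: DEFAULT_CYCLE[i] for i, level in enumerate(seen)}


def colors_by_mapping(levels_in_facet, palette):
    """Look each level up in an explicit palette dict."""
    return {level: palette[level] for level in dict.fromkeys(levels_in_facet)}


facet_a = ["train_score_mean", "test_score_mean"]
facet_b = ["test_score_mean", "train_score_mean"]  # same levels, other order

# Order-dependent assignment: the same level gets different colors per facet.
implicit_a = colors_by_appearance(facet_a)
implicit_b = colors_by_appearance(facet_b)

# Explicit mapping: the same level always gets the same color.
palette = {"train_score_mean": "blue", "test_score_mean": "orange"}
explicit_a = colors_by_mapping(facet_a, palette)
explicit_b = colors_by_mapping(facet_b, palette)

print(implicit_a["train_score_mean"], implicit_b["train_score_mean"])
print(explicit_a["train_score_mean"], explicit_b["train_score_mean"])
```

With the order-dependent scheme the training curve changes color between the two toy facets; with the dict it stays the same in both.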

desilinguist avatar Sep 07 '22 20:09 desilinguist

We should unbreak this, even if it's discouraged usage.

Glad you were able to work out the right thing to do here, but I am a little curious why you didn't opt for catplot, which would do all this complicated bookkeeping for you.

mwaskom avatar Sep 07 '22 22:09 mwaskom

Indeed, catplot would have been simpler, and I did try it, but the markers seemed much larger and less to my liking than when I used FacetGrid.

desilinguist avatar Sep 08 '22 01:09 desilinguist

Do you have an example? Catplot should basically just be generating the code in your third post.

mwaskom avatar Sep 08 '22 02:09 mwaskom

Sure! Attached are two plots saved at 300 DPI using plt.savefig(). The first was generated using FacetGrid + pointplot and the second was generated using catplot. I am doing a bunch of matplotlib-level processing to add the plot titles and legend manually, but that part is identical between the two scenarios.

[Image: FacetGrid + pointplot]

[Image: catplot]

desilinguist avatar Sep 08 '22 13:09 desilinguist

Thanks but I’d need to see the actual code to make sense of the example.

mwaskom avatar Sep 08 '22 14:09 mwaskom

Ah, sorry. Please take a look at the generate_learning_curve_plots() function here.

desilinguist avatar Sep 08 '22 14:09 desilinguist

That link 404s (is it a private repo?)

I can't reproduce whatever you might be seeing with a simple example though:

[Image: simple example using the tips dataset]

(The tips example dataframe loads with categorical dtypes so it simplifies the bookkeeping when using FacetGrid).

mwaskom avatar Sep 08 '22 22:09 mwaskom

Apologies, that branch was probably merged by the time you got to it. It's now in the main branch.  

desilinguist avatar Sep 08 '22 22:09 desilinguist

I don't see any use of catplot on that page?

mwaskom avatar Sep 08 '22 23:09 mwaskom

Yeah, as I mentioned, I didn't use catplot in production because of the marker size.

Here's a gist that shows how I combined the FacetGrid and pointplot calls together.

However, I am extremely embarrassed to say that it now works fine 😬! Looking back on it, that was probably because I forgot to include the scale=0.5 keyword in the catplot call when I originally ran the test.

Apologies for wasting your time on this secondary issue.

desilinguist avatar Sep 08 '22 23:09 desilinguist