DataFrame.plot doesn't handle axis titles for sharex=row correctly
Code Sample, a copy-pastable example if possible
import matplotlib.pyplot as plt
import pandas as pd
_, axs = plt.subplots(2, 3, sharex='row')
for ax in axs.flatten():
    ax.plot(range(5))
    ax.set_xlabel('x-axis title')
ax = axs[1, 0]
data = [2]*5
pd.DataFrame(data).plot(ax=ax) # removes xaxis and yaxis labels except those on the sides of the grid.
plt.tight_layout() # so xaxis label can't hide beneath second plot
Problem description
When axes are shared only by row or only by column, the pandas DataFrame plotting method hides every axis title that is not on the far left or the bottom of the grid. This is especially confusing because a single pandas plot changes the labels across the entire subplot grid. In contrast, the Matplotlib (Axes.plot) and pyplot (plt.plot) plotting methods don't try to be smart with axis labels at all and leave them where they are.
The problem occurs for all four combinations of sharex, sharey, 'row' and 'col'.
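To check the four combinations on a given installation, something like the following sketch could be used. It builds the same 2x3 grid as the code sample for each combination, does one pandas plot, and records which axis labels are still visible. The helper name label_visibility and the Agg backend are my own choices for illustration, and the printed flags depend on the pandas and Matplotlib versions installed, so no particular output is claimed here.

```python
import itertools

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def label_visibility(share_kwarg, share_mode):
    """Return (x_visible, y_visible) per Axes after a single pandas plot.

    Hypothetical helper for this issue, not part of pandas or Matplotlib.
    """
    fig, axs = plt.subplots(2, 3, **{share_kwarg: share_mode})
    for ax in axs.flatten():
        ax.plot(range(5))
        ax.set_xlabel("x-axis title")
        ax.set_ylabel("y-axis title")
    # One pandas plot on the lower-left Axes, as in the code sample.
    pd.DataFrame([2] * 5).plot(ax=axs[1, 0])
    flags = [
        (ax.xaxis.label.get_visible(), ax.yaxis.label.get_visible())
        for ax in axs.flatten()
    ]
    plt.close(fig)
    return flags


results = {}
for share_kwarg, share_mode in itertools.product(("sharex", "sharey"), ("row", "col")):
    results[(share_kwarg, share_mode)] = label_visibility(share_kwarg, share_mode)
    print(share_kwarg, share_mode, results[(share_kwarg, share_mode)])
```

Any (False, ...) or (..., False) entry marks an Axes whose x- or y-axis label was hidden by the pandas call.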
A workaround is to iterate through the axes and set ax.xaxis.label.set_visible(True) which has to be done after the last Pandas operation on any Axes contained in the Figure.
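A minimal sketch of that workaround, applied to the code sample from this issue. The helper name restore_axis_labels is made up for illustration, and the Agg backend is only there so the snippet runs without a display. It only touches the axis labels, not the tick labels.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def restore_axis_labels(fig):
    """Make every x- and y-axis label in `fig` visible again (hypothetical helper)."""
    for ax in fig.axes:
        ax.xaxis.label.set_visible(True)
        ax.yaxis.label.set_visible(True)


fig, axs = plt.subplots(2, 3, sharex="row")
for ax in axs.flatten():
    ax.plot(range(5))
    ax.set_xlabel("x-axis title")

pd.DataFrame([2] * 5).plot(ax=axs[1, 0])

# Must run after the last pandas plotting call on any Axes of this Figure.
restore_axis_labels(fig)
plt.tight_layout()
```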
Expected Output
Three options:
- Do not modify axis labels, to maintain consistency with matplotlib.axes.Axes.plot() and matplotlib.pyplot.plot(), i.e. keep axis labels on all plots.
- Only remove x-axis labels on axes shared with an axes beneath. Only remove y-axis labels on axes shared with an axes to the left.
  a) Do this for all axes in the subplot.
  b) Only modify axis labels on axes that pandas actually plots to. (Sounds like a nightmare to implement.)
  c) Include a keyword argument in pandas.DataFrame.plot that allows the user to control whether or not axis labels are modified.
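Until one of these is available in pandas itself, the "leave labels alone" behavior can be approximated in user code. The sketch below is only an illustration: plot_keep_labels is a hypothetical wrapper, not a pandas API. It records the label visibility of every Axes in the Figure, calls DataFrame.plot, and then puts the recorded state back. It restores axis labels only, not tick labels.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_keep_labels(df, ax, **plot_kwargs):
    """Hypothetical wrapper: call df.plot(ax=ax) and undo label-visibility changes."""
    fig = ax.figure
    # Snapshot the label visibility of every Axes before pandas touches them.
    before = [
        (a, a.xaxis.label.get_visible(), a.yaxis.label.get_visible())
        for a in fig.axes
    ]
    result = df.plot(ax=ax, **plot_kwargs)
    # Restore the snapshot so the pandas call leaves the labels as they were.
    for a, x_visible, y_visible in before:
        a.xaxis.label.set_visible(x_visible)
        a.yaxis.label.set_visible(y_visible)
    return result


fig, axs = plt.subplots(2, 3, sharex="row")
for ax in axs.flatten():
    ax.plot(range(5))
    ax.set_xlabel("x-axis title")

plot_keep_labels(pd.DataFrame([2] * 5), axs[1, 0])
labels_visible = [ax.xaxis.label.get_visible() for ax in axs.flatten()]
```

Unlike forcing every label to visible, this keeps any labels the user had hidden on purpose before the pandas call.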
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.14.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 45 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None

pandas: 0.23.0
pytest: 3.5.0
pip: 9.0.3
setuptools: 39.0.1
Cython: 0.28.2
numpy: 1.14.2
scipy: 1.0.1
pyarrow: None
xarray: None
IPython: 5.6.0
sphinx: 1.7.2
patsy: 0.5.0
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.2
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.6
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
An earlier change to this feature didn't take into account that 'row' and 'col' can be passed for sharex and sharey. https://github.com/pandas-dev/pandas/issues/9737
IMO, option 1 seems best.
cc @TomAugspurger
Specifically, it seems to work to loop through the axes after any other operations on the axes have finished:
# assuming `ax` is the 6-row by 3-column array of Axes returned by plt.subplots
for row in range(6):
    for col in range(3):
        # re-enable the x tick labels and the x-axis title on every subplot
        ax[row, col].tick_params(axis='both', which='both', labelsize=7, labelbottom=True)
        ax[row, col].xaxis.label.set_visible(True)