pytest-notebook
Compare cell outputs semantically
I recently had to update a test that uses pytest-notebook to validate a table produced by notebook code, because pandas 2.0.2 slightly changed the whitespace it produces in a Jupyter notebook.
While looking into this issue, I noticed that the Jupyter notebook includes both a plain-text and an HTML representation of the cell output:
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Time</th>\n",
" <th>M1 Power Dispatch [W]</th>\n",
" <th>M2 Power Dispatch [W]</th>\n",
" <th>M3 Power Dispatch [W]</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2021-07-26 10:00:00+00:00</td>\n",
" <td>0.0</td>\n",
" <td>-2000.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2021-07-26 11:00:00+00:00</td>\n",
" <td>0.0</td>\n",
" <td>-3000.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2021-07-26 12:00:00+00:00</td>\n",
" <td>-4000.0</td>\n",
" <td>-3000.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2021-07-26 13:00:00+00:00</td>\n",
" <td>-4000.0</td>\n",
" <td>-3000.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Time M1 Power Dispatch [W] M2 Power Dispatch [W] \\\n",
"0 2021-07-26 10:00:00+00:00 0.0 -2000.0 \n",
"1 2021-07-26 11:00:00+00:00 0.0 -3000.0 \n",
"2 2021-07-26 12:00:00+00:00 -4000.0 -3000.0 \n",
"3 2021-07-26 13:00:00+00:00 -4000.0 -3000.0 \n",
"\n",
" M3 Power Dispatch [W] \n",
"0 0.0 \n",
"1 0.0 \n",
"2 0.0 \n",
"3 0.0 "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
]
This made me curious: would it be possible to modify pytest-notebook to load the text/html contents using an HTML parser like BeautifulSoup and compare them semantically? It seems like it might be a way to avoid false positives when the 'text/plain' contents change in a way that is not significant, such as a change in column spacing or whitespace characters.
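To make the idea concrete, here is a rough sketch of what I mean by a semantic comparison. It is not pytest-notebook code, and the names `TableCells`, `table_cells` and `html_tables_equal` are made up for illustration. It uses the standard-library `html.parser` instead of BeautifulSoup so it runs without extra dependencies, and it assumes the output is a single simple table like the one above (no nested tables):

```python
from html.parser import HTMLParser


class TableCells(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # list of rows, each a list of cell strings
        self._cell = None  # text fragments of the cell being read, else None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td"):
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. the <style> block) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            # Collapse runs of whitespace so spacing changes do not matter.
            text = " ".join("".join(self._cell).split())
            if not self.rows:
                self.rows.append([])
            self.rows[-1].append(text)
            self._cell = None


def table_cells(html):
    """Return the table content as a list of rows of cell strings."""
    parser = TableCells()
    parser.feed(html)
    parser.close()
    return parser.rows


def html_tables_equal(expected, actual):
    """Compare two text/html outputs by cell content only."""
    return table_cells(expected) == table_cells(actual)
```

With something like this, the "text/html" strings from the expected and the actual notebook could be joined and passed to `html_tables_equal`, so that only a change in the cell values (not in indentation or styling) would count as a difference.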
Hey @AnjoMan, have you seen https://pytest-notebook.readthedocs.io/en/latest/user_guide/tutorial_config.html#format-html-svg-outputs ?