`gwas_tutorial.ipynb` taking too long to run.
CI is currently failing (e.g. https://github.com/pystatgen/sgkit/actions/runs/3251065111/jobs/5362259020) because the GWAS tutorial notebook is timing out. (The default timeout is 30s; I've been running it locally for 5 minutes and it is still going.)
I assume this is a regression? Looking into it (I can't self-assign here yet).
Thanks for opening this @benjeffery - I was just about to open the same issue! This is a regression - started on Friday.
I can reproduce locally and I get the following log:
```
Traceback (most recent call last):
  File "/Users/tom/miniconda3/envs/sgkit-doc-3.8/lib/python3.8/site-packages/nbclient/client.py", line 730, in _async_poll_for_reply
    msg = await ensure_async(self.kc.shell_channel.get_msg(timeout=new_timeout))
  File "/Users/tom/miniconda3/envs/sgkit-doc-3.8/lib/python3.8/site-packages/nbclient/util.py", line 96, in ensure_async
    result = await obj
  File "/Users/tom/miniconda3/envs/sgkit-doc-3.8/lib/python3.8/site-packages/jupyter_client/channels.py", line 230, in get_msg
    raise Empty
_queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/tom/miniconda3/envs/sgkit-doc-3.8/lib/python3.8/site-packages/jupyter_cache/executors/utils.py", line 58, in single_nb_execution
    executenb(
  File "/Users/tom/miniconda3/envs/sgkit-doc-3.8/lib/python3.8/site-packages/nbclient/client.py", line 1204, in execute
    return NotebookClient(nb=nb, resources=resources, km=km, **kwargs).execute()
  File "/Users/tom/miniconda3/envs/sgkit-doc-3.8/lib/python3.8/site-packages/nbclient/util.py", line 84, in wrapped
    return just_run(coro(*args, **kwargs))
  File "/Users/tom/miniconda3/envs/sgkit-doc-3.8/lib/python3.8/site-packages/nbclient/util.py", line 62, in just_run
    return loop.run_until_complete(coro)
  File "/Users/tom/miniconda3/envs/sgkit-doc-3.8/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/Users/tom/miniconda3/envs/sgkit-doc-3.8/lib/python3.8/site-packages/nbclient/client.py", line 663, in async_execute
    await self.async_execute_cell(
  File "/Users/tom/miniconda3/envs/sgkit-doc-3.8/lib/python3.8/site-packages/nbclient/client.py", line 949, in async_execute_cell
    exec_reply = await self.task_poll_for_reply
  File "/Users/tom/miniconda3/envs/sgkit-doc-3.8/lib/python3.8/site-packages/nbclient/client.py", line 754, in _async_poll_for_reply
    await self._async_handle_timeout(timeout, cell)
  File "/Users/tom/miniconda3/envs/sgkit-doc-3.8/lib/python3.8/site-packages/nbclient/client.py", line 801, in _async_handle_timeout
    raise CellTimeoutError.error_from_timeout_and_cell(
nbclient.exceptions.CellTimeoutError: A cell timed out while it was being executed, after 30 seconds.
The message was: Cell execution timed out.
Here is a preview of the cell contents:
-------------------
dp = ds.call_DP.where(ds.call_DP >= 0) # filter out missing
sample_dp_mean = dp.mean(dim="variants")
sample_dp_mean.attrs["long_name"] = "Mean Sample DP"
ds["sample_dp_mean"] = sample_dp_mean # add new data array to dataset
ds.plot.scatter(x="sample_dp_mean", y="sample_call_rate", size=8, s=10);
-------------------
```
Running the notebook manually doesn't cause the problem - that cell runs instantly.
> Running the notebook manually doesn't cause the problem - that cell runs instantly.
I'm not finding that! Locally the cell takes several minutes. (I'm on matplotlib==3.6.1, if that makes any difference, as it seems to be plotting related.)
Diffing the installed dependencies of the failing build against those of the last successful build shows that this is due to xarray==2022.10.0.
Locally, xarray==2022.9.0 completes the build in 44s. Looking into what changed.
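For anyone wanting to repeat the dependency comparison, here is a minimal sketch of diffing two `pip freeze`-style listings. The helper names `parse_freeze` and `diff_freeze` are hypothetical, and the input strings are illustrative: only the two xarray versions come from this thread, while the numpy line is a placeholder to show an unchanged package.

```python
def parse_freeze(text):
    """Parse `pip freeze`-style text into a {package: version} mapping."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not a simple name==version pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        versions[name.lower()] = version
    return versions


def diff_freeze(old_text, new_text):
    """Return {package: (old_version, new_version)} for every pin that differs."""
    old, new = parse_freeze(old_text), parse_freeze(new_text)
    changes = {}
    for name in sorted(old.keys() | new.keys()):
        if old.get(name) != new.get(name):
            changes[name] = (old.get(name), new.get(name))
    return changes


# Illustrative inputs only; in practice these would be copied from the CI logs.
last_good = "numpy==1.23.3\nxarray==2022.9.0\n"
failing = "numpy==1.23.3\nxarray==2022.10.0\n"

for name, (before, after) in diff_freeze(last_good, failing).items():
    print(f"{name}: {before} -> {after}")
```

A package that appears in only one listing shows up with `None` on the other side, which makes added or removed dependencies visible too.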
You're right - I was running the wrong cell - I can reproduce it in the notebook now.
From https://github.com/pydata/xarray/releases/tag/v2022.10.0: "This release brings numerous bugfixes, a change in minimum supported versions, and a new scatter plot method for DataArrays."
Yeah, https://github.com/pydata/xarray/pull/6778 completely replaced the scatter code.
Heh, the output (after a while) is completely different!

It's fine to pin xarray on an older version while we address this (if there's no obvious fix) - that would unblock the other issues.
The underlying issue hasn't been fixed (see #1122), so it might be worth reporting upstream @benjeffery?
Can we do the scatter plot with matplotlib or something to avoid the problem?
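A minimal sketch of what that could look like, calling matplotlib directly instead of `ds.plot.scatter`. The dataset here is a synthetic stand-in (random values, 50 samples) rather than the tutorial data, and the 8x8 figure size is an assumption, not necessarily what `size=8` produced before.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

# Synthetic stand-in for the tutorial dataset: one value per sample.
rng = np.random.default_rng(0)
n_samples = 50
ds = xr.Dataset(
    {
        "sample_dp_mean": ("samples", rng.uniform(2.0, 10.0, n_samples)),
        "sample_call_rate": ("samples", rng.uniform(0.9, 1.0, n_samples)),
    }
)
ds["sample_dp_mean"].attrs["long_name"] = "Mean Sample DP"

# Pass plain NumPy arrays to matplotlib, bypassing xarray's scatter code path.
fig, ax = plt.subplots(figsize=(8, 8))  # figure size is an assumption
points = ax.scatter(
    ds["sample_dp_mean"].values,
    ds["sample_call_rate"].values,
    s=10,
)

# Reuse the long_name attribute for the axis label when it is present.
ax.set_xlabel(ds["sample_dp_mean"].attrs.get("long_name", "sample_dp_mean"))
ax.set_ylabel("sample_call_rate")
```

In the notebook itself, `ds` would be the dataset built in the earlier cells, so only the plotting lines would be needed.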