Is `Client(threads_per_worker=1)` the best way to load a Dask client?
I started seeing `Client(threads_per_worker=1)` in the recipes.
Is this intentional, and is it best practice? I'm not doubting it; Dask is a bit of a mystery to me. If this is what we should be doing, we should change it everywhere, right?
cc @anton-seaice @angus-g
There is a bug somewhere in the netCDF libraries which means that accessing a netCDF file in parallel from more than one thread within the same worker fails.
It's a bug, so it should (and eventually will) get fixed, and the fix will flow through to future conda/analysis versions. That said, the issue has been around for more than a year now and has not been resolved.
Setting `threads_per_worker=1` is a workaround for this issue:
https://forum.access-hive.org.au/t/netcdf-not-a-valid-id-errors/389
Dale said he would pin netCDF in conda/analysis to the previous version that didn't have this bug, but people were still having issues last week. I'll ask him on the hive about it :)
Duplicate of #398
I think I closed this incorrectly.
Anyway, there is no resolution in sight for this (it's deep in the netcdf-c library!).
So for now, to run on `conda/analysis3-24.04` or later, we need to set `threads_per_worker=1`.
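
For anyone copying this into their own scripts, here is a minimal sketch of the workaround. The helper names `single_threaded_kwargs` and `start_client` are made up for illustration and are not part of Dask or the recipes; the only thing taken from this thread is passing `threads_per_worker=1` to `Client`.

```python
# Sketch only: single_threaded_kwargs and start_client are hypothetical
# helper names, not part of Dask or of the cosima-recipes.


def single_threaded_kwargs(**extra_kwargs):
    """Return Client keyword arguments with threads_per_worker forced to 1."""
    # Workaround for the netCDF threading bug discussed in this issue:
    # with one thread per worker, no two threads inside the same worker
    # process access a netCDF file at the same time.
    return {**extra_kwargs, "threads_per_worker": 1}


def start_client(**extra_kwargs):
    """Start a local Dask client whose workers each run a single thread."""
    # Imported lazily so that defining these helpers does not need Dask
    # and does not start a cluster.
    from dask.distributed import Client

    return Client(**single_threaded_kwargs(**extra_kwargs))
```

In a notebook this would be used as `client = start_client()`, or `client = start_client(n_workers=4)` to also choose the number of worker processes. Calling `Client(threads_per_worker=1)` directly, as the recipes do, has the same effect; the wrapper just keeps the setting and the reason for it in one place.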
Duplicate of #398
These are not duplicates. The two issues are different.
Should we add this to all examples until the issue is resolved?
Yes, I think we should add `Client(threads_per_worker=1)` to all recipes. As far as I know this bug hasn't been solved. Let's add it to the hackathon so someone can go through and check that all recipes have this. It might also be useful to have a note in each recipe explaining why it is needed, so people know they have to copy it into their own scripts too.
https://github.com/COSIMA/cosima-recipes/pull/488 added that to all recipes