Tests should use process-based dask

Open benjeffery opened this issue 2 years ago • 3 comments

#1043 shows that we should test with a process-based dask cluster.

I've tried this by adding client = dask.distributed.Client(n_workers=1, threads_per_worker=1) to conftest.py, but I get segfaults in the workers. Attaching GDB to the workers shows that the segfaults occur in several numba gufuncs, such as count_alleles and cohort_sum. Deleting the __pycache__ directory will sometimes stop a particular test from failing, which is disconcerting!

benjeffery avatar Mar 09 '23 01:03 benjeffery
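
For anyone trying to reproduce the __pycache__ observation above, here is a minimal sketch of a helper that removes only numba's on-disk cache files and leaves the regular .pyc files alone. It relies on numba's default behaviour of writing index (.nbi) and data (.nbc) files into the __pycache__ directory next to the cached module. The function name clear_numba_cache is hypothetical and not part of sgkit or numba.

```python
from pathlib import Path

# Numba stores its on-disk cache as index (.nbi) and data (.nbc) files,
# by default inside the __pycache__ directory next to the cached module.
NUMBA_CACHE_SUFFIXES = {".nbi", ".nbc"}


def clear_numba_cache(root):
    """Delete numba cache files under root and return the removed paths.

    Hypothetical helper for illustration; regular .pyc files are kept.
    """
    removed = []
    for pycache in Path(root).rglob("__pycache__"):
        if not pycache.is_dir():
            continue
        for path in pycache.iterdir():
            if path.is_file() and path.suffix in NUMBA_CACHE_SUFFIXES:
                path.unlink()
                removed.append(path)
    return sorted(removed)
```

Calling clear_numba_cache(".") from the repository root before a test run should show whether a stale cache entry, rather than the test itself, is what triggers a given failure.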

Possibly related to #869? Do you get segfaults if you disable numba caching?

timothymillar avatar Mar 09 '23 05:03 timothymillar
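
One way to probe the caching question without editing any decorators is to start from an empty cache, so that nothing compiled by an earlier run can be loaded. The sketch below does not disable caching; it uses numba's NUMBA_CACHE_DIR environment variable to redirect the cache to a fresh temporary directory for a single command. The helper name run_with_fresh_numba_cache is hypothetical, and the sketch assumes dask worker processes inherit the environment of the process that launches them.

```python
import os
import subprocess
import tempfile


def run_with_fresh_numba_cache(command):
    """Run command with NUMBA_CACHE_DIR set to an empty temporary directory.

    Hypothetical helper for illustration. The variable has to be in the
    environment before numba is imported, so it is set for a child process
    rather than changed inside an already running interpreter.
    """
    with tempfile.TemporaryDirectory(prefix="numba-cache-") as cache_dir:
        env = dict(os.environ, NUMBA_CACHE_DIR=cache_dir)
        return subprocess.run(command, env=env, check=False)
```

For example, run_with_fresh_numba_cache([sys.executable, "-m", "pytest"]) (with sys imported) runs the suite against an empty cache. If the segfaults disappear, that points at stale or concurrently written cache files.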

Ah, yes this is! Thanks for the pointer.

benjeffery avatar Mar 09 '23 13:03 benjeffery

I think it's important that we test on both the default dask threads-within-process and an explicit scheduler. Is there something we can do with pytest to run the full test suite under both conditions?

jeromekelleher avatar Dec 14 '23 12:12 jeromekelleher
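
One possible answer to the pytest question above is a command-line option that selects the scheduler, with the full suite run once per value (for instance as two CI jobs). This is a sketch under stated assumptions, not what sgkit does: the option name --dask-scheduler and the SCHEDULERS tuple are hypothetical, while the Client arguments are the ones quoted earlier in the thread.

```python
# conftest.py (sketch)

# Hypothetical option values: the default threaded scheduler, or an
# explicit process-based dask.distributed cluster.
SCHEDULERS = ("threads", "distributed")

_client = None  # module-level handle so the client can be closed at exit


def pytest_addoption(parser):
    """Register the (hypothetical) --dask-scheduler option."""
    parser.addoption(
        "--dask-scheduler",
        action="store",
        default="threads",
        choices=SCHEDULERS,
        help="dask scheduler to run the test suite under",
    )


def pytest_configure(config):
    """Start a process-based cluster when 'distributed' is requested."""
    global _client
    if config.getoption("--dask-scheduler") == "distributed":
        # Imported lazily so the default run does not need dask.distributed.
        from dask.distributed import Client

        # Creating a Client registers it as the default scheduler for
        # subsequent dask computations in this process.
        _client = Client(n_workers=1, threads_per_worker=1, processes=True)


def pytest_unconfigure(config):
    """Shut the cluster down at the end of the session, if one was started."""
    global _client
    if _client is not None:
        _client.close()
        _client = None
```

The suite would then be run twice, as pytest --dask-scheduler=threads and pytest --dask-scheduler=distributed. Separate invocations keep the two conditions isolated from each other, at the cost of needing two commands instead of one.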