Dask tests

Open rsignell-usgs opened this issue 7 years ago • 4 comments

As requested here by @mrocklin: https://github.com/pangeo-data/pangeo/issues/75#issuecomment-357734564

  • [ ] Try XArray + Dask locally on the HSDS data to verify that it can be accessed concurrently from multiple threads
  • [ ] Try XArray + Dask.distributed locally on the HSDS data to verify that the h5pyd objects can survive being serialized
  • [ ] Try everything on a distributed cluster using KubeCluster and then look at the performance of scalable computing
  • [ ] Try this all again on a cluster on S3, where presumably we would expect 100-200MB/s network access from each node.
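The first two checks could be sketched roughly as below. This is a minimal sketch, not a tested recipe: it uses only the standard library, and `dset` is a plain tuple standing in for a real h5pyd dataset opened against HSDS. The helper names `threaded_reads_match` and `survives_pickle` are made up for illustration. It assumes Dask.distributed generally moves objects between processes by pickling them, so a pickle round trip is a reasonable first proxy.

```python
import pickle
from concurrent.futures import ThreadPoolExecutor


def threaded_reads_match(dset, slices, max_workers=4):
    """Read each slice serially, then from a thread pool, and compare the results."""
    expected = [dset[s] for s in slices]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        concurrent = list(pool.map(lambda s: dset[s], slices))
    # Plain == works for tuples/lists; array results would need an
    # element-wise comparison instead.
    return concurrent == expected


def survives_pickle(dset, probe):
    """Round-trip the object through pickle and check that a probe read still matches."""
    clone = pickle.loads(pickle.dumps(dset))
    return clone[probe] == dset[probe]


# Stand-in data (assumption): swap in a dataset opened with h5pyd to run the real check.
dset = tuple(range(100))
slices = [slice(i, i + 10) for i in range(0, 100, 10)]

threads_ok = threaded_reads_match(dset, slices)
pickle_ok = survives_pickle(dset, slice(0, 5))
print(threads_ok, pickle_ok)
```

If both checks pass on a real h5pyd dataset, the same object could then be wrapped in a Dask array and run under the threaded and distributed schedulers.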

rsignell-usgs avatar Jan 15 '18 20:01 rsignell-usgs

@jreadey, do you think you might be able to take a stab at these?

rsignell-usgs avatar Jan 15 '18 20:01 rsignell-usgs

@rsignell-usgs - yes I think so, but may need to do a bit of self-education on Dask.

What is KubeCluster?

jreadey avatar Jan 16 '18 04:01 jreadey

KubeCluster is what we use on pangeo.pydata.org (which you might consider trying out) to launch dask on kubernetes. It comes from https://github.com/yuvipanda/daskernetes

mrocklin avatar Jan 16 '18 12:01 mrocklin

Note that we're currently deploying from this branch: https://github.com/yuvipanda/daskernetes/pull/22

On Tue, Jan 16, 2018 at 9:00 AM, Rich Signell wrote:

Looks like KubeCluster is part of daskernetes: https://github.com/yuvipanda/daskernetes/blob/master/daskernetes/tests/test_core.py#L6

This is used in the pangeo xarray-data.ipynb example, so for me, it's http://pangeo.pydata.org/user/rsignell-usgs/notebooks/examples/xarray-data.ipynb (I can't find the GitHub repo for this notebook!)

mrocklin avatar Jan 16 '18 14:01 mrocklin