Dieter Weber

225 comments by Dieter Weber

Looking a bit into Lustre, it doesn't seem to have the Hadoop functionality of keeping data directly on the client node to get massive collective IO bandwidth very cheaply. Lustre...

Essentially, it seems to boil down to two options for good scaling that each have pros and cons.

# HDF5, MPI and Lustre

++ Works on regular HPC clusters
++...

My feeling is that the first choice looks like the easier path in the short term, but we would start running into limits in the medium and long term when we...

An example of what working with Apache Spark et al. would look like: https://de.slideshare.net/KevinMader/interactive-scientific-image-analysis-using-spark

On a single node, dask is actually doing very well under good conditions: https://github.com/LiberTEM/LiberTEM/issues/14#issuecomment-369186198 The problems in hyperspy/hyperspy#1840 are not a fundamental dask issue. In general, the first step should...

Apache Spark as part of the Hadoop ecosystem looks like the system of choice for analytics and machine learning on very large datasets. From what we can see, it has...

Comparison of an Apache Spark solution with our [requirements](https://github.com/uellue/opixtem/wiki/Requirements): https://github.com/LiberTEM/LiberTEM/projects/1 Only requirements that relate to the data processing and storage are considered for GUI and live acquisition.

@sk1p oh nice! I'd like to see how it plays out in practice in comparison to dask. No mention of futures or data locality, though. Let's see! :-)

@matbryan52 your Ray prototype would fit into this issue! :-)