
Add documentation that describes expected speed of computation for various starfish.data sets

Open pdichiaro opened this issue 4 years ago • 5 comments

A user should be able to leverage parallelism on HPC clusters to run starfish and process volumetric data such as the seqFISH dataset (s3://spacetx.starfish.data.public/seqfish/). I am wondering whether it is possible to roughly estimate the CPU/GPU count, compute hours, memory, and associated storage needed for a dataset this large.

pdichiaro avatar Sep 20 '19 06:09 pdichiaro

Hi @pdichiaro this work is on the starfish roadmap but not scheduled for completion until after some core algorithmic improvements. Could you tell us a bit more about why you need these benchmarks? It will help us to prioritize the work.

Thanks!

ambrosejcarr avatar Sep 20 '19 14:09 ambrosejcarr

Hi @ambrosejcarr. I tried to process the seqFISH data using 30 cores and 100 GB of RAM, but fov.get_image does not load the ImageStack into memory (my machine keeps crashing with a bus error). I think my configuration is not sufficient for data this large. I would like to know what computational resources I need.

pdichiaro avatar Sep 20 '19 15:09 pdichiaro
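As a rough sanity check on whether a field of view can fit in RAM at all, the in-memory size of an ImageStack can be estimated from its shape. This is a minimal sketch, assuming starfish holds pixel data as float32 (4 bytes per pixel, so raw uint16 TIFFs roughly double in size in memory); the shape values below are hypothetical placeholders, not the actual shape of the public seqFISH dataset:

```python
def imagestack_gib(rounds, channels, zplanes, height, width, bytes_per_pixel=4):
    """Estimate the in-memory size of one field of view, in GiB.

    Assumes float32 pixels (4 bytes each); adjust bytes_per_pixel otherwise.
    """
    return rounds * channels * zplanes * height * width * bytes_per_pixel / 2**30

# e.g. a hypothetical 12-round, 5-channel, 29-z, 2048x2048 field of view:
print(f"{imagestack_gib(12, 5, 29, 2048, 2048):.2f} GiB")  # ~27.19 GiB
```

If the estimate approaches the RAM available per concurrent worker, processing fewer fields of view at a time (or cropping rounds/channels/z-planes before loading) is the first thing to try.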

Hi @pdichiaro -- Starfish's memory use will depend on the size of the data that you elect to concurrently process. To help us debug, could you let us know:

  1. The shape of the data you were concurrently processing
  2. The pipeline you were using to process the data (so we can try to determine if there are intermediates that are being unintentionally retained across stages)
  3. The stage that the pipeline errors at (if you know)

ambrosejcarr avatar Sep 20 '19 18:09 ambrosejcarr

I tried to use your example seqFISH data and your pipeline.

pdichiaro avatar Sep 23 '19 21:09 pdichiaro

Thanks. I'll run the current notebook and see if I can reproduce.

ambrosejcarr avatar Sep 27 '19 08:09 ambrosejcarr