
Add documentation that describes expected speed of computation for various starfish.data sets

Open pdichiaro opened this issue 4 years ago • 5 comments

A user should be able to leverage parallelism on HPC clusters to run starfish and process volumetric data such as the seqFISH dataset (s3://spacetx.starfish.data.public/seqfish/). I am wondering whether it is possible to roughly estimate the CPU/GPU count, compute hours, memory, and associated storage needed for a dataset this large.

pdichiaro avatar Sep 20 '19 06:09 pdichiaro

Hi @pdichiaro this work is on the starfish roadmap but not scheduled for completion until after some core algorithmic improvements. Could you tell us a bit more about why you need these benchmarks? It will help us to prioritize the work.

Thanks!

ambrosejcarr avatar Sep 20 '19 14:09 ambrosejcarr

Hi @ambrosejcarr. I tried to process the seqFISH data using 30 cores and 100 GB of RAM, but fov.get_image does not load the ImageStack into memory (my machine keeps crashing with a bus error). I think my configuration is not sufficient for data this large. I would like to know what computational resources I need.

pdichiaro avatar Sep 20 '19 15:09 pdichiaro
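As a rough sanity check on whether a field of view can fit in RAM at all, the in-memory size of an ImageStack can be estimated from its shape. This is a minimal sketch, assuming starfish holds pixel data as float32 (4 bytes per pixel, so raw uint16 TIFFs roughly double in size in memory); the shape values below are hypothetical placeholders, not the actual shape of the public seqFISH dataset:

```python
def imagestack_gib(rounds, channels, zplanes, height, width, bytes_per_pixel=4):
    """Estimate the in-memory size of one field of view, in GiB.

    Assumes float32 pixels (4 bytes each); adjust bytes_per_pixel otherwise.
    """
    return rounds * channels * zplanes * height * width * bytes_per_pixel / 2**30

# e.g. a hypothetical 12-round, 5-channel, 29-z, 2048x2048 field of view:
print(f"{imagestack_gib(12, 5, 29, 2048, 2048):.2f} GiB")  # ~27.19 GiB
```

If the estimate approaches the RAM available per concurrent worker, processing fewer fields of view at a time (or cropping rounds/channels/z-planes before loading) is the first thing to try.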

Hi @pdichiaro -- Starfish's memory use will depend on the size of the data that you elect to concurrently process. To help us debug, could you let us know:

  1. The shape of the data you were concurrently processing
  2. The pipeline you were using to process the data (so we can try to determine if there are intermediates that are being unintentionally retained across stages)
  3. The stage that the pipeline errors at (if you know)

ambrosejcarr avatar Sep 20 '19 18:09 ambrosejcarr

I tried to use your example seqFISH data and your pipeline.

pdichiaro avatar Sep 23 '19 21:09 pdichiaro

Thanks. I'll run the current notebook and see if I can reproduce.

ambrosejcarr avatar Sep 27 '19 08:09 ambrosejcarr