starfish
Add documentation that describes expected speed of computation for various starfish.data sets
A user should leverage parallelism on HPC clusters to run starfish and process volumetric data such as the seqFISH data (s3://spacetx.starfish.data.public/seqfish/). I am wondering whether it is possible to roughly estimate the number of CPUs/GPUs, the compute hours, the memory, and the associated storage needed for very large data sets like the example above.
Hi @pdichiaro this work is on the starfish roadmap but not scheduled for completion until after some core algorithmic improvements. Could you tell us a bit more about why you need these benchmarks? It will help us to prioritize the work.
Thanks!
Hi @ambrosejcarr. I tried to process the seqFISH data using 30 cores and 100 GB of RAM, but fov.get_image fails to load the image stack into memory (my machine keeps crashing with a bus error). I think my computational configuration is not sufficient to process such large data. I would like to know what computational resources I need.
Hi @pdichiaro -- Starfish's memory use will depend on the size of the data that you elect to process concurrently. To help us debug, could you let us know:
- The shape of the data you were concurrently processing
- The pipeline you were using to process the data (so we can try to determine if there are intermediates that are being unintentionally retained across stages)
- The stage that the pipeline errors at (if you know)
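As a rough illustration of why the shape of the concurrently processed data matters, the memory needed for one dense image tensor can be estimated from its shape alone. This is only a sketch: the 5-D (round, channel, z, y, x) layout, the 4-byte (float32) element size, the `copies` multiplier for pipeline intermediates, and the example shape are all assumptions for illustration, not measured starfish behavior or the actual dimensions of the seqFISH data set.

```python
import math


def estimate_stack_bytes(shape, itemsize=4):
    """Bytes needed for one dense array of `shape`.

    `itemsize` is bytes per element; 4 assumes float32 storage (an assumption).
    """
    return math.prod(int(s) for s in shape) * itemsize


def estimate_peak_gib(shape, itemsize=4, copies=2):
    """Crude peak-memory estimate in GiB.

    `copies` is a hypothetical multiplier for intermediates that a pipeline
    stage may hold alongside its input (e.g. input plus filtered output).
    """
    return copies * estimate_stack_bytes(shape, itemsize) / 2**30


if __name__ == "__main__":
    # Hypothetical shape: (rounds, channels, z-planes, y, x).
    shape = (5, 12, 29, 2048, 2048)
    print(f"one copy:  {estimate_stack_bytes(shape) / 2**30:.1f} GiB")
    print(f"peak (x2): {estimate_peak_gib(shape):.1f} GiB")
```

Comparing this number against the available RAM per worker gives a first indication of whether a field of view needs to be cropped or processed in smaller pieces.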
I tried to use your example seqFISH data and your pipeline.
Thanks. I'll run the current notebook and see if I can reproduce.