cytomapper

Make image handling more efficient

Open nilseling opened this issue 3 years ago • 11 comments

Find a way to write .h5 files directly from the raster images, i.e., without loading the images into memory first.

nilseling, Feb 04 '22 14:02

Add a browser-based image viewer, as provided by EBImage::display().

nilseling, Feb 09 '22 15:02

Check out vitessceR for integration once it's released

nilseling, Jul 12 '22 07:07

@nilseling I want to thank you and everyone who put work into cytomapper.

I am analyzing IMC data and am stuck on the interactive "gating" step done through the Shiny app.

I have about 8 images with 20-30 channels each. I originally ran the whole pipeline on my personal Ubuntu server (128 GB RAM, 16 cores), but got stuck at this step. I could use the Shiny app to try out gating, but it was slow and took a while to reload after every change.

I thought I might need more RAM and more cores/threads, so I moved the analysis to my institution's HPC cluster.

Now I am using RStudio Server (open source) on the HPC. I tried gating with about 260 GB of RAM and about 26 threads (I could allocate even more), but it was just as slow.

After some googling, I found that to make Shiny faster (or, I guess, more parallel) I would have to pay for RStudio Connect, which can spawn multiple processes per app instead of the one-process-per-app limit of the free version, or request it from my institution. I am trying to avoid both...

-- @nilseling Have you run into the same thing? If so, what is your workaround (without paying for an RStudio Connect subscription, of course)?

Pancreas-Pratik, Jul 17 '22 01:07

Never mind my question... I guess this makes sense, since the images object loaded into the Shiny app is ~11 GB in my R environment. Reading the documentation (?cytomapperShiny), I found a subtle but very helpful clue on what to do: use only the masks, not the images.

Now I only load spe and masks (which are much smaller, <1 GB), and running the Shiny app with the masks (internally, I guess, through plotCells() rather than plotPixels()) is much smoother and very fast.

Thank you again, sorry for any disruption!

Pancreas-Pratik, Jul 17 '22 02:07

Hi @Pancreas-Pratik, thanks for the detailed comments. Indeed, image handling is not so straightforward in R. I'd like to understand the setting a bit better, as it will also help other users. If the images take up 11 GB in memory, loading them shouldn't be a problem on your machine. What are the dimensions of the images? I believe the issue comes from drawing the composites on the R graphics device when there are many pixels to display. Unfortunately, displaying the masks should actually be slower due to internal subsetting operations.

nilseling, Jul 18 '22 07:07

Hi @nilseling, you're welcome! I really appreciate your prompt response.

The dimensions are 2372 × 1947 pixels. Just curious: when you did your studies, were the dimensions the same?
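For scale, a back-of-the-envelope estimate, assuming all 8 images share these dimensions, have 30 channels (the upper end of the range mentioned above), and store each pixel value as an 8-byte double (R's default numeric type):

```latex
\begin{aligned}
2372 \times 1947 &= 4\,618\,284 \ \text{pixels per channel} \\
4\,618\,284 \times 30 \ \text{channels} \times 8 \ \text{bytes} &\approx 1.11 \ \text{GB per image} \\
8 \ \text{images} \times 1.11 \ \text{GB} &\approx 8.9 \ \text{GB}
\end{aligned}
```

With 20 channels the same calculation gives about 0.74 GB per image and 5.9 GB in total, i.e. the same order of magnitude as the ~11 GB reported earlier. It also shows that every composite drawn for one of these images covers roughly 4.6 million pixels.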

@nilseling I think you are right. That was the main issue I was having: the images themselves took a lot of time to render on the R graphics device. Is there a solution to this?

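Regardless, since yesterday or so I have been using only the masks, like this:

```r
if (interactive()) {
    library(cytomapper)
    # Images (~11 GB in memory) are skipped: drawing their composites is slow
    # images <- readRDS("data/images.rds")
    # Single-cell data and segmentation masks are much smaller (<1 GB)
    spe <- readRDS("data/spe.rds")
    masks <- readRDS("data/masks.rds")
    # Without the `image` argument, selected cells are shown on the masks only
    cytomapperShiny(object = spe, mask = masks,
                    cell_id = "ObjectNumber", img_id = "sample_id")
}
```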

and the gating experience was very smooth (even though, as you mention, the internal subsetting operations should make it slower). I could probably pass images.rds to cytomapperShiny() as well and simply avoid the second tab, where the composites are drawn on the R graphics device. I am on a time crunch right now, and using the masks alone as above works, so I'll stick with that for now. I am so happy I got past this roadblock.

@nilseling you and your team are amazing!

P.S. I have another, unrelated question, so I'll open a new issue in just a heartbeat.

Pancreas-Pratik, Jul 19 '22 23:07

I do want to mention, though, that with 3D IMC, gating every 2D slice of the 3D block could take quite some time!

Pancreas-Pratik, Jul 19 '22 23:07

Yes, these images are quite large, and plotting them is the limiting factor. The fastest way to gate would be to load no masks or images and run cytomapperShiny on the spe object only. But then you won't be able to observe the spatial distribution of cells.

nilseling, Jul 20 '22 08:07

Thank you @nilseling

You are right again that plotting the images (drawing the composites on the R graphics device) is the limiting factor. An additional question: is there any way to speed up drawing the composites on the R graphics device in Shiny? Would running on a GPU-enabled node on my HPC cluster help? Is there any code within cytomapper I could modify to speed this up?

EDIT: On second thought, it's good the way it is. I ended up running two instances of RStudio Server and gating two cell types at the same time (while one is drawing the composite on its R graphics device, I work in the other). This works! 👍

Pancreas-Pratik, Aug 11 '22 01:08

Also, I now understand that interactive gating in cytomapperShiny with spe and masks only (no images), or with spe only (no images or masks), will not serve my purpose of accurately assigning individual segmented cells to their cell types for quantification, spatial analysis, etc. (It took some time and energy to realize this!)

@nilseling mentioned this here https://github.com/BodenmillerGroup/cytomapper/issues/58#issuecomment-1189956208, but I did not fully understand it at the time. To save someone else time in the future, here is my conclusion, although obvious in hindsight: the images are required to see the spatial distribution of cells and, very importantly, to determine whether real marker signal is being detected or whether the signal is very low or just noise (false positives). Visualizing the gate on the images after each adjustment is vital for knowing whether cells are being assigned to the appropriate cell types.

My mistake was to gate all cells with asinh-transformed counts greater than 0 as positive for a marker, and all cells with 0 signal as negative. I should have adjusted the gates so they weren't just "black and white" (positive vs. negative), because expression comes in gradients (zero, low, medium, high, etc.): "high" signal may be where the cell type of interest is, while low and zero may be noise (false positives) for that particular marker.
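To make the difference concrete, here is a minimal base-R sketch of a binary versus a graded gate on a single marker. The counts, the asinh cofactor of 1 and the breakpoints are all hypothetical; in cytomapperShiny the gates are set interactively in the app, not with code like this, and sensible thresholds still have to be checked against the images.

```r
# Hypothetical raw counts for one marker in eight cells (illustrative values only)
counts <- c(0, 0.2, 0.8, 1.5, 3, 6, 12, 25)

# asinh transformation; the cofactor of 1 is an assumption, choose it per dataset
expr <- asinh(counts / 1)

# Binary gate: any signal above 0 is called positive
binary_call <- ifelse(expr > 0, "positive", "negative")

# Graded gate with hypothetical breakpoints for zero, low, medium and high signal;
# cut() builds the right-closed intervals (-Inf, 0], (0, 1], (1, 3] and (3, Inf]
signal_level <- cut(expr,
                    breaks = c(-Inf, 0, 1, 3, Inf),
                    labels = c("zero", "low", "medium", "high"))

# Only the "high" bin is treated as positive for this marker
graded_call <- ifelse(signal_level == "high", "positive", "negative")

print(data.frame(counts, expr = round(expr, 2), signal_level,
                 binary_call, graded_call))
```

With these made-up values the binary gate calls 7 of the 8 cells positive, while the graded gate keeps only the 2 cells in the "high" bin.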

Here is an example where I selected too many cells (including a lot of noise/false positives): [image: ex-too-much]

Here is how I have adjusted it now, which is better: [image: ex-better]

Pancreas-Pratik, Aug 11 '22 01:08

The imager::display function could be a potential solution

nilseling, Sep 21 '22 05:09