Spectre
Advice on an automated cell classification workflow between datasets with different numbers of parameters
Hi there,
Thanks again for the tool.
I have one FCS file acquired on the Aurora and one FCS file acquired on the Fortessa, which has far fewer markers. I aim to use the Aurora dataset to annotate the Fortessa dataset.
All markers in the Fortessa panel are also included in the Aurora panel.
My goal is to see whether a subset of my cells detected in the Aurora can be mapped onto any of the cells in the Fortessa data.
Q1. Should I first remove markers that are not present in the Fortessa for this workflow (https://immunedynamics.io/spectre/Automated-classification.html)?
Q2. The Aurora data was analysed with Catalyst rather than Spectre. Is it advisable to reanalyse the Aurora data with Spectre first, or can I convert the Catalyst output somehow?
Thank you.
Q1: Yes, I would remove the markers that are not present in the Fortessa data. The markers have to exist in both datasets for the classification to work.
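A minimal sketch of Q1 in base R, assuming each dataset is already a data.frame (or data.table) with one column per marker, and that a given marker has the same column name in both datasets. The data and marker names below are hypothetical; real Aurora and Fortessa exports often carry fluorochrome-specific channel names, which would need renaming to shared marker names first.

```r
# Hypothetical toy data: one column per marker, one row per cell
aurora <- data.frame(
  CD3 = c(2.1, 0.3), CD4 = c(1.8, 0.1), CD8 = c(0.2, 2.5),
  CD45RA = c(1.1, 0.9), CCR7 = c(0.4, 1.6)
)
fortessa <- data.frame(
  CD3 = c(1.9, 0.2, 2.2), CD4 = c(0.1, 1.7, 1.5), CD8 = c(2.3, 0.1, 0.3)
)

# Markers present in both panels (kept in the Fortessa column order)
shared.markers <- intersect(names(fortessa), names(aurora))

# Aurora-only markers that the classifier cannot use
dropped.markers <- setdiff(names(aurora), shared.markers)

# Restrict both datasets to the shared markers, so the classifier is
# trained only on features that also exist in the data to be annotated
aurora.shared <- aurora[, shared.markers, drop = FALSE]
fortessa.shared <- fortessa[, shared.markers, drop = FALSE]

print(shared.markers)
print(dropped.markers)
```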
Q2: I don't think it matters which tool you used to analyse the Aurora data. Just convert the SCE (SingleCellExperiment) object into a data.table object (https://immunedynamics.io/spectre/tutorials/datatable_interoperability/sce_support.html) and run the workflow.
Before you run the classification workflow, please integrate ("batch correct") the Fortessa and Aurora datasets first: they were acquired on different instruments and will most likely show batch effects. You can use Harmony for the alignment (https://github.com/ImmuneDynamics/Spectre/blob/master/R/run.harmony.R).
@denvercal1234GitHub you might like to use run.rpca instead. Harmony integrates the data in a new embedding but does not correct the expression values, whereas rPCA does. Have a look at the workflow in the v1.2 beta: https://github.com/ImmuneDynamics/Spectre/tree/v1.2.0-beta/workflows/Spectre%20rPCA.
Thank you @ghar1821 and @tomashhurst!
I really like the batch correction workflow tutorial you have at https://immunedynamics.io/spectre/cytonorm/. So, do you think the rPCA workflow (https://github.com/ImmuneDynamics/Spectre/tree/v1.2.0-beta/workflows/Spectre%20rPCA) is the better integration approach because it directly corrects the expression values? Is there a tutorial for it?
Also, it would be very helpful to have more tools to identify which markers contribute to a batch effect and whether batch correction is needed.
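One rough way to see which markers drive a batch difference is to compare each marker's distribution between batches using the earth mover's distance (EMD), the same kind of metric that cyCombine's detect_batch_effect() reports. The sketch below is not a Spectre or cyCombine function: approx.emd() is a hypothetical helper that approximates the 1D Wasserstein distance from the two batches' quantile functions, and both batches are simulated.

```r
set.seed(1)

# Simulated batches; CD4 is shifted by 1 and CD8 by 0.2 in batch.b
batch.a <- data.frame(CD3 = rnorm(500, 3, 0.5),
                      CD4 = rnorm(500, 2, 0.5),
                      CD8 = rnorm(500, 1, 0.5))
batch.b <- data.frame(CD3 = rnorm(400, 3, 0.5),
                      CD4 = rnorm(400, 3, 0.5),
                      CD8 = rnorm(400, 1.2, 0.5))

# 1D EMD (Wasserstein-1) = integral over p in (0, 1) of |Qa(p) - Qb(p)|,
# approximated by averaging over a midpoint grid of probabilities.
# Works for batches with different numbers of cells.
approx.emd <- function(x, y, n.grid = 200) {
  p <- (seq_len(n.grid) - 0.5) / n.grid
  mean(abs(quantile(x, p, names = FALSE) - quantile(y, p, names = FALSE)))
}

# Score every shared marker and rank from most to least different
markers <- intersect(names(batch.a), names(batch.b))
emd.per.marker <- sapply(markers, function(m) approx.emd(batch.a[[m]], batch.b[[m]]))
print(sort(emd.per.marker, decreasing = TRUE))
```

Markers with the largest EMD are the first candidates to inspect with density plots; what EMD counts as "large" depends on the scale of the data.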
Thank you again so much for your support and the tool!
We will be releasing a big update to Spectre soon that will include more tutorials on batch correction. So stay tuned!
Thank you @ghar1821! While we are on the topic: as mentioned above, the current batch alignment workflow using CytoNorm is great. Could you also consider adding the ability to interface with the cyCombine package for batch alignment (https://github.com/biosurf/cyCombine)? There will always be newer packages for these tasks, and I think many people would find it very useful if Spectre could accommodate different tools as they appear. Thank you again.
There will be functions to run cyCombine and other batch alignment tools in the new update ;)
Hi @ghar1821, thanks again for all your help so far. I wanted to check whether you have had a chance to incorporate cyCombine into Spectre.
@denvercal1234GitHub I've implemented cyCombine in the v2-dev branch (https://github.com/ImmuneDynamics/Spectre/blob/v2-dev/R/run.cycombine.R).
I've only tested it on one toy dataset, and it seems to work, so feel free to give it a try and let us know if you run into trouble.
Make sure you install the package from the v2-dev branch to access the function:
remotes::install_github("immunedynamics/spectre", ref='v2-dev')
Hi @ghar1821, thanks for letting me know. I actually ran into an issue when using the channel values (https://immunedynamics.io/spectre/cytometry/) with cyCombine (see the last comments at https://github.com/biosurf/cyCombine/issues/58).
In brief, the EMD calculation in cyCombine's detect_batch_effect() seems to expect values ranging from about 0 to 6, whereas the channel values I exported from FlowJo after biexponential transformation range from 0 to about 1000. Do you have any suggestions?
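Part of the problem is that EMD is expressed in the units of the data. The sketch below, on simulated values, shows that multiplying both batches by the same factor multiplies the EMD by exactly that factor, so an EMD cutoff that makes sense on a roughly 0-6 (asinh-like) scale is not comparable on a 0-1000 channel scale. emd.1d() is a hypothetical helper for equal-sized samples; it is not cyCombine's implementation, which may bin the values differently.

```r
set.seed(42)

# Hypothetical batch 1 on an asinh-like 0-6 scale
x <- runif(1000, 0, 6)
# Hypothetical batch 2: the same cells shifted up by 0.3 (capped at 6)
y <- pmin(x + 0.3, 6)

# Exact 1D Wasserstein-1 distance for two samples of equal size:
# the mean absolute difference between the sorted values
emd.1d <- function(a, b) {
  stopifnot(length(a) == length(b))
  mean(abs(sort(a) - sort(b)))
}

# Hypothetical ratio between a 0-1000 and a 0-6 value range
scale.factor <- 1000 / 6

emd.small <- emd.1d(x, y)
emd.large <- emd.1d(x * scale.factor, y * scale.factor)

# The ratio of the two EMDs equals scale.factor
print(c(emd.small = emd.small, emd.large = emd.large,
        ratio = emd.large / emd.small))
```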
This is how I performed the transformation with FlowJo:
Step 1. Biexponentially transformed the data in FlowJo (because this is visually easier than determining a per-marker cofactor)
Step 2. Exported the channel values from FlowJo as CSV files
Step 3. Imported these CSV files into R, preserving the FlowJo transformation, using Spectre::read.files(file.loc = "....exportedChannelValues_Singlets_Live/Full_Stained", file.type = ".csv", do.embed.file.names = TRUE)
Step 4. Exported the data from R as FCS files, again preserving the transformation, using write.files(data_table_object, write.csv = FALSE, write.fcs = TRUE, file.prefix = "channelFCS", divide.by = "FileName")
Step 5. Re-imported these FCS files as a flowSet, with the transformation preserved, using read.flowSet(path=fcs.dir1, pattern="*.fcs", transformation = FALSE, truncate_max_range = FALSE); this flowSet is the input to prepare_data()
In my prepare_data() output, the values range from 0 to 1017.
Thank you for your help!
My two cents: after step 3, you can try dividing the intensities by about 1000 / 6 ≈ 170. There should probably also be a way to skip steps 4 and 5, since the data_table_object is already fine for Spectre.
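A minimal sketch of the rescaling suggestion, assuming the data from step 3 sits in a data.frame whose marker columns hold FlowJo channel values and whose FileName column holds metadata. The column names and values are hypothetical. Dividing by a constant only changes the units; it does not change the shape of the distributions, so it cannot by itself turn biexponential channel values into an asinh transform.

```r
# Hypothetical channel values exported from FlowJo (roughly 0-1000)
dat <- data.frame(
  CD3 = c(12, 480, 1017),
  CD4 = c(0, 350, 900),
  FileName = c("a.csv", "a.csv", "b.csv")
)

# Rescale only the marker columns, never metadata such as FileName
marker.cols <- c("CD3", "CD4")

# One common divisor for all markers, chosen so the overall maximum
# becomes 6 (here 1017 / 6 = 169.5)
target.max <- 6
divisor <- max(dat[marker.cols]) / target.max

dat[marker.cols] <- lapply(dat[marker.cols], function(v) v / divisor)
print(round(range(dat[marker.cols]), 2))
```

Using a single divisor for every marker, rather than scaling each marker to its own maximum, keeps the markers' relative scales intact.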
Thank you, @SamGG, for your suggestions. I tried dividing the values by 200 to bring them into roughly the 0-6 range. However, as also mentioned in https://github.com/biosurf/cyCombine/issues/58, this did not seem to resolve the apparent discrepancy between the EMD plot and the density plots for high-EMD markers. I am now fixing the x-axis of the density plots to see whether the discrepancy is just a visual artefact of these histograms.