MuSiC
How to build custom single cell dataset
Hi,
I find this method really cool and promising, but I am having trouble applying it to my data.
Could you provide a vignette (or a section of one) describing how to go from an expression matrix to the input files MuSiC needs? Or perhaps you could construct single-cell reference files from the Tabula Muris or MCA datasets?
Hi,
Thanks for reaching out. The single-cell data are stored in an ExpressionSet object. Please see https://www.rdocumentation.org/packages/Biobase/versions/2.32.0/topics/ExpressionSet for details.
Suppose `gene_exprs.matrix` is your gene expression matrix (genes by cells) and `pheno.matrix` is a data frame of phenotype annotations whose row names match the column names of `gene_exprs.matrix`. Suppose `pheno.matrix` has four columns: `sampleID`, `SubjectName`, `cellTypeID`, and `cellType`.
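```r
library(Biobase)  # provides ExpressionSet and AnnotatedDataFrame
# varMetadata: one row per phenotype column; row names must equal colnames(pheno.matrix)
metadata <- data.frame(labelDescription = c("Sample ID", "Subject Name", "Cell Type ID", "Cell Type Name"), row.names = c("sampleID", "SubjectName", "cellTypeID", "cellType"))
# rownames(pheno.matrix) must equal colnames(gene_exprs.matrix): one annotation row per cell
SC.eset <- ExpressionSet(assayData = data.matrix(gene_exprs.matrix), phenoData = new("AnnotatedDataFrame", data = pheno.matrix, varMetadata = metadata))
```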
`SC.eset` is the single-cell data in the form of an ExpressionSet.
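To check that the pieces fit together before using real data, here is a minimal self-contained sketch with simulated values (5 genes, 6 cells, made-up subjects and cell types; nothing here comes from a real dataset). It uses Biobase's `AnnotatedDataFrame()` constructor, which is equivalent to `new("AnnotatedDataFrame", ...)`:

```r
library(Biobase)

set.seed(1)
# Toy genes x cells count matrix (simulated, not real data)
gene_exprs.matrix <- matrix(rpois(30, lambda = 5), nrow = 5,
                            dimnames = list(paste0("gene", 1:5), paste0("cell", 1:6)))

# One annotation row per cell; row names equal the matrix column names
pheno.matrix <- data.frame(
  sampleID    = c("S1", "S1", "S1", "S2", "S2", "S2"),
  SubjectName = c("subj1", "subj1", "subj1", "subj2", "subj2", "subj2"),
  cellTypeID  = c(1, 2, 1, 2, 1, 2),
  cellType    = c("alpha", "beta", "alpha", "beta", "alpha", "beta"),
  row.names   = colnames(gene_exprs.matrix)
)

# Descriptions of the phenotype columns, in the same order
metadata <- data.frame(
  labelDescription = c("Sample ID", "Subject Name", "Cell Type ID", "Cell Type Name"),
  row.names = colnames(pheno.matrix)
)

SC.eset <- ExpressionSet(
  assayData = gene_exprs.matrix,
  phenoData = AnnotatedDataFrame(data = pheno.matrix, varMetadata = metadata)
)

dim(SC.eset)             # Features (genes) and Samples (cells)
table(SC.eset$cellType)  # number of cells per cell type
```

If the row names of `pheno.matrix` do not match the column names of the matrix, `ExpressionSet()` fails its validity check, which is a quick way to catch misaligned annotation.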
Thank you very much!!
Hi xuranw, your method is really cool and I am trying to apply it to my data, but I ran into problems setting up the ExpressionSet. I followed your method, but for GSE107585 I could not find the phenodata. I downloaded your ExpressionSet and found detailed information in its phenodata. How did you get that information? Could you share it with me? I tried different methods but failed.
Thank you very much!
Hi Jiumeizhu,
Thanks for using MuSiC.
As you mentioned, you are using the data from GSE107585. The phenodata should include at least the subject name and cell type label for each cell. Have you checked the annotation file for the dataset? Are there annotation instructions in their Science paper? If not, you may want to email the authors responsible for this dataset.
Once you have the annotation for each cell, it is not hard to construct a cell-by-feature phenotype data frame to use as the phenoData of the ExpressionSet.
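A minimal base-R sketch of that step, assuming a hypothetical per-cell annotation table `cell_annot` (columns `cell`, `subject`, `cluster`) and a hypothetical cluster-to-cell-type lookup `cluster_names`; none of these names or values come from GSE107585. The key point is reordering the annotation so its rows follow the columns of the expression matrix:

```r
# Hypothetical per-cell annotation, e.g. read from a supplementary file
cell_annot <- data.frame(
  cell    = c("c3", "c1", "c2", "c4"),
  subject = c("mouse2", "mouse1", "mouse1", "mouse2"),
  cluster = c(2, 1, 2, 1)
)
# Hypothetical lookup from cluster number to cell type name
cluster_names <- c("1" = "Endothelial", "2" = "Podocyte")

# Column names (cell IDs) of the genes x cells expression matrix, assumed here
expr_cells <- c("c1", "c2", "c3", "c4")

# Reorder annotation rows to follow the expression matrix columns
idx <- match(expr_cells, cell_annot$cell)
stopifnot(!anyNA(idx))  # every cell in the matrix must have an annotation

pheno.matrix <- data.frame(
  sampleID    = cell_annot$subject[idx],  # assumed here to identify the subject
  SubjectName = cell_annot$subject[idx],
  cellTypeID  = cell_annot$cluster[idx],
  cellType    = unname(cluster_names[as.character(cell_annot$cluster[idx])]),
  row.names   = expr_cells
)
```

The resulting `pheno.matrix` has one row per cell, in matrix column order, and can be passed as `data` when building the `AnnotatedDataFrame`.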
Hope this helps.
Best, Xuran
Hi Xuran, thanks for your quick reply. I downloaded the entire GSE107585 dataset, but I couldn't find the phenodata in the annotation file either. Since GSE107585 is the kidney scRNA-seq data used in your paper, did you get the subject name and cell type for each cell from the authors?
Thank you very much! Best wishes, Honglin
In reply to xuranw's comment of 4/5/2019 22:37.
Do you have a step-by-step guide, from start to finish, for using your own single-cell and bulk datasets? I'm trying to figure out how to scale my sc data to my bulk data.
Thanks in advance.
Hello Xuran! I tried to build my own single-cell ExpressionSet from the Zeisel dataset (GSE60361), downloaded from the scRNAseq R package. It worked, and I ran MuSiC using the cell types as clusters and the cell IDs as samples.
My question is: is it right to use the cell IDs as samples, or should I use the subject IDs, which are something different? In the tutorial you seem to use the sample IDs, which in single-cell data would be the cell IDs. In that case, how does MuSiC find the subject information it needs to calculate the cross-subject consistency?
Also, in my case I do not have subject information (only sex, age, etc.), nothing like a subject ID or subject name. That's why I asked whether it's okay to use the cell IDs.
Thank you very much!
Hi, could you explain how you prepared the input files for the MuSiC analysis? It is confusing to me. The files described in the tutorial do not work for me, and I cannot tell which input files are required. @xuranw @cartal
@xuranw
Hello, I am a relatively new PhD student trying to learn how to use MuSiC. The tutorial is very clear about using the previously processed data. However, I do not understand how to generate the ExpressionSet objects from my own bulk and single-cell datasets. Could you please provide a step-by-step explanation of how you prepared your ExpressionSet objects so that I (and others who have requested a similar tutorial) can follow along? The response from July 2018 is too brief for me to follow. I appreciate the time and effort you have put into this project, and I hope you have time to help me. Thank you.
Hello xuranw,
I really like your method and want to apply it to my dataset. I generated `ExpressionSet(exprData)` as `bulk.mtx` and `ExpressionSet(scData)` as `SC.eset`. However, when I run `music_prop()`, I get this error: `Error in rowMeans(bulk.mtx) : 'x' must be an array and must have at least two dimensions`. Could you help me fix this problem? I hope you have time to help me. Thank you.
Hello, has anyone figured this out? I generated the ExpressionSet for my single-cell data following the advice above, but when I ran `music_prop()` it gave an error. Do I have to convert the ExpressionSet to a SingleCellExperiment? In the example, `EMTAB.sce` is a SingleCellExperiment object. Please advise. Thank you!
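A hedged sketch for the last two errors, assuming MuSiC 1.0 or later, where `music_prop()` takes a plain bulk matrix (`bulk.mtx`) and a SingleCellExperiment (`sc.sce`), as with `EMTAB.sce` in the current tutorial. Passing an ExpressionSet where a matrix is expected would plausibly make `rowMeans(bulk.mtx)` fail as reported. The toy objects below are simulated stand-ins, storing the counts in an assay named `counts` is an assumption about what MuSiC reads, and the `music_prop()` call is left commented out because it needs MuSiC and real data:

```r
library(Biobase)
library(SingleCellExperiment)

set.seed(1)
# Toy stand-ins for existing ExpressionSets (simulated values only)
bulk.eset <- ExpressionSet(assayData = matrix(rpois(20, 50), nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("bulk", 1:4))))
SC.eset <- ExpressionSet(
  assayData = matrix(rpois(30, 5), nrow = 5,
    dimnames = list(paste0("gene", 1:5), paste0("cell", 1:6))),
  phenoData = AnnotatedDataFrame(data.frame(
    sampleID  = rep(c("subj1", "subj2"), each = 3),
    cellType  = rep(c("alpha", "beta"), times = 3),
    row.names = paste0("cell", 1:6))))

# Bulk input: a plain genes x samples matrix, not the ExpressionSet itself
bulk.mtx <- exprs(bulk.eset)

# Single-cell input: expression in the "counts" assay, per-cell annotation in colData
sc.sce <- SingleCellExperiment(
  assays  = list(counts = exprs(SC.eset)),
  colData = pData(SC.eset)
)

# Then, assuming MuSiC >= 1.0 (not run here):
# est <- music_prop(bulk.mtx = bulk.mtx, sc.sce = sc.sce,
#                   clusters = "cellType", samples = "sampleID")
```

The `clusters` and `samples` arguments name columns of `colData(sc.sce)`, so they must match the phenotype column names you used.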