Hello! Could you post a quick tutorial on how to format a linseed object?
Hello!
I am working through the tutorial and I have my own RNA-seq data that I would like to process with linseed. Does the LinseedObject function require the data to be formatted exactly like "GSE19830_series_matrix.txt"? I have an RNA-seq data set that has gene annotation, raw counts, and RPKM. I don't know how many cell types are present, but I expect at least 10-12.
Can you tell me which of these fields must be supplied?
Fields:
‘exp’ List of two elements raw and normalized gene expression dataset
‘name’ Character, optional, dataset name
‘cellTypeNumber’ Identified cell type number, required for projection, corner detection and deconvolution
‘projection’ Projection of genes into space lower-dimensionality (presumably simplex)
‘endpoints’ Simplex corners (in normalized, non-reduced space)
‘endpointsProjection’ Simplex corners (in reduced space)
‘distances’ Stores distances for every gene to each corner in reduced space
‘markers’ List that stores signatures genes for deconvolution, can be set manually or can be obtained by ‘selectGenes(k)’
‘signatures’ Deconvolution signature matrix
‘proportions’ Deconvolution proportion matrix
‘pairwise’ Calculated pairwise collinearity measure
The header of my RNA-seq data looks like this:
EnsemblID EntrezID RGD_ID Geneme GeneType logFC logCPM LR PValue FDR SA33599_rev SA33601_rev SA33604_rev SA33598_rev SA33600_rev SA33602_rev SA33603_rev SA33605_rev SA33606_rev SA33598_rev_RPKM SA33599_rev_RPKM SA33600_rev_RPKM SA33601_rev_RPKM SA33602_rev_RPKM SA33603_rev_RPKM SA33604_rev_RPKM SA33605_rev_RPKM SA33606_rev_RPKM Chr Strand length NoExons RNACentralID miRBaseID miRBaseACC TM_Helix HAMAP_ID Description
ENSRNOG00000005609 29458 3165 Neurod1 protein_coding -4.41557073893638 5.09105209110567 111.392747290707 4.85365557971023E-26 7.76293673418854E-22 174 218 11 16 41 27 42 388 5 0.720808668436819 13.0576466284548 1.93454971657025 10.9567107210054 1.75455632681648 1.57289902939802 0.773458305076102 14.4906203372679 0.33082393003395 3 -1 5248 3 neuronal differentiation 1 [Source:RGD Symbol;Acc:3165]
ENSRNOG00000003680 25451 2650 Gabrb2 protein_coding -4.82293017899498 4.31972433520164 107.686834920917 3.14786664687739E-25 2.51734895750785E-21 98 134 5 6 144 14 25 225 3 0.672937124992248 18.3090140777356 16.9153796849254 16.7668593078227 2.2649301153254 2.33085245162443 0.875260735086981 20.919966761681 0.494164322054507 10 1 2108 10 TMhelix gamma-aminobutyric acid type A receptor beta 2 subunit [Source:RGD Symbol;Acc:2650]
I can get 'normCounts' out of the R package 'edgeR' if necessary. If so, how should I format them? Any advice or assistance is greatly appreciated!! Thank you!
Hi! I'm also trying/testing linseed and used CPMs (from edgeR), TPMs (from RSEM) and also FPKM (cufflinks) matrices.
Matrices looked like this:
transcript_id sample1 sample2 sample3    <-- header
ENST000000000 5.456 7.876 4.194    <-- transcript/gene ID and its expression values per sample in CPMs/TPMs/FPKMs
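For anyone wondering how to get from a table like that to an R matrix, here is a minimal sketch in base R. It is only an illustration: the inline text reuses the single example row from this post, and the names `txt`, `df` and `expr` are made up for the example. With a real file you would point `read.table()` at your own path instead.

```r
# Inline stand-in for a real file; only the example row from the post is used.
txt <- "transcript_id sample1 sample2 sample3
ENST000000000 5.456 7.876 4.194"

# Read the table; the first column holds the transcript/gene IDs.
df <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# IDs must be unique before they can be used as row names.
stopifnot(!anyDuplicated(df$transcript_id))

# Keep only the sample columns as a numeric matrix, with IDs as row names.
expr <- as.matrix(df[, -1, drop = FALSE])
rownames(expr) <- df$transcript_id
storage.mode(expr) <- "numeric"

print(dim(expr))
```

The result is a plain numeric matrix with genes/transcripts in rows and samples in columns, with no annotation columns left in it.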
I entered the expected cell type number by hand in the R script. I don't know whether linseed allows more than one number to be supplied at once, so I just tried a different expected number on each script run.
So far my results are not as beautiful as they could be.
A more detailed tutorial would be appreciated! :)
@methornton
You can just pass the expression matrix (basically a matrix object) to the constructor of the Linseed class. I would suggest using something like TPMs, or any normalization that already takes library size into account.
Cheers and sorry for the slow replies, Konstantin
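To make the advice above concrete for a wide table like the one in the original question, here is a rough sketch in base R. It is not an official linseed recipe. The toy table keeps only a few columns from the two rows posted above (values rounded), and the names `tab`, `rpkm` and `tpm` are made up for the example. The RPKM-to-TPM step uses the standard rescaling, where each sample's RPKM values are divided by their column sum and multiplied by 1e6. On real data that sum has to run over all genes, not two. The final constructor call is left commented out and assumes, per the reply above, that the constructor accepts a matrix directly.

```r
# Toy stand-in for the full table (two genes, two samples, rounded values).
# With real data you would read your own file instead, for example
# read.delim("my_table.txt", check.names = FALSE)  # hypothetical file name
tab <- read.delim(text = paste(
  "EnsemblID\tGeneme\tSA33598_rev\tSA33599_rev\tSA33598_rev_RPKM\tSA33599_rev_RPKM",
  "ENSRNOG00000005609\tNeurod1\t16\t174\t0.7208\t13.0576",
  "ENSRNOG00000003680\tGabrb2\t6\t98\t0.6729\t18.3090",
  sep = "\n"), check.names = FALSE, stringsAsFactors = FALSE)

# Keep only the per-sample RPKM columns; drop annotation, statistics, raw counts.
rpkm_cols <- grep("_RPKM$", colnames(tab), value = TRUE)
rpkm <- as.matrix(tab[, rpkm_cols, drop = FALSE])
rownames(rpkm) <- tab$EnsemblID

# Rescale RPKM to TPM so that every sample (column) sums to one million.
tpm <- sweep(rpkm, 2, colSums(rpkm), "/") * 1e6

print(round(tpm, 1))

# Assumption, based on the reply above: the constructor takes a matrix.
# lo <- linseed::LinseedObject$new(tpm)
```

Raw counts are not corrected for library size, so the RPKM columns (or TPMs derived from them) are the closer match to what is suggested above.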