Hello! Could you post a quick tutorial on how to format a linseed object?

Open methornton opened this issue 6 years ago • 2 comments

Hello!

I am working through the tutorial and I have my own RNA-seq data that I would like to process with linseed. Does the LinseedObject function require the data to be formatted exactly like "GSE19830_series_matrix.txt"? My RNA-seq data set has gene annotation, raw counts, and RPKM. I don't know how many cell types are present, but I expect at least 10-12.

Can you tell me which of these fields must be supplied?

Fields:

     ‘exp’ List of two elements raw and normalized gene expression
          dataset

     ‘name’ Character, optional, dataset name

     ‘cellTypeNumber’ Identified cell type number, required for
          projection, corner detection and deconvolution

     ‘projection’ Projection of genes into space lower-dimensionality
          (presumably simplex)

     ‘endpoints’ Simplex corners (in normalized, non-reduced space)

     ‘endpointsProjection’ Simplex corners (in reduced space)

     ‘distances’ Stores distances for every gene to each corner in
          reduced space

     ‘markers’ List that stores signatures genes for deconvolution, can
          be set manually or can be obtained by ‘selectGenes(k)’

     ‘signatures’ Deconvolution signature matrix

     ‘proportions’ Deconvolution proportion matrix

     ‘pairwise’ Calculated pairwise collinearity measure

The header of my RNA-seq data looks like this:

EnsemblID	EntrezID	RGD_ID	Geneme	GeneType	logFC	logCPM	LR	PValue	FDR	SA33599_rev	SA33601_rev	SA33604_rev	SA33598_rev	SA33600_rev	SA33602_rev	SA33603_rev	SA33605_rev	SA33606_rev	SA33598_rev_RPKM	SA33599_rev_RPKM	SA33600_rev_RPKM	SA33601_rev_RPKM	SA33602_rev_RPKM	SA33603_rev_RPKM	SA33604_rev_RPKM	SA33605_rev_RPKM	SA33606_rev_RPKM	Chr	Strand	length	NoExons	RNACentralID	miRBaseID	miRBaseACC	TM_Helix	HAMAP_ID	Description
ENSRNOG00000005609	29458	3165	Neurod1	protein_coding	-4.41557073893638	5.09105209110567	111.392747290707	4.85365557971023E-26	7.76293673418854E-22	174	218	11	16	41	27	42	388	5	0.720808668436819	13.0576466284548	1.93454971657025	10.9567107210054	1.75455632681648	1.57289902939802	0.773458305076102	14.4906203372679	0.33082393003395	3	-1	5248	3						neuronal differentiation 1 [Source:RGD Symbol;Acc:3165]
ENSRNOG00000003680	25451	2650	Gabrb2	protein_coding	-4.82293017899498	4.31972433520164	107.686834920917	3.14786664687739E-25	2.51734895750785E-21	98	134	5	6	144	14	25	225	3	0.672937124992248	18.3090140777356	16.9153796849254	16.7668593078227	2.2649301153254	2.33085245162443	0.875260735086981	20.919966761681	0.494164322054507	10	1	2108	10				TMhelix		gamma-aminobutyric acid type A receptor beta 2 subunit [Source:RGD Symbol;Acc:2650]

I can get the 'normCounts' out of the R package 'edgeR' if that is necessary. If so, how should I format them? Any advice or assistance is greatly appreciated! Thank you!

methornton, Nov 01 '19 18:11

Hi! I'm also testing linseed, and I have used CPM (from edgeR), TPM (from RSEM) and FPKM (from Cufflinks) matrices.

The matrices looked like this:

transcript_id	sample1	sample2	sample3	<-- header
ENST000000000	5.456	7.876	4.194	<-- transcript/gene ID and its expression values per sample (CPM/TPM/FPKM)
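For an annotated table like the one in the question, a matrix in this layout can be pulled out with base R. The following is only a minimal sketch: it builds a toy data frame from the two rows shown above (values rounded, most annotation columns dropped) and assumes the normalized columns are the ones whose names end in `_RPKM`.

```r
# Toy table mimicking the layout in the question
# (two genes, two samples, values rounded from the rows shown above).
tab <- data.frame(
  EnsemblID        = c("ENSRNOG00000005609", "ENSRNOG00000003680"),
  SA33598_rev      = c(16, 6),          # raw counts
  SA33599_rev      = c(174, 98),        # raw counts
  SA33598_rev_RPKM = c(0.72, 0.67),     # normalized values
  SA33599_rev_RPKM = c(13.06, 18.31),   # normalized values
  length           = c(5248, 2108),
  stringsAsFactors = FALSE
)

# Keep only the normalized expression columns (assumed suffix: "_RPKM").
rpkm_cols <- grep("_RPKM$", names(tab), value = TRUE)

# Genes in rows, samples in columns, gene IDs as row names.
expr <- as.matrix(tab[, rpkm_cols])
rownames(expr) <- tab$EnsemblID
colnames(expr) <- sub("_RPKM$", "", rpkm_cols)

print(expr)
```

With real data, `tab` would instead come from something like `read.delim()` on the tab-separated file; the column selection step stays the same.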

I entered the expected cell type number by hand in the R script. I don't know whether linseed accepts more than one number at a time, so I just tried a different expected number on each run of the script.

So far my results are not as clean as they could be.

A more detailed tutorial would be appreciated! :)

pushtiks, Dec 04 '19 11:12

@methornton

You can just provide the expression matrix to the constructor of the Linseed class (it basically accepts matrix objects). I would suggest using something like TPMs, or any normalization that already takes library size into account.
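If only raw counts and gene lengths are available, TPM values can be computed in base R before building the object. This is a rough sketch with toy numbers taken from the two rows in the question; the final constructor call is left as a comment because its exact form (`LinseedObject$new(tpm)`) is an assumption based on the reply above and has not been checked against the package.

```r
# Toy raw counts: genes in rows, samples in columns.
counts <- matrix(
  c(16, 6, 174, 98),
  nrow = 2,
  dimnames = list(
    c("ENSRNOG00000005609", "ENSRNOG00000003680"),
    c("SA33598_rev", "SA33599_rev")
  )
)

# Gene lengths in bases, in the same order as the rows of `counts`.
gene_length <- c(5248, 2108)

# Step 1: length-normalize. The vector recycles down each column,
# so row i is divided by gene_length[i].
rate <- counts / gene_length

# Step 2: scale every sample (column) so that it sums to one million.
tpm <- t(t(rate) / colSums(rate)) * 1e6

print(colSums(tpm))

# Assumption, not verified against the linseed documentation:
# lo <- linseed::LinseedObject$new(tpm)
```

Because each column is rescaled to the same total, differences in library size between samples no longer affect the values, which is the property the reply asks for.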

Cheers and sorry for the slow replies, Konstantin

konsolerr, Jul 06 '20 11:07