BiddyetalWorkflow

Celltag motif regexes?

Open smk5g5 opened this issue 1 year ago • 6 comments

Hi,

This is regarding the single-cell tagging workflow (using the Addgene Version 1 lentivirals). We are testing single-cell tagging in a GBM patient and currently have data for one patient. Looking at your GitHub code at https://github.com/morris-lab/BiddyetalWorkflow, is "GGT[ACTG]{8}GAATTC" only an example of the regex we should use to pull out tags, or does it apply to all V1 tags? Could you provide more documentation or clarity on that?

If the regex 'GGT[ACTG]{8}GAATTC' is not representative of all the tags, how did you arrive at the full set of representative tags?

samtools view hf1.d15.possorted_genome_bam.bam | grep -P 'GGT[ACTG]{8}GAATTC' > v1.celltag.reads.out

smk5g5 avatar Jan 29 '25 19:01 smk5g5

Hi,

Thanks for reaching out!

Yes, the regex GGT[ACTG]{8}GAATTC is a universal identifier for all Version 1 (V1) single-cell tags. This pattern captures the GGT prefix, the 8-nucleotide variable region, and the GAATTC suffix, which are conserved across all V1 tags.
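
To illustrate how that pattern decomposes, here is a minimal shell sketch (not part of the BiddyetalWorkflow scripts) that prints only the 8-nucleotide variable region between the GGT prefix and the GAATTC suffix. The helper name extract_v1_tags and the sample reads are made up for the example, and it assumes GNU grep with -P support, as in the command quoted above.

```shell
# Hypothetical helper: print the 8-nt variable region of every V1 motif match on stdin.
extract_v1_tags() {
  # The lookbehind requires the GGT prefix and the lookahead requires the GAATTC
  # suffix, so only the 8 variable bases are printed by -o.
  grep -oP '(?<=GGT)[ACTG]{8}(?=GAATTC)'
}

# Made-up reads for illustration only:
#   read 1 carries a full motif, read 2 has too few variable bases, read 3 has no motif.
printf '%s\n' \
  'AAACCGGTACGTACGTGAATTCTT' \
  'TTTTGGTAAAAGAATTC' \
  'CCCCCCCCCCCC' | extract_v1_tags
```

Only the first read yields a tag here. On real data the same helper could be fed from samtools view instead of printf.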

sam-morris avatar Jan 29 '25 20:01 sam-morris

I may be running into an issue then (this is human GBM data, by the way). This is a trial run, and following your guidelines does not seem to work for us.

After creating the v1.celltag.reads.out file with the command above, I run the CellTag parse script as described in your GitHub repository.

./scripts/celltag.parse.reads.10x.sh -v tagregex="CCGGT([ACTG]{8})GAATTC" v1.celltag.reads.out > v1.celltag.parsed.tsv

Rscript ./BiddyetalWorkflow/scripts/matrix.count.celltags.R  ./outs/filtered_feature_bc_matrix/barcodes.tsv  v1.celltag.parsed.tsv APA1.v1

#2. Clone Calling

library(igraph)
library(proxy)
library(corrplot)
library(data.table)

 source("./BiddyetalWorkflow/scripts/CellTagCloneCalling_Function.R")
 mef.mat <- as.data.frame(readRDS("/n/scratch/users/s/sak4832/celltagging/APa1.v1.celltag.matrix.Rds"))
 dim(mef.mat)
[1] 1241   46
 rownames(mef.mat) <- mef.mat$Cell.BC
 mef.mat <- mef.mat[,-1]
 mef.bin <- SingleCellDataBinarization(celltag.dat = mef.mat, 2)
dim(mef.bin)
[1] 1241   45
 mef.filt <- SingleCellDataWhitelist(celltag.dat = mef.bin, whitels.cell.tag.file = "/n/scratch/users/s/sak4832/c> ltagging/BiddyetalWorkflow/whitelist/V1.CellTag.Whitelist.csv")
 dim(mef.filt)
[1] 1241   34
mef.filt <- MetricBasedFiltering(whitelisted.celltag.data = mef.filt, cutoff = 10, comparison = "less")
 dim(mef.filt)
[1] 1241   34
> dim(mef.filt <- MetricBasedFiltering(whitelisted.celltag.data = mef.filt, cutoff = 1, comparison = "greater")
+ )
[1]  0 34

This seems to indicate that nothing passes the filtering criteria (zero cells remain after the last step), which does not seem correct to me. Am I doing anything wrong that you can point out?

Thank you!
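
One thing that can be checked at this point is whether the two patterns used above select the same reads: the grep step uses GGT[ACTG]{8}GAATTC, while the tagregex passed to the parse script begins with CCGGT and is therefore stricter. The following is a minimal sketch with made-up reads and a hypothetical helper, count_matches. It only shows how to compare the two counts and does not establish that this difference explains the empty result.

```shell
# Pattern used in the samtools | grep step
loose='GGT[ACTG]{8}GAATTC'
# Pattern passed as tagregex to the parse script (capture group omitted for counting)
strict='CCGGT[ACTG]{8}GAATTC'

# Hypothetical helper: count the lines of file $2 that match PCRE $1.
count_matches() {
  # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
  grep -cP "$1" "$2" || true
}

# Made-up reads for illustration: the first has CC before GGT, the second does not.
reads=$(mktemp)
printf '%s\n' \
  'AACCGGTACGTACGTGAATTCAA' \
  'AATTGGTACGTACGTGAATTCAA' > "$reads"

echo "loose:  $(count_matches "$loose" "$reads")"
echo "strict: $(count_matches "$strict" "$reads")"
rm -f "$reads"
```

On real data, the file argument would be v1.celltag.reads.out. A large gap between the two counts would mean many grep-selected reads lack the CC prefix expected by the parse step.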

smk5g5 avatar Jan 29 '25 20:01 smk5g5

That should work. Have you confirmed that your cells are expressing CellTags? You can check by searching for CellTags directly in your FASTQ file using grep. If very few or no matches appear, it could indicate low expression of CellTags in your dataset. Let me know what you find!
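
A minimal sketch of such a check, assuming gzipped FASTQ input and GNU grep with -P support. The helper name count_celltag_reads and the tiny test file are hypothetical, and the motif is the V1 regex discussed above.

```shell
# Hypothetical helper: count reads in a gzipped FASTQ ($1) whose sequence contains the V1 motif.
count_celltag_reads() {
  # FASTQ stores the sequence on line 2 of every 4-line record, so only those
  # lines are searched. "|| true" keeps a zero count from looking like a failure.
  gzip -dc < "$1" | awk 'NR % 4 == 2' | grep -cP 'GGT[ACTG]{8}GAATTC' || true
}

# Made-up two-read FASTQ for illustration: only the first read carries the motif.
fq=$(mktemp)
printf '%s\n' \
  '@read1' 'AACCGGTACGTACGTGAATTCAA' '+' 'IIIIIIIIIIIIIIIIIIIIIII' \
  '@read2' 'ACGTACGTACGT' '+' 'IIIIIIIIIIII' | gzip -c > "$fq"

echo "Number of reads with the pattern: $(count_celltag_reads "$fq")"
rm -f "$fq"
```

Restricting the search to sequence lines avoids counting accidental matches in read headers.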

sam-morris avatar Jan 29 '25 20:01 sam-morris

fastq/APa1_S1_R1_001.fastq.gz Number of reads with the pattern: 3200
fastq/APa1_S1_R2_001.fastq.gz Number of reads with the pattern: 11024

smk5g5 avatar Feb 06 '25 21:02 smk5g5

How many cells in this dataset?

sam-morris avatar Feb 07 '25 16:02 sam-morris

~1200 cells, as per the Cell Ranger report.
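
For a rough sense of scale, the read counts reported above can be divided by this cell count. This is a back-of-the-envelope sketch only: the helper reads_per_cell is hypothetical, 1200 is the approximate cell count, and the inputs are raw pattern matches before any barcode or UMI collapsing, so the number of distinct tags per cell would be lower.

```shell
# Hypothetical helper: average matching reads per cell ($1 = matching reads, $2 = cells).
reads_per_cell() {
  awk -v r="$1" -v c="$2" 'BEGIN { printf "%.1f\n", r / c }'
}

reads_per_cell 3200 1200    # matches counted in R1
reads_per_cell 11024 1200   # matches counted in R2
```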

smk5g5 avatar Feb 07 '25 18:02 smk5g5