Celltag motif regexes?
Hi,
This is regarding the single-cell tagging workflow (using the Addgene Version 1 lentivirals). We are testing single-cell tagging in GBM patient samples and currently have data for one patient. Looking at your GitHub code at https://github.com/morris-lab/BiddyetalWorkflow, I was wondering whether "GGT[ACTG]{8}GAATTC" is only an example of the regex we have to use to pull out reads, or whether it applies to all V1 tags. Could you provide more documentation or clarity on that?
If the regex 'GGT[ACTG]{8}GAATTC' is not representative of all the tags, how did you arrive at the full set of representative tags?
samtools view hf1.d15.possorted_genome_bam.bam | grep -P 'GGT[ACTG]{8}GAATTC' > v1.celltag.reads.out
Hi,
Thanks for reaching out!
Yes, the regex GGT[ACTG]{8}GAATTC is a universal identifier for all Version 1 (V1) single-cell tags. This pattern captures the GGT prefix, the 8-nucleotide variable region, and the GAATTC suffix, which are conserved across all V1 tags.
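As a rough illustration of how the three parts of the pattern fit together, here is a minimal Python sketch that applies the same regex to a few made-up sequences and pulls out the 8-nucleotide variable region with a capture group. The example sequences and the helper name extract_v1_tags are hypothetical and are not taken from the repository.

```python
import re

# V1 motif from the thread: GGT prefix, 8-nt variable region, GAATTC suffix.
# The parentheses (added here) capture only the variable region.
V1_MOTIF = re.compile(r"GGT([ACTG]{8})GAATTC")


def extract_v1_tags(sequence):
    """Return every 8-nt variable region found in one read sequence."""
    return V1_MOTIF.findall(sequence)


# Made-up read sequences for illustration only.
reads = [
    "AAACCGGTACGTACGTGAATTCTTT",  # contains the full motif
    "AAACCGGTACGTNCGTGAATTCTTT",  # N inside the variable region: no match
    "AAACCGGTACGTACGGAATTCTTT",   # only 7 nt between prefix and suffix: no match
]
for read in reads:
    print(read, extract_v1_tags(read))
```

Reads with an ambiguous base (N) or an insertion or deletion inside the variable region do not match, so they are silently dropped by a grep on this pattern.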
I may be running into an issue here, then (this is human GBM data, by the way). This is a trial run, and following your guidelines does not seem to work for us.
Once I have made the v1.celltag.reads.out file using the above command, I run the CellTag parse script as specified in your GitHub repository.
./scripts/celltag.parse.reads.10x.sh -v tagregex="CCGGT([ACTG]{8})GAATTC" v1.celltag.reads.out > v1.celltag.parsed.tsv
Rscript ./BiddyetalWorkflow/scripts/matrix.count.celltags.R ./outs/filtered_feature_bc_matrix/barcodes.tsv v1.celltag.parsed.tsv APA1.v1
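To make the parsing step more concrete, here is a hedged Python sketch of what extracting a (cell barcode, UMI, CellTag) triple from one SAM text line might look like. It assumes the standard 10x Genomics BAM tags CB:Z: (corrected cell barcode) and UB:Z: (corrected UMI), and it uses the stricter CCGGT([ACTG]{8})GAATTC pattern passed to the parse script above. The function name parse_celltag_read and the example record are hypothetical, and the repository's own script may differ in its details.

```python
import re

# Stricter pattern used at the parse step: it also requires the CC before GGT.
TAG_REGEX = re.compile(r"CCGGT([ACTG]{8})GAATTC")


def parse_celltag_read(sam_line):
    """Return (cell_barcode, umi, celltag), or None if any piece is missing."""
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return None
    sequence = fields[9]  # column 10 of a SAM record is the read sequence
    match = TAG_REGEX.search(sequence)
    if match is None:
        return None
    # Optional SAM fields start at column 12 and look like TAG:TYPE:VALUE.
    optional = {f[:5]: f[5:] for f in fields[11:] if len(f) > 5}
    cell_barcode = optional.get("CB:Z:")
    umi = optional.get("UB:Z:")
    if cell_barcode is None or umi is None:
        return None
    return cell_barcode, umi, match.group(1)


# Made-up SAM record for illustration only.
example = "\t".join([
    "read1", "0", "chr1", "100", "255", "25M", "*", "0", "0",
    "AAACCGGTACGTACGTGAATTCTTT", "I" * 25,
    "CB:Z:AAACCCAAGAAACACT-1", "UB:Z:ACGTACGTACGT",
])
print(parse_celltag_read(example))
```

Because the parse pattern is stricter than the grep pattern, reads that contain GGT[ACTG]{8}GAATTC without the leading CC, or that lack a corrected barcode or UMI, would be dropped at this stage in the sketch.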
#2. Clone Calling
library(igraph)
library(proxy)
library(corrplot)
library(data.table)
source("./BiddyetalWorkflow/scripts/CellTagCloneCalling_Function.R")
mef.mat <- as.data.frame(readRDS("/n/scratch/users/s/sak4832/celltagging/APa1.v1.celltag.matrix.Rds"))
dim(mef.mat)
[1] 1241 46
rownames(mef.mat) <- mef.mat$Cell.BC
mef.mat <- mef.mat[,-1]
mef.bin <- SingleCellDataBinarization(celltag.dat = mef.mat, 2)
dim(mef.bin)
[1] 1241 45
mef.filt <- SingleCellDataWhitelist(celltag.dat = mef.bin, whitels.cell.tag.file = "/n/scratch/users/s/sak4832/c> ltagging/BiddyetalWorkflow/whitelist/V1.CellTag.Whitelist.csv")
dim(mef.filt)
[1] 1241 34
mef.filt <- MetricBasedFiltering(whitelisted.celltag.data = mef.filt, cutoff = 10, comparison = "less")
dim(mef.filt)
[1] 1241 34
mef.filt <- MetricBasedFiltering(whitelisted.celltag.data = mef.filt, cutoff = 1, comparison = "greater")
dim(mef.filt)
[1] 0 34
This seems to indicate that nothing passes the filtering criteria (zero rows remain), which does not seem correct to me. Am I doing anything wrong that you may be able to point out?
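One way to see where the cells disappear is to look at how many CellTags each cell carries after every step. The Python sketch below mimics the steps on a tiny made-up UMI count matrix. It assumes that binarization marks a tag as present when its UMI count reaches the threshold, and that the metric-based filter keeps cells by their number of tags per cell; the exact boundary behaviour of the repository's R functions is not reproduced here, and all names and numbers are hypothetical.

```python
# Toy UMI counts: rows are cells, columns are CellTags (made-up values).
counts = {
    "cell_1": [1, 0, 1],
    "cell_2": [0, 1, 0],
    "cell_3": [3, 0, 2],
}


def binarize(matrix, min_umi):
    """Mark a tag as present (1) only if it has at least min_umi UMIs."""
    return {cell: [1 if c >= min_umi else 0 for c in row]
            for cell, row in matrix.items()}


def tags_per_cell(binary):
    """Count how many tags are marked present in each cell."""
    return {cell: sum(row) for cell, row in binary.items()}


def keep_cells(binary, min_tags, max_tags):
    """Keep cells whose tag count lies in [min_tags, max_tags]."""
    return {cell: row for cell, row in binary.items()
            if min_tags <= sum(row) <= max_tags}


# Compare a permissive and a stricter UMI threshold on the same toy data.
for min_umi in (1, 2):
    binary = binarize(counts, min_umi)
    kept = keep_cells(binary, min_tags=1, max_tags=10)
    print(min_umi, tags_per_cell(binary), sorted(kept))
```

In this toy case, raising the UMI threshold from 1 to 2 leaves only one cell with any tag, so a matrix in which most tags are supported by a single UMI can end up empty after filtering.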
Thank you!
That should work. Have you confirmed that your cells are expressing CellTags? You can check by searching for CellTags directly in your FASTQ file using grep. If very few or no matches appear, it could indicate low expression of CellTags in your dataset. Let me know what you find!
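The same check can also be scripted. Below is a minimal Python sketch that counts reads containing the V1 motif in a gzipped FASTQ file and reports the total number of reads alongside, so the matches can be read as a fraction. The function name count_motif_reads is hypothetical, and the sketch assumes a standard four-lines-per-record FASTQ.

```python
import gzip
import re

V1_MOTIF = re.compile(r"GGT[ACTG]{8}GAATTC")


def count_motif_reads(fastq_gz_path):
    """Return (reads_with_motif, total_reads) for a gzipped FASTQ file."""
    with_motif = 0
    total = 0
    with gzip.open(fastq_gz_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 != 1:  # the sequence is the 2nd line of each record
                continue
            total += 1
            if V1_MOTIF.search(line):
                with_motif += 1
    return with_motif, total
```

Calling count_motif_reads on each FASTQ file of the sample returns both numbers, which makes it easier to judge whether a given match count is low for the sequencing depth.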
fastq/APa1_S1_R1_001.fastq.gz Number of reads with the pattern: 3200
fastq/APa1_S1_R2_001.fastq.gz Number of reads with the pattern: 11024
How many cells are in this dataset?
~1200 cells, as per the Cell Ranger report.