immunarch
Support for bulk TCR deconvolution (TRUST4)
Hi, I would really like to use some of the functionality in immunarch.
I have TCR sequences from bulk RNA-sequencing data that I inferred with TRUST4 (https://github.com/liulab-dfci/TRUST4). According to the authors, this outputs a VDJtools-compatible file. However, repLoad does not recognize the input. Would it be possible to add support for this type of output?
The output from TRUST4 looks something like this (usually with many more rows):
#count frequency CDR3nt CDR3aa V D J C cid cid_full_length
152 0.2771141 TGTCAGCCGTATTTTATCCGCTCACTTTC CQQYFATSPLTF IGKV4-1*01 . IGKJ4*01 IGKC assemble4 1
116 0.212622 TGTCATCAAATATTATACTTTCACACTTTC CQQYYSTFSLTF IGKV4-1*01 . IGKJ4*01 IGKC assemble18 1
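For reference, a table like the one above can be loaded with base R as a stopgap. This is only a sketch: `read_trust4_report` is a hypothetical helper (not part of immunarch), and it assumes the report is tab-separated with a header line that starts with `#count`. It loads the table but does not make repLoad recognize the format.

```r
# Hypothetical helper (not an immunarch function): read a TRUST4 report
# into a plain data frame. Assumes a tab-separated file whose header line
# begins with "#count".
read_trust4_report <- function(path) {
  lines <- readLines(path)
  # Strip the leading "#" so the first column is named "count".
  lines[1] <- sub("^#", "", lines[1])
  read.delim(
    text = lines,
    header = TRUE,
    sep = "\t",
    comment.char = "",        # do not treat "#" as a comment anywhere
    stringsAsFactors = FALSE,
    check.names = FALSE       # keep column names exactly as written
  )
}

# Usage on a one-row example written to a temporary file.
header <- paste(c("#count", "frequency", "CDR3nt", "CDR3aa", "V", "D", "J",
                  "C", "cid", "cid_full_length"), collapse = "\t")
row1 <- paste(c("152", "0.2771141", "TGTCAGCCGTATTTTATCCGCTCACTTTC",
                "CQQYFATSPLTF", "IGKV4-1*01", ".", "IGKJ4*01", "IGKC",
                "assemble4", "1"), collapse = "\t")
tmp <- tempfile(fileext = ".tsv")
writeLines(c(header, row1), tmp)
trust4_df <- read_trust4_report(tmp)
```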
One issue is that immunarch does not appear to be able to deal with the # in count. But even if I remove that manually, I've realized that another issue is that this file does not contain vend in the column names. Changing
else if (str_detect(tolower(l), "cdr3nt") && str_detect(tolower(l), "vend") && str_detect(tolower(l), "v")) {
  res_format <- "vdjtools"
}
in R/io.R to
else if (str_detect(tolower(l), "cdr3nt") && str_detect(tolower(l), "v")) {
  res_format <- "vdjtools"
}
fixes the issue for some files, but for others I get the following error:
Error: Assigned data `df[[.dstart]] - df[[.vend]] - 1` must be compatible with existing data.
x Existing data has 3 rows.
x Assigned data has 0 rows.
i Only vectors of size 1 are recycled.
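One plausible reading of this error (an assumption, not confirmed against the immunarch source) is that the position columns are absent from the file, so the subtraction yields a zero-length vector that cannot fill a 3-row column. The sketch below reproduces that behaviour on a toy data frame and shows a hypothetical guard. The column names `dstart` and `vend` and the helper `safe_vd_insertions` are illustrative only.

```r
# Toy data frame standing in for a parsed repertoire without position columns.
df <- data.frame(CDR3nt = c("TGT", "TGC", "TGG"), stringsAsFactors = FALSE)

# Extracting a column that does not exist yields NULL, and arithmetic on NULL
# yields a zero-length numeric vector instead of one value per row.
vd_ins <- df[["dstart"]] - df[["vend"]] - 1
length(vd_ins)

# Hypothetical guard: compute the difference only when both columns exist,
# otherwise fall back to NA for every row.
safe_vd_insertions <- function(df, dstart = "dstart", vend = "vend") {
  if (all(c(dstart, vend) %in% names(df))) {
    df[[dstart]] - df[[vend]] - 1
  } else {
    rep(NA_integer_, nrow(df))
  }
}

df$VD.ins <- safe_vd_insertions(df)
```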
I would be happy to be a beta tester for this functionality or to help with implementing it, if that is of interest.
Hi Michael,
We implemented support for TRUST4 in the latest pre-release version of Immunarch. Please install it and let us know if it works. Instructions: https://immunarch.com/articles/v1_introduction.html The version is a pre-release, so it might work inconsistently. We will be glad to fix any issues promptly to stabilize the TRUST4 parser.
Best, Vadim
Hi Vadim,
Thanks for the quick reply! Works like a charm for the most part. My only two things to note:
- I have one sample where TRUST4 returned no TCR sequences. This results in the following warning while reading in:
Can't determine the type of V(D)J recombination. No insertions will be presented in the resulting data table. [!] Warning: zero clonotypes found, skipping
- I get the following warning for a second sample
Can't determine the type of V(D)J recombination. No insertions will be presented in the resulting data table.
And the sample looks something like this (note that I've had to change the sequence due to privacy reasons).
#count frequency CDR3nt CDR3aa V D J C cid cid_full_length
56 1.000000e+00 TGTGCGTGGAGCTGGAACCAGCTGCTGACCTTTGGTTCGGCGGACTTCTGG CAWGWNQLLTFGSADFW IGHV3-7*01 IGHD2-2*01 IGHJ5*01 IGHA1 assemble7 0
2 1.000000e+00 TGTTTCAATTACGCTACCCCGTGGTCGTTC CFNYATPWSF IGKV1-39*01 . IGKJ1*01 IGKC assemble47 0
Not sure why the second sample is returning a warning.
The only other suggestion I have is to also create a data frame listing all the samples that were not included in the data table and the reason why. I have over 500 samples, so it is a bit tedious to manually scroll through all the messages. But this is a really minor thing.
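A rough sketch of what such a report could look like, using only base R. Everything here is hypothetical: `collect_load_report` and `toy_reader` are illustrative names, not immunarch functions, and the sketch assumes the reader signals problems as R warnings, messages, or errors (text printed with `cat` would not be captured).

```r
# Hypothetical wrapper: run a reader on each file and record, per sample,
# whether it was loaded and which conditions were raised.
collect_load_report <- function(paths, reader) {
  rows <- lapply(paths, function(p) {
    notes <- character(0)
    result <- withCallingHandlers(
      tryCatch(reader(p), error = function(e) {
        notes <<- c(notes, paste("error:", conditionMessage(e)))
        NULL
      }),
      warning = function(w) {
        notes <<- c(notes, paste("warning:", conditionMessage(w)))
        invokeRestart("muffleWarning")
      },
      message = function(m) {
        notes <<- c(notes, paste("message:", trimws(conditionMessage(m))))
        invokeRestart("muffleMessage")
      }
    )
    n <- if (is.data.frame(result)) nrow(result) else 0L
    data.frame(
      sample = basename(p),
      included = n > 0,
      reason = if (length(notes) > 0) paste(notes, collapse = "; ")
               else if (n == 0) "zero clonotypes" else "",
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Toy reader standing in for the real parser: one row per non-header line,
# with a warning when the file contains only a header.
toy_reader <- function(path) {
  lines <- readLines(path)
  df <- data.frame(line = lines[-1], stringsAsFactors = FALSE)
  if (nrow(df) == 0) warning("zero clonotypes found, skipping")
  df
}

f_ok <- tempfile(fileext = ".tsv")
f_empty <- tempfile(fileext = ".tsv")
writeLines(c("count\tCDR3nt", "152\tTGTCAGCCG"), f_ok)
writeLines("count\tCDR3nt", f_empty)

report <- collect_load_report(c(f_ok, f_empty), toy_reader)
skipped <- report[!report$included, ]
```

With several hundred samples, filtering the report on `included` gives the list of skipped files and their reasons in one table.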
Thanks for the help! Michael