immunarch Support for bulk TCR deconvolution (TRUST4)

Hi, I would really like to use some of the functionality in immunarch.

I have TCR sequences from bulk RNA-sequencing data that I inferred with TRUST4 (https://github.com/liulab-dfci/TRUST4). According to the authors, TRUST4 outputs a VDJtools-compatible file. However, repLoad does not recognize the input. Would it be possible to add support for this type of output?

The output from TRUST4 looks something like this (usually with many more rows):

#count	frequency	CDR3nt	CDR3aa	V	D	J	C	cid	cid_full_length
152	0.2771141	TGTCAGCCGTATTTTATCCGCTCACTTTC	CQQYFATSPLTF	IGKV4-1*01	.	IGKJ4*01	IGKC	assemble4	1
116	0.212622	TGTCATCAAATATTATACTTTCACACTTTC	CQQYYSTFSLTF	IGKV4-1*01	.	IGKJ4*01	IGKC	assemble18	1
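Note the leading # on the first column name. A minimal base-R sketch for reading such a file and normalizing that header, independent of immunarch, could look like the following. The helper name read_trust4_report is hypothetical and not part of immunarch or TRUST4; the two rows are the ones shown above.

```r
# Hypothetical helper (not an immunarch function): read a TRUST4 report
# and strip the leading "#" from the first column name.
read_trust4_report <- function(path) {
  # check.names = FALSE keeps "#count" intact instead of mangling it.
  df <- read.delim(path, check.names = FALSE, stringsAsFactors = FALSE)
  names(df) <- sub("^#", "", names(df))
  df
}

# Write the two example rows to a temporary tab-separated file.
header <- c("#count", "frequency", "CDR3nt", "CDR3aa", "V", "D", "J", "C",
            "cid", "cid_full_length")
rows <- list(
  c("152", "0.2771141", "TGTCAGCCGTATTTTATCCGCTCACTTTC", "CQQYFATSPLTF",
    "IGKV4-1*01", ".", "IGKJ4*01", "IGKC", "assemble4", "1"),
  c("116", "0.212622", "TGTCATCAAATATTATACTTTCACACTTTC", "CQQYYSTFSLTF",
    "IGKV4-1*01", ".", "IGKJ4*01", "IGKC", "assemble18", "1")
)
path <- tempfile(fileext = ".tsv")
writeLines(
  c(paste(header, collapse = "\t"),
    vapply(rows, paste, character(1), collapse = "\t")),
  path
)

trust4 <- read_trust4_report(path)
print(names(trust4))
```

This only cleans the header; it does not add the vend or dstart columns that a VDJtools parser may expect.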

One issue is that immunarch does not appear to handle the leading # in #count. But even if I remove that manually, another issue is that this file does not contain vend in the column names. Changing

else if (str_detect(tolower(l), "cdr3nt") && str_detect(tolower(l), "vend") && str_detect(tolower(l), "v")) {
  res_format <- "vdjtools"
}

in R/io.R to

else if (str_detect(tolower(l), "cdr3nt") && str_detect(tolower(l), "v")) {
  res_format <- "vdjtools"
}

fixes the issue for some files, but for others I get the following error:

Error: Assigned data `df[[.dstart]] - df[[.vend]] - 1` must be compatible with existing data.
x Existing data has 3 rows.
x Assigned data has 0 rows.
i Only vectors of size 1 are recycled.
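This error is consistent with the vend (and possibly dstart) column being absent from the file, although that is an assumption about the cause and has not been checked against the immunarch source. Indexing a data frame with [[ for a missing column returns NULL, and arithmetic on NULL yields a zero-length vector, which cannot be assigned to a 3-row table. A small base-R illustration, with the column names chosen only for this example:

```r
# A 3-row table that, like the TRUST4 output, has no "dstart" or "vend" column.
df <- data.frame(count = c(3L, 2L, 1L))

# Stand-ins for the column-name variables seen in the error message.
.dstart <- "dstart"
.vend <- "vend"

# df[["dstart"]] and df[["vend"]] are both NULL, so the result has length 0.
insertions <- df[[.dstart]] - df[[.vend]] - 1
print(length(insertions))
print(nrow(df))

# A possible guard (an assumption, not immunarch's actual code):
# check which required columns are missing before doing the arithmetic.
missing_cols <- setdiff(c(.dstart, .vend), names(df))
if (length(missing_cols) > 0) {
  message("Missing columns: ", paste(missing_cols, collapse = ", "))
}
```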

I would be happy to be a beta tester for this functionality or to help with implementing it, if that is of interest.

Aug 18 '21 20:08 Michael-Geuenich

Hi Michael,

We implemented support for TRUST4 in the latest pre-release version of immunarch. Please install it and let us know if it works. Instructions: https://immunarch.com/articles/v1_introduction.html This is a pre-release version, so it might behave inconsistently. We will be glad to fix any issues promptly to stabilize the TRUST4 parser.

Best, Vadim

Aug 19 '21 15:08 vadimnazarov

Hi Vadim,

Thanks for the quick reply! It works like a charm for the most part. Just two things to note:

I have one sample where TRUST4 returned no TCR sequences. This results in the following warning when reading it in:

Can't determine the type of V(D)J recombination. No insertions will be presented in the resulting data table.  [!] Warning: zero clonotypes found, skipping

I get the following warning for a second sample:

Can't determine the type of V(D)J recombination. No insertions will be presented in the resulting data table.

The sample looks something like this (note that I had to change the sequences for privacy reasons):

#count	frequency	CDR3nt	CDR3aa	V	D	J	C	cid	cid_full_length
56	1.000000e+00	TGTGCGTGGAGCTGGAACCAGCTGCTGACCTTTGGTTCGGCGGACTTCTGG	CAWGWNQLLTFGSADFW	IGHV3-7*01	IGHD2-2*01	IGHJ5*01	IGHA1	assemble7	0
2	1.000000e+00	TGTTTCAATTACGCTACCCCGTGGTCGTTC	CFNYATPWSF	IGKV1-39*01	.	IGKJ1*01	IGKC	assemble47	0

Not sure why the second sample is returning a warning.

The only other suggestion I have is to also create a data frame listing all the samples that were not included in the data table and the reason why. I have over 500 samples, so it is a bit tedious to scroll through all the messages manually. But this is a really minor thing.
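In the meantime, a pre-scan in base R can approximate such a table. The sketch below is only an illustration under stated assumptions: summarise_trust4_reports is a hypothetical helper, it does not call immunarch, and it only flags files with no data rows, which corresponds to the "zero clonotypes found" case above.

```r
# Hypothetical helper (not part of immunarch): count data rows per report
# and record a reason for files that would be skipped as empty.
summarise_trust4_reports <- function(paths) {
  n_rows <- vapply(paths, function(p) {
    lines <- readLines(p, warn = FALSE)
    lines <- lines[nzchar(trimws(lines))]  # ignore blank lines
    max(length(lines) - 1L, 0L)            # the header line is not a clonotype
  }, integer(1), USE.NAMES = FALSE)

  data.frame(
    sample = basename(paths),
    n_clonotypes = n_rows,
    reason = ifelse(n_rows == 0L, "zero clonotypes found", NA_character_),
    stringsAsFactors = FALSE
  )
}

# Demo with two temporary files: one header-only, one with a single row.
header <- paste(c("#count", "frequency", "CDR3nt", "CDR3aa", "V", "D", "J",
                  "C", "cid", "cid_full_length"), collapse = "\t")
row <- paste(c("2", "1.000000e+00", "TGTTTCAATTACGCTACCCCGTGGTCGTTC",
               "CFNYATPWSF", "IGKV1-39*01", ".", "IGKJ1*01", "IGKC",
               "assemble47", "0"), collapse = "\t")
empty_file <- tempfile(fileext = ".tsv")
full_file <- tempfile(fileext = ".tsv")
writeLines(header, empty_file)
writeLines(c(header, row), full_file)

report <- summarise_trust4_reports(c(empty_file, full_file))
skipped <- report[report$n_clonotypes == 0L, ]
print(skipped)
```

With several hundred samples, filtering report on the reason column gives the list of skipped files without reading through the console messages.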

Thanks for the help! Michael

Aug 19 '21 16:08 Michael-Geuenich
