tcrdist3

`IndexError` when running `TCRrep` constructor

Open kamurani opened this issue 6 months ago • 0 comments

When I run the following, as described in the tcrdist documentation, on my own dataframe of sequences (which, as far as I can tell, are all formatted correctly), I get an `IndexError`:

```python
from tcrdist.repertoire import TCRrep

# dff is my own dataframe of sequences
tr = TCRrep(cell_df = dff,
            organism = 'mouse',
            chains = ['alpha','beta'],
            db_file = 'alphabeta_gammadelta_db.tsv',
            compute_distances = False)
```
```
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[110], line 2
      1 # Run TCRrep 
----> 2 tr = TCRrep(cell_df = dff,
      3             organism = 'mouse',
      4             chains = ['alpha','beta'],
      5             db_file = 'alphabeta_gammadelta_db.tsv',
      6             compute_distances = False)

File ~/anaconda3/envs/tcrdist3/lib/python3.8/site-packages/tcrdist/repertoire.py:182, in TCRrep.__init__(self, organism, chains, db_file, archive_name, blank, cell_df, clone_df, imgt_aligned, infer_all_genes, infer_cdrs, infer_index_cols, deduplicate, use_defaults, store_all_cdr, compute_distances, index_cols, cpus, df2, archive_result)
    180 if infer_cdrs:
    181     for chain in self.chains:
--> 182         self.infer_cdrs_from_v_gene(chain = chain, imgt_aligned = self.imgt_aligned)
    183             # Assume all provided columns are index columns, except 'count' 'cell_id', 'clone_id'
    185 if infer_index_cols:

File ~/anaconda3/envs/tcrdist3/lib/python3.8/site-packages/tcrdist/repertoire.py:518, in TCRrep.infer_cdrs_from_v_gene(self, chain, imgt_aligned)
    513     self.cell_df = self.cell_df.assign(cdr1_a_aa=list(map(f0, self.cell_df.v_a_gene)),
    514                                        cdr2_a_aa=list(map(f1, self.cell_df.v_a_gene)),
    515                                        pmhc_a_aa=list(map(f2, self.cell_df.v_a_gene)))
    516 if chain == "beta":
    517     self.cell_df = self.cell_df.assign(cdr1_b_aa=list(map(f0, self.cell_df.v_b_gene)),
--> 518                                        cdr2_b_aa=list(map(f1, self.cell_df.v_b_gene)),
    519                                        pmhc_b_aa=list(map(f2, self.cell_df.v_b_gene)))
...
--> 743     aa_string = self.all_genes[organism][gene].__dict__[attr][cdr]
    744 except KeyError:
    745     aa_string = None

IndexError: list index out of range
```

I don't understand what this code in `tcrdist/repertoire.py` is doing, but I assume that, as a workaround for now, I can add `IndexError` to the `except` clause so that `aa_string` is also set to `None` when this exception is raised.
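A minimal sketch of that workaround, written as a standalone function rather than a patch to tcrdist itself. The function name `lookup_cdr`, the attribute name `cdrs`, and the toy reference data are all hypothetical stand-ins; only the lookup expression and the `except` clause mirror the lines shown in the traceback.

```python
from types import SimpleNamespace


def lookup_cdr(all_genes, organism, gene, attr, cdr):
    """Return the CDR string, or None if the gene or the CDR slot is missing.

    Hypothetical helper: mirrors the lookup on line 743 of the traceback,
    but catches IndexError in addition to KeyError.
    """
    try:
        return all_genes[organism][gene].__dict__[attr][cdr]
    except (KeyError, IndexError):
        # KeyError: unknown organism, gene, or attribute.
        # IndexError: the gene exists but its list has no entry at `cdr`.
        return None


# Toy stand-in for the reference data (placeholder gene names and sequences).
toy_genes = {
    "mouse": {
        "GENE_OK": SimpleNamespace(cdrs=["AAA", "BBB", "CCC"]),
        "GENE_SHORT": SimpleNamespace(cdrs=[]),  # triggers the IndexError path
    }
}

print(lookup_cdr(toy_genes, "mouse", "GENE_OK", "cdrs", 1))
print(lookup_cdr(toy_genes, "mouse", "GENE_SHORT", "cdrs", 1))
print(lookup_cdr(toy_genes, "mouse", "GENE_MISSING", "cdrs", 1))
```

Note that this only hides the failure: rows whose gene has a short or empty list would silently get `None`, so it does not explain which input values caused the error.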

I would like to understand what is going on here and provide a more informative error message, because I am not sure what in the input dataframe causes this in the first place.
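One way to narrow this down might be to check each unique gene name against the reference data before the lookup and report why it would fail. The sketch below assumes the reference is a nested mapping of organism to gene to an object holding a list, which is only inferred from the expression in the traceback; `find_problem_genes`, the attribute name `cdrs`, and the toy data are hypothetical.

```python
from types import SimpleNamespace


def find_problem_genes(gene_names, all_genes, organism, attr, cdr_indices):
    """Map each gene name that would fail the lookup to a short reason."""
    problems = {}
    reference = all_genes.get(organism, {})
    # dict.fromkeys de-duplicates while keeping first-seen order.
    for gene in dict.fromkeys(gene_names):
        entry = reference.get(gene)
        if entry is None:
            problems[gene] = "gene not found in reference (KeyError path)"
            continue
        values = entry.__dict__.get(attr)
        if values is None:
            problems[gene] = f"reference entry has no '{attr}' attribute"
            continue
        missing = [i for i in cdr_indices if i >= len(values)]
        if missing:
            problems[gene] = (
                f"no entry at index {missing}, only {len(values)} available "
                "(IndexError path)"
            )
    return problems


# Toy stand-in for the reference data (placeholder gene names and sequences).
toy_reference = {
    "mouse": {
        "GENE_OK": SimpleNamespace(cdrs=["AAA", "BBB", "CCC"]),
        "GENE_SHORT": SimpleNamespace(cdrs=["AAA"]),
    }
}

report = find_problem_genes(
    ["GENE_OK", "GENE_SHORT", "GENE_TYPO", "GENE_OK"],
    toy_reference, "mouse", "cdrs", cdr_indices=[0, 1, 2],
)
for gene, reason in report.items():
    print(f"{gene}: {reason}")
```

With a real dataframe, the gene names could come from a column such as `v_b_gene` (the column used in the traceback), for example its unique values.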

kamurani, Aug 29 '24 02:08