conga
Error in the "Aggregate the individual GEX runs into a single AnnData object" step
Hello,
conga is a wonderful tool!
I ran into an issue while exploring the fancy_conga_pipeline_with_batches_and_gammadelta_tcrs notebook.
My command:

```python
gex_datasets = sorted(glob.glob('*-CD3'))
diseases = ['C','NC','CT'] # colitis, no-colitis, healthy control

contigs_file = '/home/shpc_100668/conga/GSE144469_RAW/GSE144469_TCR_filtered_contig_annotations_all.csv'
all_contigs = pd.read_csv(contigs_file)

all_data = []

for donor_num, gex_dir in enumerate(gex_datasets):
    # The folder name is also the donor ID
    donor = gex_dir.split('-')[0]
    donor_contigs = all_contigs[all_contigs.barcode.str.endswith(donor)].copy()
    # change the barcode suffix to '-1' to match the GEX data
    donor_contigs['barcode'] = donor_contigs.barcode.str.split('-').str.get(0)+'-1'
    donor_contigs_file = f'{donor}_abtcr_filtered_contigs.csv'
    donor_contigs.to_csv(donor_contigs_file)

    # process the contigs to generate conga clonotypes
    donor_clones_file = f'{donor}_abtcr_clones.tsv'
    make_10x_clones_file(
        donor_contigs_file,
        organism = 'human', # using 'human' for TCRab
        clones_file = donor_clones_file,
        stringent = True, # (the default) see Note #1 on clonotype filtering
    )

    # read the GEX data and the clonotypes into CoNGA
    adata = conga.preprocess.read_dataset(
        gex_dir, '10x_mtx', donor_clones_file, allow_missing_kpca_file=True)

    disease = donor[:-1]
    adata.obs['disease'] = disease
    adata.obs['disease_int'] = diseases.index(disease) # conga batch ids are integers
    adata.obs['donor'] = donor
    adata.obs['donor_int'] = donor_num # conga batch ids are integers

    all_data.append( adata )

new_adata = all_data[0].concatenate(all_data[1:])
new_adata.write('merged_gex_abtcr.h5ad')
```
Error:

```
IndexError                                Traceback (most recent call last)
/tmp/ipykernel_1354605/1967687937.py in <module>

IndexError: list index out of range
```
I'm really at a loss as to how to proceed, and any guidance would be much appreciated! Thank you for your kind help!
Hi there, thanks for trying conga, and thanks for the feedback. This error suggests that the list `all_data` is empty, probably because the body of the preceding loop never ran. The loop iterates over the folders found by this glob call:

```python
gex_datasets = sorted(glob.glob('*-CD3'))
```

Could you check whether the expected folders are present in the directory where the notebook is running? These would be the `*-CD3` folders that contain the GEX counts data.
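Because `glob.glob` silently returns an empty list when nothing matches (for example when the pattern is relative and the notebook runs in a different directory), the failure only surfaces later as `IndexError` at `all_data[0]`. A minimal sketch of an early check, assuming the same `*-CD3` folder layout; `find_gex_dirs` is a hypothetical helper, not part of conga:

```python
import glob
import os


def find_gex_dirs(base_dir, pattern="*-CD3"):
    """Return the sorted GEX folders under base_dir, or raise if none match.

    Hypothetical helper (not part of conga): it fails early with a clear
    message instead of letting all_data[0] raise IndexError later on.
    """
    gex_dirs = sorted(
        path for path in glob.glob(os.path.join(base_dir, pattern))
        if os.path.isdir(path)  # skip plain files that happen to match
    )
    if not gex_dirs:
        raise FileNotFoundError(
            f"no folders matching {pattern!r} in {os.path.abspath(base_dir)!r}; "
            f"the notebook is running in {os.getcwd()!r}"
        )
    return gex_dirs
```

Usage in the notebook would then be `gex_datasets = find_gex_dirs('/home/shpc_100668/conga/GSE144469_RAW')`, which also makes the search directory explicit rather than depending on the working directory.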
Thank you for your help! I solved this error by changing the search directory:

```python
gex_datasets = sorted(glob.glob('/home/shpc_100668/conga/GSE144469_RAW/*-CD3'))
```

However, I now get a different error at the next step. I have put the *-gdTCR_filtered_contig_annotations.csv files in the same directory ('/home/shpc_100668/conga/GSE144469_RAW/').
My command:

```python
gex_datasets = sorted(glob.glob('/home/shpc_100668/conga/GSE144469_RAW/*-CD3'))
diseases = ['C','NC','CT'] # colitis, no-colitis, healthy control

contigs_file = '/home/shpc_100668/conga/GSE144469_RAW/GSE144469_TCR_filtered_contig_annotations_all.csv'
all_contigs = pd.read_csv(contigs_file)

all_data = []

for donor_num, gex_dir in enumerate(gex_datasets):
    donor = gex_dir.split('-')[0]
    donor_contigs = all_contigs[all_contigs.barcode.str.endswith(donor)].copy()
    donor_contigs['barcode'] = donor_contigs.barcode.str.split('-').str.get(0)+'-1'
    donor_contigs_file = f'{donor}_abtcr_filtered_contigs.csv'
    donor_contigs.to_csv(donor_contigs_file)

    donor_clones_file = f'{donor}_abtcr_clones.tsv'
    make_10x_clones_file(
        donor_contigs_file,
        organism = 'human', # using 'human' for TCRab
        clones_file = donor_clones_file,
        stringent = True, # (the default) see Note #1 on clonotype filtering
    )

    adata = conga.preprocess.read_dataset(
        gex_dir, '10x_mtx', donor_clones_file, allow_missing_kpca_file=True)

    disease = donor[:-1]
    adata.obs['disease'] = disease
    adata.obs['disease_int'] = diseases.index(disease) # conga batch ids are integers
    adata.obs['donor'] = donor
    adata.obs['donor_int'] = donor_num

    all_data.append( adata )

new_adata = all_data[0].concatenate(all_data[1:])
new_adata.write('merged_gex_abtcr.h5ad')
```
Error:

```
ab_counts: []
old_unpaired_barcodes: 0
old_paired_barcodes: 0
new_stringent_paired_barcodes: 0
reading: /home/shpc_100668/conga/GSE144469_RAW/C1-CD3 of type 10x_mtx
total barcodes: 3862
(3862, 33538)
reading: /home/shpc_100668/conga/GSE144469_RAW/C1_abtcr_clones.tsv
WARNING: missing kpca_file: /home/shpc_100668/conga/GSE144469_RAW/C1_abtcr_clones_AB.dist_50_kpcs
WARNING: X_tcr_pca will be empty
Reducing to the 0 barcodes (out of 3862) with paired TCR sequence data
/home/shpc_100668/conga/conga/preprocess.py:233: DeprecationWarning: Use is_view instead of isview, isview will be removed in the future.
  if adata.isview: # ran into trouble with AnnData views vs copies
```
```
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_2715303/7264258.py in <module>

~/conga/conga/preprocess.py in read_dataset(gex_data, gex_data_type, clones_file, make_var_names_unique, keep_cells_without_tcrs, kpca_file, allow_missing_kpca_file, gex_only, suffix_for_non_gene_features)
    403 
    404     tcrs = [ barcode2tcr[x] for x in adata.obs.index ]
--> 405     store_tcrs_in_adata( adata, tcrs )
    406 
    407     return adata

~/conga/conga/preprocess.py in store_tcrs_in_adata(adata, tcrs)
    178 
    179     # ensure lower case
--> 180     adata.obs['cdr3a_nucseq'] = adata.obs.cdr3a_nucseq.str.lower()
    181     adata.obs['cdr3b_nucseq'] = adata.obs.cdr3b_nucseq.str.lower()
    182 

~/anaconda3/envs/conga4/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
   5485         ):
   5486             return self[name]
-> 5487         return object.__getattribute__(self, name)
   5488 
   5489     def __setattr__(self, name: str, value) -> None:

~/anaconda3/envs/conga4/lib/python3.7/site-packages/pandas/core/accessor.py in __get__(self, obj, cls)
    179             # we're accessing the attribute of the class, i.e., Dataset.geo
    180             return self._accessor
--> 181         accessor_obj = self._accessor(obj)
    182         # Replace the property with the accessor object. Inspired by:
    183         # https://www.pydanny.com/cached-property.html

~/anaconda3/envs/conga4/lib/python3.7/site-packages/pandas/core/strings/accessor.py in __init__(self, data)
    166         from pandas.core.arrays.string_ import StringDtype
    167 
--> 168         self._inferred_dtype = self._validate(data)
    169         self._is_categorical = is_categorical_dtype(data.dtype)
    170         self._is_string = isinstance(data.dtype, StringDtype)

~/anaconda3/envs/conga4/lib/python3.7/site-packages/pandas/core/strings/accessor.py in _validate(data)
    223 
    224         if inferred_dtype not in allowed_types:
--> 225             raise AttributeError("Can only use .str accessor with string values!")
    226         return inferred_dtype
    227 

AttributeError: Can only use .str accessor with string values!
```
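Judging from the log (`ab_counts: []`, `Reducing to the 0 barcodes (out of 3862)`, and the clones file being read from `/home/shpc_100668/conga/GSE144469_RAW/C1_abtcr_clones.tsv`), one likely cause is the donor ID. Once `gex_dir` is an absolute path, `gex_dir.split('-')[0]` returns the whole path prefix instead of `C1`, so `all_contigs.barcode.str.endswith(donor)` selects no contigs, no paired cells survive, and the `.str.lower()` call in `store_tcrs_in_adata` plausibly ends up operating on an empty, non-string column. The same prefix would also break `diseases.index(donor[:-1])`. A minimal sketch of taking the ID from the folder name only; `donor_id_from_dir` is a hypothetical helper, and it assumes the folders are named like `C1-CD3` as in the notebook:

```python
import os


def donor_id_from_dir(gex_dir):
    """Return the donor ID (e.g. 'C1') from a GEX folder path like '.../C1-CD3'.

    Hypothetical helper: os.path.basename drops the directory part, so an
    absolute path gives the same ID as the bare folder name did.
    """
    folder = os.path.basename(os.path.normpath(gex_dir))
    return folder.split('-')[0]


gex_dir = '/home/shpc_100668/conga/GSE144469_RAW/C1-CD3'

# What the posted loop computes: the whole path prefix, not the donor ID
print(gex_dir.split('-')[0])        # /home/shpc_100668/conga/GSE144469_RAW/C1

# Taking the ID from the folder name only
donor = donor_id_from_dir(gex_dir)  # 'C1'
disease = donor[:-1]                # 'C', a valid entry in ['C', 'NC', 'CT']
print(donor, disease)
```

Inside the loop this would replace `donor = gex_dir.split('-')[0]` with `donor = donor_id_from_dir(gex_dir)`. Checking `len(donor_contigs)` right after the filter, before calling `make_10x_clones_file`, would also surface an empty selection much earlier than the pandas error does.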
Thank you for your kind help!