
Process Parse Biosciences Evercode TCR/BCR files

Status: Open · zktuong opened this issue 5 months ago · 0 comments

Description of the issue

The file formats from Evercode TCR/BCR are a bit unusual and not fully AIRR-compliant: several columns use non-standard names. This is the current manual workaround:

```python
import dandelion as ddl
import pandas as pd

# Use pandas to read in the file first, then remap the column names to AIRR names
data = pd.read_csv("tcr_annotation_airr.tsv", sep="\t")
data = data.rename(
    columns={
        "cell_barcode": "cell_id",
        "read_count": "consensus_count",
        "transcript_count": "umi_count",
        "cdr3": "junction",
        "cdr3_aa": "junction_aa",
    }
)

vdj = ddl.Dandelion(data)

# To reannotate with IMGT, first write out files that look like the 10x format
vdj.write_10x(folder="dandelion_data")

# Then process the `dandelion_data` folder using the Singularity image
```
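If this workaround is needed repeatedly, the renaming could be wrapped in a small helper that also reports which required AIRR columns are still missing before the table is passed to `ddl.Dandelion`. This is a minimal, pandas-only sketch: `read_evercode_airr`, `missing_airr_fields`, `EVERCODE_TO_AIRR` and `AIRR_REQUIRED` are hypothetical names, not part of dandelion, and the required-field list is assumed from the AIRR Rearrangement schema (v1.4), so it is worth checking against the spec version you target.

```python
from __future__ import annotations

import io
from collections.abc import Iterable

import pandas as pd

# Evercode column name -> AIRR column name (same mapping as the workaround above).
EVERCODE_TO_AIRR = {
    "cell_barcode": "cell_id",
    "read_count": "consensus_count",
    "transcript_count": "umi_count",
    "cdr3": "junction",
    "cdr3_aa": "junction_aa",
}

# Assumption: required fields of the AIRR Rearrangement schema (v1.4).
AIRR_REQUIRED = (
    "sequence_id", "sequence", "rev_comp", "productive",
    "v_call", "d_call", "j_call",
    "sequence_alignment", "germline_alignment",
    "junction", "junction_aa",
    "v_cigar", "d_cigar", "j_cigar",
)


def read_evercode_airr(path_or_buffer, sep: str = "\t") -> pd.DataFrame:
    """Hypothetical helper: read an Evercode AIRR-like TSV and rename columns to AIRR names."""
    data = pd.read_csv(path_or_buffer, sep=sep)
    # Keys absent from the file are silently skipped by DataFrame.rename.
    return data.rename(columns=EVERCODE_TO_AIRR)


def missing_airr_fields(
    data: pd.DataFrame, required: Iterable[str] = AIRR_REQUIRED
) -> list[str]:
    """Return the required AIRR columns that are absent, in schema order."""
    return [col for col in required if col not in data.columns]


if __name__ == "__main__":
    # Toy input with made-up values, standing in for tcr_annotation_airr.tsv.
    toy = io.StringIO(
        "cell_barcode\tread_count\ttranscript_count\tcdr3\tcdr3_aa\n"
        "bc_0001\t120\t8\tTGTGCCAGCAGTTTAGCGGGAGGGTTCTTT\tCASSLAGGFF\n"
    )
    data = read_evercode_airr(toy)
    print(list(data.columns))
    # → ['cell_id', 'consensus_count', 'umi_count', 'junction', 'junction_aa']
    print(missing_airr_fields(data))  # every required field except junction and junction_aa
```

The `cdr3` → `junction` mapping deserves a spot check: in the AIRR schema `junction` includes the conserved anchor residues (the second Cys and the J-gene Trp/Phe) whereas `cdr3` excludes them, so the rename assumes Evercode's `cdr3`/`cdr3_aa` columns actually hold the full junction. Amino-acid junctions typically start with C, which makes for a quick sanity check.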

zktuong, Sep 13 '24 13:09