dandelion
process Parse Biosciences Evercode TCR/BCR files
Description of the issue
The files produced by the Evercode TCR/BCR pipeline are not fully AIRR compliant: several columns use non-standard names. This is the current manual workaround:
import dandelion as ddl
import pandas as pd
# use pandas to read in the file first and then remap the column names
data = pd.read_csv("tcr_annotation_airr.tsv", sep="\t")
data = data.rename(
    columns={
        "cell_barcode": "cell_id",
        "read_count": "consensus_count",
        "transcript_count": "umi_count",
        "cdr3": "junction",
        "cdr3_aa": "junction_aa",
    }
)
vdj = ddl.Dandelion(data)
# to reannotate with IMGT, write out files that look like the 10x format
vdj.write_10x(folder="dandelion_data")
# then process the `dandelion_data` folder using the singularity image
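
Until this is supported natively, the rename step could be wrapped in a small helper that fails loudly when an expected Evercode column is absent, rather than silently passing an unmapped table on. This is only a sketch: `PARSE_TO_AIRR`, `missing_parse_columns` and `remap_parse_columns` are hypothetical names, not part of dandelion, and the mapping is copied from the workaround above.

```python
# Column mapping copied from the manual workaround in this issue.
PARSE_TO_AIRR = {
    "cell_barcode": "cell_id",
    "read_count": "consensus_count",
    "transcript_count": "umi_count",
    "cdr3": "junction",
    "cdr3_aa": "junction_aa",
}


def missing_parse_columns(columns):
    """Return the expected Evercode column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in PARSE_TO_AIRR if name not in present]


def remap_parse_columns(data):
    """Rename Evercode columns on a pandas DataFrame (hypothetical helper).

    Raises KeyError if any column in PARSE_TO_AIRR is missing, so a changed
    or unexpected file layout is caught before the table reaches dandelion.
    """
    missing = missing_parse_columns(data.columns)
    if missing:
        raise KeyError(f"expected Evercode columns not found: {missing}")
    return data.rename(columns=PARSE_TO_AIRR)


# Usage, assuming pandas and dandelion are installed:
#   data = pd.read_csv("tcr_annotation_airr.tsv", sep="\t")
#   vdj = ddl.Dandelion(remap_parse_columns(data))
```

The check is deliberately strict; if some of these columns turn out to be optional in the Evercode output, the `KeyError` could be relaxed to a warning.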