
Cell barcodes almost completely non-overlapping in vignette

Open heathergeiger opened this issue 4 years ago • 1 comments

I tried running through the vignette for the PBMC 1K dataset.

I was able to reproduce a list of 124,716 cell barcodes coming from the kallisto/bus pipeline.

However, when I compared this list of 124,716 barcodes to the list of 713 barcodes in the RNA data according to 10X CellRanger, there was almost no overlap (just 20 barcodes).
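For anyone wanting to reproduce the comparison, here is a minimal sketch of the overlap check. The function name and the toy barcodes are hypothetical, and it assumes the CellRanger barcodes may carry a GEM-well suffix such as `-1` that needs stripping before comparison.

```python
def barcode_overlap(kb_barcodes, cellranger_barcodes):
    """Return the shared barcodes and the fraction of CellRanger barcodes found.

    Assumption: CellRanger barcodes may end in a suffix like "-1",
    which is removed before comparing.
    """
    kb = set(kb_barcodes)
    cr = {bc.split("-")[0] for bc in cellranger_barcodes}
    shared = kb & cr
    fraction = len(shared) / len(cr) if cr else 0.0
    return shared, fraction


# Toy example with made-up barcodes (not from the real dataset).
kb_toy = ["AAACCCAAGTGGTCAG", "AAAGGGCTCATTGCGA", "TTTGTTGCAGCACAAG"]
cr_toy = ["AAACCCAAGTGGTCAG-1", "CCCTAACGTACCTATG-1"]
shared, fraction = barcode_overlap(kb_toy, cr_toy)
print(len(shared), fraction)
```

With the real lists, the number of shared barcodes here is what came out as 20 of 713.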

Any idea what might be going on here? My understanding was that a few barcodes might differ, but that a high proportion were supposed to overlap.

I noticed in the Python notebook that you did not directly look at protein vs. RNA levels within the same cells. My understanding was that they are supposed to be measured for the same cells, but maybe I am misunderstanding the dataset.

heathergeiger, Nov 06 '20 22:11

Edit: Looks like I can get the barcodes to match up perfectly by taking the complement of the 8th and 9th bases (switch A to T, T to A, C to G, and G to C at those two positions). Once you do this, the protein counts from CellRanger and kallisto/bus for matched barcodes are also highly correlated. Any idea why this is, though?
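In case it helps anyone, the workaround can be sketched roughly as below. The helper name and the example barcode are hypothetical. Positions are 1-based, matching "8th and 9th base" above, and the sketch assumes barcodes contain only A, C, G, and T.

```python
# Translation table for single-base complement: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement_positions(barcode, positions=(8, 9)):
    """Complement the bases at the given 1-based positions of a barcode."""
    bases = list(barcode)
    for pos in positions:
        # Convert the 1-based position to a 0-based index.
        bases[pos - 1] = bases[pos - 1].translate(COMPLEMENT)
    return "".join(bases)


# Made-up 16-base barcode: the 8th base (A) becomes T, the 9th (G) becomes C.
print(complement_positions("AAACCCAAGTGGTCAG"))  # AAACCCATCTGGTCAG
```

Applying this to every barcode in one of the two lists (not both) before intersecting them is what made the lists line up for me.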

heathergeiger, Nov 06 '20 23:11