
Generated the same results as Xenium did

Open · Yuling192 opened this issue 6 months ago · 3 comments

Here's what I did:

proseg --xenium "${INPUT_DIR}/transcripts.parquet" \
       --output-expected-counts     "expected-counts.csv.gz" \
       --output-cell-metadata       "cell-metadata.csv.gz" \
       --output-cell-polygons       "cell_polygons.geojson.gz" \
       --output-transcript-metadata "transcripts_filtered.parquet"

proseg-to-baysor \
    transcripts_filtered.parquet \
    cell_polygons.geojson.gz \
    --output-transcript-metadata baysor-transcript-metadata.csv \
    --output-cell-polygons baysor-cell-polygons.geojson


xeniumranger import-segmentation \
    --id proj_id \
    --xenium-bundle "${INPUT_DIR}/" \
    --viz-polygons baysor-cell-polygons.geojson \
    --transcript-assignment baysor-transcript-metadata.csv \
    --units microns \
    --localcores 6

The output files in proj_id/outs replicate Xenium's original outputs:

  • analysis_summary.html reports the same cell count as Xenium's original analysis_summary report, and that count doesn't match the number of cells in cell-metadata.csv.gz.
  • experiment.xenium keeps the same cell boundary coordinates as Xenium's original file.

Yuling192 · Jun 11 '25 16:06

That's peculiar. A very small number of cells may be dropped in the conversion. I would check that the cell count in analysis_summary.html matches the number of polygons in baysor-cell-polygons.geojson. You can count the polygons in that GeoJSON file with this Python code:

import json
with open("baysor-cell-polygons.geojson") as f:
    print(len(json.load(f)["geometries"]))

If that doesn't match, my other thought is that proj_id may have been generated incorrectly the first time and xeniumranger is refusing to overwrite it. I would try deleting that directory and running xeniumranger again.
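A minimal Python sketch of the "delete and rerun" suggestion, assuming the same run ID and inputs as the command earlier in this issue. The helper names (`remove_stale_run`, `import_segmentation_cmd`, `rerun`) are hypothetical; the xeniumranger flags are copied from the issue, not from the Xenium Ranger documentation. Note that deleting the directory is irreversible.

```python
import shutil
import subprocess
from pathlib import Path


def remove_stale_run(run_id):
    """Hypothetical helper: delete an existing output directory for run_id.

    Irreversibly removes run_id/ and everything in it. Returns True if a
    directory was removed, False if none existed.
    """
    run_dir = Path(run_id)
    if run_dir.is_dir():
        shutil.rmtree(run_dir)
        return True
    return False


def import_segmentation_cmd(run_id, bundle, polygons, assignment, cores=6):
    """Build the import-segmentation command used in this issue (flags copied from it)."""
    return [
        "xeniumranger", "import-segmentation",
        "--id", run_id,
        "--xenium-bundle", bundle,
        "--viz-polygons", polygons,
        "--transcript-assignment", assignment,
        "--units", "microns",
        "--localcores", str(cores),
    ]


def rerun(run_id, bundle, polygons, assignment, cores=6):
    """Remove any previous output for run_id, then run xeniumranger again."""
    remove_stale_run(run_id)
    cmd = import_segmentation_cmd(run_id, bundle, polygons, assignment, cores)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
```

For example, `rerun("proj_id", "path/to/xenium_bundle/", "baysor-cell-polygons.geojson", "baysor-transcript-metadata.csv")` with your own bundle path.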

dcjones · Jun 11 '25 21:06

The number of "geometries" in baysor-cell-polygons.geojson matches Xenium's original cell count, but neither the cell count from cell-metadata.csv.gz nor the output of print(len(json.load(open("cell_polygons.geojson"))["features"])) does. The cell counts from cell-metadata.csv.gz and cell_polygons.geojson match each other.
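To compare all three counts in one place, here is a minimal Python sketch. It is not part of proseg. It assumes the file names used in this thread and that cell-metadata.csv.gz has a header line followed by one row per cell. The helper names are hypothetical. It reads the gzip-compressed proseg outputs directly, so cell_polygons.geojson.gz does not need to be decompressed first.

```python
import csv
import gzip
import json


def open_text(path):
    """Open a text file, transparently decompressing ".gz" files."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def polygon_count(geojson):
    """Number of polygons in a parsed GeoJSON object.

    Handles a GeometryCollection ("geometries", as in baysor-cell-polygons.geojson)
    and a FeatureCollection ("features", as in proseg's cell_polygons.geojson.gz).
    """
    if "geometries" in geojson:
        return len(geojson["geometries"])
    return len(geojson["features"])


def row_count(lines):
    """Number of data rows in CSV text, skipping the header line.

    Assumes one row per cell, as in proseg's cell-metadata.csv.gz.
    """
    reader = csv.reader(lines)
    next(reader, None)  # drop the header
    return sum(1 for _ in reader)


def count_geojson_file(path):
    with open_text(path) as f:
        return polygon_count(json.load(f))


def count_csv_file(path):
    with open_text(path) as f:
        return row_count(f)
```

For example, `count_csv_file("cell-metadata.csv.gz")` and `count_geojson_file("cell_polygons.geojson.gz")` agreed with each other in this thread, and `count_geojson_file("baysor-cell-polygons.geojson")` is the number to compare against analysis_summary.html.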

Yuling192 · Jun 12 '25 14:06

The conversion drops any cell that has no assigned transcripts, since such cells cause issues with Xenium Ranger. That's likely what you are seeing here. It's typically a very small number of cells. If you are losing a large number of cells, that points to a bigger issue (low transcripts per cell, for whatever reason).
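A minimal Python sketch of the filtering rule described above, useful for estimating how many cells it would remove. This is an illustration of the rule, not proseg-to-baysor's actual code. It assumes you already have the list of cell IDs and, for each transcript, the ID of the cell it was assigned to. How unassigned transcripts are marked is an assumption; it is modeled here as `None`. The function names are hypothetical.

```python
def cells_without_transcripts(cell_ids, transcript_cells, unassigned=None):
    """Return the cells (in input order) that no transcript is assigned to.

    cell_ids: every cell that has a polygon.
    transcript_cells: the assigned cell for each transcript.
    Assumption: unassigned transcripts carry the `unassigned` sentinel
    (None by default); the real marker depends on the file you read.
    """
    assigned = {c for c in transcript_cells if c != unassigned}
    return [c for c in cell_ids if c not in assigned]


def dropped_fraction(cell_ids, transcript_cells, unassigned=None):
    """Fraction of cells that a "drop empty cells" filter would remove."""
    cell_ids = list(cell_ids)
    if not cell_ids:
        return 0.0
    empty = cells_without_transcripts(cell_ids, transcript_cells, unassigned)
    return len(empty) / len(cell_ids)
```

A small dropped fraction matches the typical case described here; a large one would point at the low-transcripts-per-cell problem.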

dcjones · Jun 13 '25 20:06