Generated the same results as Xenium did
Here's what I did:
proseg --xenium "${INPUT_DIR}/transcripts.parquet" \
--output-expected-counts "expected-counts.csv.gz" \
--output-cell-metadata "cell-metadata.csv.gz" \
--output-cell-polygons "cell_polygons.geojson.gz" \
--output-transcript-metadata "transcripts_filtered.parquet"
proseg-to-baysor \
transcripts_filtered.parquet \
cell_polygons.geojson.gz \
--output-transcript-metadata baysor-transcript-metadata.csv \
--output-cell-polygons baysor-cell-polygons.geojson
xeniumranger import-segmentation \
--id proj_id \
--xenium-bundle "${INPUT_DIR}/" \
--viz-polygons baysor-cell-polygons.geojson \
--transcript-assignment baysor-transcript-metadata.csv \
--units microns \
--localcores 6
The output files in proj_id/outs replicate Xenium's original outputs:
analysis_summary.html contains the same cell number as Xenium's analysis_summary report, and that cell number doesn't match the result from cell-metadata.csv.gz.
experiment.xenium maintains the same cell boundary coordinates as Xenium's original one.
That's peculiar. There may be a very small number of cells dropped in the conversion. I would check that the cell count in analysis_summary.html matches what's in baysor-cell-polygons.geojson. You can count the polygons in the GeoJSON with this Python code:
import json
# Count the polygons in the converted GeoJSON file.
with open("baysor-cell-polygons.geojson") as f:
    print(len(json.load(f)["geometries"]))
If that doesn't match, then my other thought is that proj_id may have been generated incorrectly at first and xeniumranger is refusing to overwrite it. I would try deleting that directory and running xeniumranger again.
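If a stale run directory is the suspected cause, clearing it before rerunning can be sketched in Python as follows. This is only an illustration: remove_stale_output is a hypothetical helper name, and "proj_id" is the --id value from the command above. Whether xeniumranger really refuses to overwrite an existing directory is just the hypothesis stated here, not something this snippet verifies.

```python
import shutil
from pathlib import Path


def remove_stale_output(run_dir):
    """Delete a previous output directory if it exists.

    Returns True if a directory was removed, False if there was nothing to remove.
    """
    path = Path(run_dir)
    if path.is_dir():
        shutil.rmtree(path)  # irreversible: removes the directory and all contents
        return True
    return False
```

Calling remove_stale_output("proj_id") and then rerunning the xeniumranger command rules out leftovers from an earlier run.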
The number of "geometries" in the baysor-cell-polygons.geojson file matches Xenium's original cell count, but neither the cell number from cell-metadata.csv.gz nor the output of print(len(json.load(open("cell_polygons.geojson"))["features"])) does.
The cell numbers from cell-metadata.csv.gz and cell_polygons.geojson match each other.
This conversion will drop any cells that have no assigned transcripts, since such cells cause issues with Xenium Ranger. That's likely what you are seeing here. It's typically a very small number of cells. If you are losing a large number of cells, it's indicative of a bigger issue (low transcripts per cell, for whatever reason).
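To judge whether the loss is "very small" or a sign of a bigger issue, the cell counts before and after conversion can be compared directly. The sketch below uses only the Python standard library. The helper names (open_text, count_polygons, count_rows, summarize_drop) are made up for illustration. It assumes cell-metadata.csv.gz has a header line followed by one row per cell, and that each GeoJSON file stores its cells under either "features" or "geometries", the two keys used earlier in this thread.

```python
import csv
import gzip
import json


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, "rt", newline="")


def count_polygons(path):
    """Count entries in a GeoJSON file.

    Handles a FeatureCollection ("features") or a
    GeometryCollection ("geometries").
    """
    with open_text(path) as f:
        data = json.load(f)
    key = "features" if "features" in data else "geometries"
    return len(data[key])


def count_rows(path):
    """Count data rows in a CSV file, excluding the header line.

    Assumption: one row per cell.
    """
    with open_text(path) as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header
        return sum(1 for _ in reader)


def summarize_drop(n_before, n_after):
    """Return (cells dropped, fraction dropped) between two counts."""
    dropped = n_before - n_after
    fraction = dropped / n_before if n_before else 0.0
    return dropped, fraction
```

For example, summarize_drop(count_polygons("cell_polygons.geojson.gz"), count_polygons("baysor-cell-polygons.geojson")) gives the number and fraction of cells lost in the conversion, and count_rows("cell-metadata.csv.gz") should agree with the first of those two polygon counts.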