[BUG] Gene sets file is ignored if it points to s3

Open arogozhnikov opened this issue 2 years ago • 4 comments

Describe the bug

# This works
cellxgene launch s3://<somewhere>/test_data.h5ad --gene-sets-file /local/gene_sets.csv --port 5005
# This doesn't work
cellxgene launch s3://<somewhere>/test_data.h5ad --gene-sets-file s3://<somewhere>/gene_sets.csv --port 5005

The first command shows the gene sets, but the second does not.

There is also no error or complaint in the terminal:

[cellxgene] Starting the CLI...
[cellxgene] Loading data from test_data.h5ad.
[cellxgene] Warning: Anndata data matrix is sparse, but not a CSC (columnar) matrix.  Performance may be improved by using CSC.
[cellxgene] Warning: Var annotation 'gene_name_ensembl' has 32974 categories, this may be cumbersome or slow to display. We recommend setting the --max-category-items option to 500, this will hide categorical annotations with more than 500 categories in the UI
[cellxgene] Warning: Var annotation 'gene_names' has 33680 categories, this may be cumbersome or slow to display. We recommend setting the --max-category-items option to 500, this will hide categorical annotations with more than 500 categories in the UI
[cellxgene] Warning: Var annotation 'gene_name_ranger' has 33660 categories, this may be cumbersome or slow to display. We recommend setting the --max-category-items option to 500, this will hide categorical annotations with more than 500 categories in the UI
[cellxgene] Launching! Please go to http://localhost:5005 in your browser.
[cellxgene] Type CTRL-C at any time to exit.

Expected behavior

I expect no difference between a local gene sets file and one in the cloud.

According to the source code, there should indeed be no difference.

Version (please complete the following information):

  • cellxgene == 1.0.0 (pip-installed)
  • ubuntu20.04 in docker
  • python 3.8.10

arogozhnikov • Jan 28 '22 22:01

Thanks Alex! I'm following up on this and will let you know if I can replicate it.

MaximilianLombardo • Jan 31 '22 16:01

Hi @MaximilianLombardo, were you able to reproduce this behavior?

arogozhnikov • Feb 02 '22 06:02

Hey Alex,

Thanks for the ping. I can confirm that the behavior you're seeing is not unexpected (I haven't had a chance to reproduce it yet, but I synced up with some team members on the subject). Officially, we do not support gene sets on S3, so we make no guarantees about launching with gene sets that are hosted there. Using gene sets from S3 is trickier than using datasets from S3 because it is a read-write operation rather than a read-only one. That said, since gene sets files are usually small, perhaps a viable option would be for you to download the gene sets file locally from S3, launch cellxgene with it (and potentially modify it in session), and update the file on S3 once you are finished with your cellxgene session?
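
A minimal sketch of that download, launch, upload workaround, in case it helps. It assumes `boto3` is installed and AWS credentials are configured; the helper names (`parse_s3_uri`, `build_launch_command`, `launch_with_s3_gene_sets`) are hypothetical and not part of cellxgene. The only cellxgene flags used are the ones from the working command in the report.

```python
import subprocess
import tempfile
from pathlib import Path
from urllib.parse import urlparse


def parse_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"not an S3 object URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def build_launch_command(dataset, gene_sets_path, port):
    """Build the same command line as the working (local file) case."""
    return [
        "cellxgene", "launch", dataset,
        "--gene-sets-file", str(gene_sets_path),
        "--port", str(port),
    ]


def launch_with_s3_gene_sets(dataset, gene_sets_uri, port=5005):
    # Assumption: boto3 is available; imported here so the helpers above
    # can be used without it.
    import boto3

    bucket, key = parse_s3_uri(gene_sets_uri)
    s3 = boto3.client("s3")
    with tempfile.TemporaryDirectory() as tmp:
        local_path = Path(tmp) / Path(key).name
        # Pull the gene sets file down so cellxgene sees a local path.
        s3.download_file(bucket, key, str(local_path))
        try:
            subprocess.run(build_launch_command(dataset, local_path, port), check=False)
        finally:
            # Runs once cellxgene exits (including CTRL-C) and pushes
            # whatever is in the local file back to S3.
            s3.upload_file(str(local_path), bucket, key)
```

The upload only happens when the cellxgene process exits, so this fits an interactive session better than a long-running service.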

MaximilianLombardo • Feb 04 '22 19:02

Right now I do download it, but there is no way to know when the user is done: cellxgene runs as a service, as part of cellxgene-gateway in the cloud, and I don't have a hook for when it gets redeployed.
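
One possible way around the missing hook would be to poll the local copy and upload it whenever it changes, rather than waiting for the session to end. This is only a sketch under assumptions: it assumes in-session edits are written back to the local gene sets file, and `sync_if_changed`, `watch`, and the `upload` callback are hypothetical names, not cellxgene or cellxgene-gateway APIs.

```python
import os
import time


def sync_if_changed(path, last_mtime, upload):
    """Call upload(path) if the file's mtime differs from last_mtime.

    Returns the current mtime so the caller can pass it to the next check.
    """
    mtime = os.path.getmtime(path)
    if mtime != last_mtime:
        upload(path)
    return mtime


def watch(path, upload, interval=30.0):
    """Poll `path` forever, uploading after each detected modification."""
    last = os.path.getmtime(path)
    while True:
        time.sleep(interval)
        last = sync_if_changed(path, last, upload)
```

Here `upload` could be any callable that copies the file to S3; `watch` would need to run in a background thread or a sidecar process next to cellxgene.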

arogozhnikov • Feb 05 '22 18:02