
The method for annotating genes with cell types

Open PrimozGodec opened this issue 6 years ago • 9 comments

With this PR we propose adding our new annotation method to Scanpy:

Annotator marks the data with cell type annotations based on marker genes. The function first selects over-expressed genes from the gene expression data with the Mann-Whitney U test, then assigns cell types to them with the hypergeometric test, and finally filters out cell types that have zero scores for all cells. The result is a set of scores that tell how probable each cell type is for each cell.
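To make the two statistical steps concrete, here is a minimal sketch of how they could be expressed with SciPy. It is not the code in this PR: the function names, the `alpha` threshold, and the group-versus-rest comparison are assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import hypergeom, mannwhitneyu


def overexpressed_genes(group, rest, gene_names, alpha=0.05):
    """Return genes expressed higher in `group` than in `rest`.

    `group` and `rest` are (cells x genes) arrays. One one-sided
    Mann-Whitney U test is run per gene (hypothetical helper).
    """
    selected = []
    for j, gene in enumerate(gene_names):
        _, p = mannwhitneyu(group[:, j], rest[:, j], alternative="greater")
        if p < alpha:
            selected.append(gene)
    return selected


def enrichment_p_value(selected, markers, n_genes_total):
    """Hypergeometric tail probability of the observed marker overlap.

    Treats `selected` as draws from `n_genes_total` genes, of which
    `markers` are the "successes" (hypothetical helper).
    """
    selected, markers = set(selected), set(markers)
    overlap = len(selected & markers)
    # sf(k - 1) is P(X >= k), the chance of an overlap at least this large.
    return hypergeom.sf(overlap - 1, n_genes_total, len(markers), len(selected))
```

A small p-value from `enrichment_p_value` means the selected genes overlap a cell type's marker set more than chance would suggest; how the proposed method turns such values into per-cell scores is not specified here.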

We hope you like the method and will merge it into Scanpy.

PrimozGodec avatar Sep 04 '19 13:09 PrimozGodec

Thanks for the PR! Gene-set based annotation would be pretty useful to have here.

Is there any chance there's a preprint we could look at for a little more context on the method?

ivirshup avatar Sep 05 '19 08:09 ivirshup

@ivirshup it is currently being written. I will write back to you when we have it ready.

PrimozGodec avatar Sep 09 '19 11:09 PrimozGodec

Any update on this? Can you add a test (probably reusing the example already in the method docstring)?

fidelram avatar Oct 01 '19 12:10 fidelram

@fidelram, we are still working on the paper/preprint. I will post it soon.

I will add tests. For the tests to work, should I add my library to requirements.txt? I noticed that other external packages are not included in the project requirements.

PrimozGodec avatar Oct 02 '19 09:10 PrimozGodec

Yeah, please add it together with a comment mentioning where it is needed (e.g. external.tl.annotator)

fidelram avatar Oct 02 '19 11:10 fidelram

@PrimozGodec, probably don't add this to requirements.txt, since the requirement should be optional for install. I think instead you should mark it with something like:

from importlib.util import find_spec

import pytest

@pytest.mark.skipif(find_spec('pointannotator') is None, reason="pointannotator not installed")

You can add a requirement for the package to this line in setup.py: https://github.com/theislab/scanpy/blob/d8f32c040f3a5f4fc07998b269796ca58de84b40/setup.py#L41

Maybe we should eventually have a second requirements file for CI testing, like we do for anndata.
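For reference, the skipif suggestion above could be written out as a complete test module roughly like this. The test name and body are placeholders, not the tests from this PR.

```python
from importlib.util import find_spec

import pytest

# True only when the optional package is importable in this environment.
HAS_POINTANNOTATOR = find_spec("pointannotator") is not None


@pytest.mark.skipif(not HAS_POINTANNOTATOR, reason="pointannotator not installed")
def test_annotator_runs():
    # Placeholder body: import lazily so collecting this module never fails
    # when the optional dependency is missing.
    import pointannotator  # noqa: F401
```

Calling `pytest.importorskip("pointannotator")` at the top of the test module is a shorter alternative when every test in the file needs the package.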

ivirshup avatar Oct 02 '19 12:10 ivirshup

I added unit tests and reformatted the code.

PrimozGodec avatar Oct 03 '19 13:10 PrimozGodec

Thank you. We’re using pytest though, so please write the tests that way:

  1. Remove the class and make all its methods top-level functions
  2. Make setUp into fixtures
  3. Just use assert

import pandas as pd
import pytest
from anndata import AnnData


@pytest.fixture
def markers():
    return pd.DataFrame(
        ...
    )


@pytest.fixture
def adata():
    ...
    return AnnData(data.values, var=data.columns.values)


def test_remove_empty_column(adata, markers):
    ...
    annotations = annotator(adata, markers, num_genes=20)
    ...
    assert len(annotations) == len(adata)
    ...
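
As a self-contained illustration of the same pattern, the three steps above can be sketched with toy data. `matching_cell_types` is a made-up stand-in for the function under test, and the marker values are invented for the example.

```python
import pytest


@pytest.fixture
def markers():
    # Replaces data that a unittest setUp method would have built.
    return {"T cell": ["CD3E", "CD3D"], "B cell": ["CD79A"]}


@pytest.fixture
def expressed_genes():
    return ["CD3E", "CD79A", "ACTB"]


def matching_cell_types(genes, markers):
    """Toy stand-in: cell types with at least one marker among `genes`."""
    return sorted(ct for ct, ms in markers.items() if set(ms) & set(genes))


def test_matching_cell_types(expressed_genes, markers):
    # pytest injects the fixtures by argument name; a plain assert is enough.
    assert matching_cell_types(expressed_genes, markers) == ["B cell", "T cell"]
```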

flying-sheep avatar Oct 07 '19 09:10 flying-sheep

Only remaining thought: I have slight concerns about the name being too generic, but then again, this does exactly what people expect a “cell type annotator based on marker genes” to do.

flying-sheep avatar Oct 07 '19 09:10 flying-sheep

We decided not to add more packages to external but you are more than welcome to add your own package to the scverse ecosystem: https://scverse.org/packages/#ecosystem

Zethson avatar May 08 '24 13:05 Zethson