A method for annotating cells with cell types
With this PR we propose adding our new annotation method to Scanpy:
Annotator labels cells with cell types based on marker genes. It first selects over-expressed genes from the gene expression data with the Mann-Whitney U test, then assigns cell types with the hypergeometric test, and finally filters out cell types that have zero scores for all cells. The results are scores that tell how probable each cell type is for each cell.
We hope you like the method and will merge it into Scanpy.
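As a rough illustration of the three steps above (not the code in this PR, and not the pointannotator implementation), here is a minimal standard-library sketch. It makes several assumptions: a group of cells is tested against the remaining cells, the Mann-Whitney p-value uses a plain normal approximation without tie or continuity correction, the score is `-log10(p)` of the hypergeometric test, and all function names and the toy expression values are made up for the example.

```python
from math import comb, erfc, log10, sqrt


def mann_whitney_greater(x, y):
    """One-sided p-value that values in x tend to exceed values in y.

    Simplification: normal approximation, no tie or continuity correction.
    """
    n, m = len(x), len(y)
    # U counts pairs where x beats y; ties count as half a win.
    u = sum((a > b) + 0.5 * (a == b) for a in x for b in y)
    z = (u - n * m / 2) / sqrt(n * m * (n + m + 1) / 12)
    return 0.5 * erfc(z / sqrt(2))  # upper tail of the standard normal


def hypergeom_sf(k, total, n_markers, n_selected):
    """P(X >= k): chance of at least k markers among the selected genes."""
    hits = range(k, min(n_markers, n_selected) + 1)
    ways = sum(
        comb(n_markers, i) * comb(total - n_markers, n_selected - i) for i in hits
    )
    return ways / comb(total, n_selected)


def score_cell_types(group, rest, markers, alpha=0.05):
    """group/rest map gene -> expression values; markers maps cell type -> gene set."""
    genes = set(group)
    # Step 1: keep genes that are over-expressed in the group.
    selected = {g for g in genes if mann_whitney_greater(group[g], rest[g]) < alpha}
    scores = {}
    for cell_type, marker_genes in markers.items():
        known = marker_genes & genes
        overlap = len(known & selected)
        # Step 2: enrichment of this cell type's markers among selected genes.
        p = hypergeom_sf(overlap, len(genes), len(known), len(selected))
        scores[cell_type] = -log10(max(p, 1e-300))
    # Step 3: drop cell types with a score of zero.
    return {ct: s for ct, s in scores.items() if s > 0}


# Toy data, invented for illustration only.
group = {"CD3E": [5, 6, 7, 8, 9], "CD4": [4, 5, 6, 7, 8],
         "MS4A1": [0, 1, 0, 1, 0], "ACTB": [3, 4, 3, 4, 3]}
rest = {"CD3E": [0, 1, 0, 1, 0], "CD4": [0, 0, 1, 1, 0],
        "MS4A1": [2, 3, 2, 3, 2], "ACTB": [3, 4, 3, 4, 3]}
markers = {"T cell": {"CD3E", "CD4"}, "B cell": {"MS4A1"}}

scores = score_cell_types(group, rest, markers)
print(scores)  # only "T cell" keeps a non-zero score
```

With real data one would more likely rely on `scipy.stats.mannwhitneyu` and `scipy.stats.hypergeom` than on hand-written helpers; the point here is only the order of the steps.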
Thanks for the PR! Gene-set based annotation would be pretty useful to have here.
Is there any chance there's a preprint we could look at for a little more context on the method?
@ivirshup it is currently being written. I will get back to you when it is ready.
Any update on this? Can you add a test (probably reusing the example already in the method docstring)?
@fidelram, we are still working on the paper/preprint. I will post it soon.
I will add tests. For the tests to work, should I add my library to requirements.txt? From what I can see, other external packages are not included in the project requirements.
Yeah, please add it together with a comment mentioning where it is needed (e.g. external.tl.annotator).
@PrimozGodec, probably don't add this to requirements.txt, since the requirement should be optional for install. I think instead you should mark it with something like:
import pytest
from importlib.util import find_spec
# Skip the decorated test when the optional dependency is not installed
@pytest.mark.skipif(find_spec('pointannotator') is None, reason="pointannotator not installed")
You can add a requirement for the package to this line in setup.py: https://github.com/theislab/scanpy/blob/d8f32c040f3a5f4fc07998b269796ca58de84b40/setup.py#L41
Maybe we should eventually have a second requirements file for CI testing, like we do for anndata.
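For illustration, the same `find_spec` check can be wrapped in a small helper that reports which optional packages are missing, for example to decide which tests to skip locally or in CI. This is only a sketch: the helper name and the `OPTIONAL_TEST_DEPS` list are hypothetical, and only the `pointannotator` import name comes from this thread.

```python
from importlib.util import find_spec


def missing_packages(import_names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in import_names if find_spec(name) is None]


# Hypothetical list of optional test dependencies (import names, not PyPI names).
OPTIONAL_TEST_DEPS = ["pointannotator"]

if __name__ == "__main__":
    print("missing optional packages:", missing_packages(OPTIONAL_TEST_DEPS))
```

A non-empty result from `missing_packages(["pointannotator"])` could then serve as the condition of a `pytest.mark.skipif` marker.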
I added unit tests and reformatted the code.
Thank you. We’re using pytest though, so please write the tests that way:
- Remove the class and make all its methods top-level functions
- Make `setUp` into fixtures
- Just use `assert`
import pandas as pd
import pytest
from anndata import AnnData

@pytest.fixture
def markers():
    return pd.DataFrame(
        ...
    )

@pytest.fixture
def adata():
    ...
    return AnnData(data.values, var=data.columns.values)

def test_remove_empty_column(adata, markers):
    ...
    annotations = annotator(adata, markers, num_genes=20)
    ...
    # Use the `adata` fixture; there is no `self` in a top-level test function
    assert len(annotations) == len(adata)
    ...
Only remaining thought: I have slight concerns about the name being too generic, but then again, this does exactly what people expect a “cell type annotator based on marker genes” to do.
We decided not to add more packages to external but you are more than welcome to add your own package to the scverse ecosystem: https://scverse.org/packages/#ecosystem