KeyError while running tests

Open micdonato opened this issue 3 years ago • 14 comments

Hi, I am trying to install haystack on our server and I am running into an error when running the tests. The test run reports that it completed successfully, but near the end I get this:

INFO  @ Wed, 07 Apr 2021 16:10:54:
	 Analyzing MA0724.1 from:/home/user/haystack_test_output/HAYSTACK_PIPELINE_RESULTS/HAYSTACK_MOTIFS/HAYSTACK_MOTIFS_on_K562/genes_lists/MA0724.1_motif_region_in_target.tss.bed
/home/user/.conda/envs/hotspots/lib/python2.7/site-packages/haystack/generate_tf_activity_plane.py:189:FutureWarning: read_table is deprecated, use read_csv instead, passing sep='\t'.
  mapped_genes = map(str.upper, list(pd.read_table(motif_gene_filename,keep_default_na=False,na_values='null').dropna()['Symbol'].values.astype(str)))
Traceback (most recent call last):
  File "/home/users/.conda/envs/hotspots/bin/haystack_tf_activity_plane", line 10, in <module>
    sys.exit(main())
  File "/home/user/.conda/envs/hotspots/lib/python2.7/site-packages/haystack/generate_tf_activity_plane.py", line 193, in main
    ds_values = zscore_series(gene_ranking.ix[mapped_genes, :].mean())
  File "/home/user/.conda/envs/hotspots/lib/python2.7/site-packages/pandas/core/indexing.py", line 120, in __getitem__
    return self._getitem_tuple(key)
  File "/home/user/.conda/envs/hotspots/lib/python2.7/site-packages/pandas/core/indexing.py", line 888, in _getitem_tuple
    retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
  File "/home/user/.conda/envs/hotspots/lib/python2.7/site-packages/pandas/core/indexing.py", line 1088, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "/home/user/.conda/envs/hotspots/lib/python2.7/site-packages/pandas/core/indexing.py", line 1205, in _getitem_iterable
    raise_missing=False)
  File "/home/user/.conda/envs/hotspots/lib/python2.7/site-packages/pandas/core/indexing.py", line 1161, in _get_listlike_indexer
    raise_missing=raise_missing)
  File "/home/user/.conda/envs/hotspots/lib/python2.7/site-packages/pandas/core/indexing.py", line 1252, in _validate_read_indexer
    raise KeyError("{} not in index".format(not_found))
KeyError: "['BAGE5', 'GRIK1-AS2'] not in index"
INFO  @ Wed, 07 Apr 2021 16:10:54:
	 Test completed successfully

Should I be worried?

micdonato avatar Apr 07 '21 23:04 micdonato

An update to pandas is causing this. I am not sure if it is a cause for worry, but to be on the safe side, I would pin pandas (and potentially other packages) to the versions listed here: https://github.com/pinellolab/haystack_bio/blob/master/Dockerfile#L35. Alternatively, you can use the Docker container.
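For readers wondering what kind of pandas change produces a "not in index" KeyError: recent pandas releases raise it when a list of labels passed to a label-based indexer contains entries that are missing from the index. The sketch below is only an illustration written for current pandas, not the haystack code; the table contents and the helper name mean_over_present are made up. It shows one way to restrict the lookup to labels that actually exist. Whether silently dropping the missing genes is appropriate for haystack's analysis is not something this thread settles.

```python
import pandas as pd

# Hypothetical ranking table: rows are gene symbols, columns are samples.
gene_ranking = pd.DataFrame(
    {"K562": [1.0, 3.0], "HEPG2": [2.0, 4.0]},
    index=["GATA1", "TAL1"],
)

# BAGE5 has no row in the table, like the genes named in the KeyError above.
mapped_genes = ["GATA1", "TAL1", "BAGE5"]


def mean_over_present(table, labels):
    """Average the rows whose label exists in the table, ignoring the rest."""
    present = table.index.intersection(labels)
    return table.loc[present, :].mean()


means = mean_over_present(gene_ranking, mapped_genes)
print(means)
```

Pinning pandas, as suggested above, avoids touching the code at all; the filtering approach is only relevant if someone wants to patch the lookup itself.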

Rick,

rfarouni avatar Apr 10 '21 10:04 rfarouni

Thanks, that makes sense! I think I will use the Docker container, but I was also considering building a Singularity container, and pinning pandas will help with that.

micdonato avatar Apr 10 '21 21:04 micdonato

I have no experience building Singularity containers but I think it would be a great solution for people running the pipeline on HPC clusters. Maybe @lucapinello knows more about these types of containers. I'll ask him.

rfarouni avatar Apr 12 '21 07:04 rfarouni

They work roughly the same as Docker containers; it's just a matter of creating the right recipe for building them. Usually I install a package locally to see whether I can build everything before going the container way.

My reason to use Singularity is mostly the root/user issue for Docker and to deal with filesystem isolation, but there are other differences as well.

Thanks!

micdonato avatar Apr 12 '21 16:04 micdonato

Hi there!

I think the current Dockerfile and image should already work with Singularity: https://github.com/pinellolab/haystack_bio/blob/master/Dockerfile

To run it on Singularity you could try:

singularity run docker://pinellolab/haystack_bio (add command and flags here)

Please let us know how it goes!

Thanks,

Luca

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pinellolab/haystack_bio/issues/7#issuecomment-817963272, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAIH72W3SDNPC5EYLO3R3JLTIMPX5ANCNFSM42RWWO2A .

lucapinello avatar Apr 12 '21 16:04 lucapinello

Hi all, and thanks!

That is what I tried at first. Unfortunately, it seems that Singularity fails to actually build the image, as packages that should be installed are missing:

The command: singularity run docker://pinellolab/haystack_bio haystack_pipeline data/data_h3k27ac_6cells/samples_names.txt hg19 --blacklist hg19

The result:

INFO:    Using cached SIF image
Traceback (most recent call last):
  File "/usr/local/bin/haystack_pipeline", line 6, in <module>
    from pkg_resources import load_entry_point
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2927, in <module>
    @_call_aside
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2913, in _call_aside
    f(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2940, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 635, in _build_master
    ws.require(__requires__)
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 943, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 829, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'scipy>=1.0.0' distribution was not found and is required by haystack-bio

That is why I wanted to try rebuilding the Singularity image from scratch.

micdonato avatar Apr 12 '21 17:04 micdonato

I can reproduce your error, and it seems there is no simple way to use the Docker image directly with Singularity. You may want to explore this tool for converting the Docker image: docker2singularity.

I have tested the docker image on my machine and it is still working as expected but I understand that this may not be a viable option for you.

You can try to downgrade pandas in the conda environment you have created previously and if necessary also the other packages:

numpy==1.13.3
scipy==1.0.0
matplotlib==2.1.0
pandas==0.21.0

and, installed with pip:

bx-python==0.7.3
Jinja2==2.9.6
tqdm==4.19.4
weblogo==3.5.0
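To see quickly which of these pins an existing environment does not satisfy, something like the following could help. It is a minimal sketch in plain Python: the function names parse_pins and find_mismatches and the example installed mapping are hypothetical, and in practice the installed versions would have to be collected separately, for example from the output of pip freeze.

```python
def parse_pins(text):
    """Turn lines like 'pandas==0.21.0' into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if "==" not in line:
            continue  # skip blanks and anything that is not an exact pin
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins, installed):
    """Return {name: (wanted, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, wanted in pins.items():
        found = installed.get(name)  # None when the package is absent
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


PINS = """
numpy==1.13.3
scipy==1.0.0
matplotlib==2.1.0
pandas==0.21.0
"""

# Hypothetical environment: pandas is newer than the pin, scipy is missing.
installed = {"numpy": "1.13.3", "matplotlib": "2.1.0", "pandas": "0.24.2"}

report = find_mismatches(parse_pins(PINS), installed)
print(report)
```

The comparison is an exact string match, so "0.21" and "0.21.0" would be reported as different; that is deliberate, to keep the sketch free of version-parsing logic.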

@rfarouni do you have the bandwidth in the next few days to pin pandas in the bioconda package and resubmit it, so we can fix this for other users trying the package through bioconda? Of course, this will require creating a separate conda env just for haystack.

lucapinello avatar Apr 13 '21 12:04 lucapinello

@lucapinello I will look into this as soon as I can

rfarouni avatar Apr 14 '21 19:04 rfarouni

Thanks Rick!

lucapinello avatar Apr 14 '21 22:04 lucapinello

I found that the easiest way to deal with this error is to run conda install pandas==0.21 after running conda install haystack_bio. The test runs fine after that.

	 The expression values of the gene TEST1 are not present. Skipping it. 

WARNING @ Thu, 22 Apr 2021 18:10:50:
	 The expression values of the gene SCIP are not present. Skipping it. 

INFO  @ Thu, 22 Apr 2021 18:10:50:
	 Gene:POU3F1 TF z-score:0.73 Targets z-score:1.58  Correlation:0.48 

WARNING @ Thu, 22 Apr 2021 18:10:50:
	 The expression values of the gene TST-1 are not present. Skipping it. 

WARNING @ Thu, 22 Apr 2021 18:10:50:
	 The expression values of the gene OCT6 are not present. Skipping it. 

WARNING @ Thu, 22 Apr 2021 18:10:50:
	 The expression values of the gene OTF-6 are not present. Skipping it. 

WARNING @ Thu, 22 Apr 2021 18:10:50:
	 The expression values of the gene OTF6 are not present. Skipping it. 

WARNING @ Thu, 22 Apr 2021 18:10:50:
	 The expression values of the gene OCT-6 are not present. Skipping it. 

WARNING @ Thu, 22 Apr 2021 18:10:50:
	 The expression values of the gene TST1 are not present. Skipping it. 

INFO  @ Thu, 22 Apr 2021 18:10:50:
	 All done! Ciao! 

INFO  @ Thu, 22 Apr 2021 18:10:50:
	 Test completed successfully

rfarouni avatar Apr 22 '21 16:04 rfarouni

Thanks Rick!

I will update the documentation accordingly to propose this fix.

lucapinello avatar Apr 22 '21 16:04 lucapinello

I see you already updated it :)

How about we propose this as a single line?

conda install haystack_bio pandas==0.21

lucapinello avatar Apr 22 '21 16:04 lucapinello

This seems to work as well

rfarouni avatar Apr 22 '21 17:04 rfarouni

Great, I think this is an excellent compromise and we don't need to update the recipe.

Thanks for looking into it!

lucapinello avatar Apr 22 '21 17:04 lucapinello