haystack_bio
KeyError while running tests
Hi, I am trying to install haystack on our server and I am running into an error when running the tests. The tests complete successfully, but at the end I get this:
INFO @ Wed, 07 Apr 2021 16:10:54:
Analyzing MA0724.1 from:/home/user/haystack_test_output/HAYSTACK_PIPELINE_RESULTS/HAYSTACK_MOTIFS/HAYSTACK_MOTIFS_on_K562/genes_lists/MA0724.1_motif_region_in_target.tss.bed
/home/user/.conda/envs/hotspots/lib/python2.7/site-packages/haystack/generate_tf_activity_plane.py:189:FutureWarning: read_table is deprecated, use read_csv instead, passing sep='\t'.
mapped_genes = map(str.upper, list(pd.read_table(motif_gene_filename,keep_default_na=False,na_values='null').dropna()['Symbol'].values.astype(str)))
Traceback (most recent call last):
File "/home/users/.conda/envs/hotspots/bin/haystack_tf_activity_plane", line 10, in <module>
sys.exit(main())
File "/home/user/.conda/envs/hotspots/lib/python2.7/site-packages/haystack/generate_tf_activity_plane.py", line 193, in main
ds_values = zscore_series(gene_ranking.ix[mapped_genes, :].mean())
File "/home/user/.conda/envs/hotspots/lib/python2.7/site-packages/pandas/core/indexing.py", line 120, in __getitem__
return self._getitem_tuple(key)
File "/home/user/.conda/envs/hotspots/lib/python2.7/site-packages/pandas/core/indexing.py", line 888, in _getitem_tuple
retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
File "/home/user/.conda/envs/hotspots/lib/python2.7/site-packages/pandas/core/indexing.py", line 1088, in _getitem_axis
return self._getitem_iterable(key, axis=axis)
File "/home/user/.conda/envs/hotspots/lib/python2.7/site-packages/pandas/core/indexing.py", line 1205, in _getitem_iterable
raise_missing=False)
File "/home/user/.conda/envs/hotspots/lib/python2.7/site-packages/pandas/core/indexing.py", line 1161, in _get_listlike_indexer
raise_missing=raise_missing)
File "/home/user/.conda/envs/hotspots/lib/python2.7/site-packages/pandas/core/indexing.py", line 1252, in _validate_read_indexer
raise KeyError("{} not in index".format(not_found))
KeyError: "['BAGE5', 'GRIK1-AS2'] not in index"
INFO @ Wed, 07 Apr 2021 16:10:54:
Test completed successfully
Should I be worried?
An update to pandas is causing this. I am not sure whether it is a cause for concern, but to be on the safe side, I would pin the version of pandas (and potentially other packages) to the ones listed here: https://github.com/pinellolab/haystack_bio/blob/master/Dockerfile#L35. Alternatively, you can use the Docker container.
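For context, here is a minimal sketch of why the lookup fails and one way to make it tolerant of missing genes. It is not the fix adopted in haystack_bio. The small gene_ranking table, its column names, and the genes GATA1 and TAL1 are made up for illustration; only the two missing symbols come from the traceback above. In recent pandas releases, selecting rows by a list of labels raises a KeyError when some labels are absent, so the sketch keeps only the labels that exist before selecting.

```python
import pandas as pd

# Hypothetical stand-in for the pipeline's gene_ranking table
# (rows are genes, columns are cell types; the values are invented).
gene_ranking = pd.DataFrame(
    {"K562": [1.0, 3.0], "HEPG2": [2.0, 4.0]},
    index=["GATA1", "TAL1"],
)

# Target genes for a motif; the last two are absent from the table,
# as in the KeyError reported in this issue.
mapped_genes = ["GATA1", "TAL1", "BAGE5", "GRIK1-AS2"]

# Keep only the labels that are actually present in the index.
present = gene_ranking.index.intersection(mapped_genes)
missing = sorted(set(mapped_genes) - set(present))

# Selecting with labels that are known to exist cannot raise KeyError.
ds_mean = gene_ranking.loc[present, :].mean()

print("skipped genes:", missing)
print(ds_mean)
```

Dropping the absent genes before averaging leaves the column means unchanged compared with keeping them as all-NaN rows, because mean() skips NaN values by default.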
Rick,
Thanks, it makes sense! I think I will use the docker container but I was considering building a Singularity container and pinning pandas will help.
I have no experience building Singularity containers but I think it would be a great solution for people running the pipeline on HPC clusters. Maybe @lucapinello knows more about these types of containers. I'll ask him.
They work roughly the same as Docker containers; it's just a matter of creating the right recipe for building them. Usually I install a package locally to see if I am able to build everything before going the container way.
My reason to use Singularity is mostly the root/user issue for Docker and to deal with filesystem isolation, but there are other differences as well.
Thanks!
Hi there!
I think the current Dockerfile and image should already work with Singularity: https://github.com/pinellolab/haystack_bio/blob/master/Dockerfile
To run it with Singularity, you could try:
singularity run docker://pinellolab/haystack_bio (add command and flags here)
Please let us know how it goes!
Thanks,
Luca
Hi all, and thanks!
That is what I tried at first. Unfortunately, it seems that Singularity fails to actually build the image, as packages that should be installed are missing:
The command:
singularity run docker://pinellolab/haystack_bio haystack_pipeline data/data_h3k27ac_6cells/samples_names.txt hg19 --blacklist hg19
The result:
INFO: Using cached SIF image
Traceback (most recent call last):
File "/usr/local/bin/haystack_pipeline", line 6, in <module>
from pkg_resources import load_entry_point
File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2927, in <module>
@_call_aside
File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2913, in _call_aside
f(*args, **kwargs)
File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2940, in _initialize_master_working_set
working_set = WorkingSet._build_master()
File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 635, in _build_master
ws.require(__requires__)
File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 943, in require
needed = self.resolve(parse_requirements(requirements))
File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 829, in resolve
raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'scipy>=1.0.0' distribution was not found and is required by haystack-bio
That is why I wanted to try rebuilding the Singularity image from scratch.
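A small diagnostic sketch that may help narrow this down: it prints which interpreter is running and reports which of the expected modules cannot be imported from it. The helper name missing_modules and the list of modules checked are my own choices, not part of haystack_bio. Note that it tests importability, which is related to but not the same as the distribution metadata check that pkg_resources performs in the traceback above.

```python
import importlib
import sys


def missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Shows which Python is in use (for example a system Python
    # versus the one the packages were installed into).
    print(sys.executable)
    # Hypothetical list of modules to check; adjust as needed.
    print(missing_modules(["numpy", "scipy", "pandas", "matplotlib"]))
```

Running it with the same interpreter that launches haystack_pipeline inside the container shows whether the packages are absent altogether or merely installed for a different Python.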
I can reproduce your error, and it seems there is no simple way to use the Docker image directly with Singularity. You may want to explore docker2singularity, a tool for converting Docker images.
I have tested the docker image on my machine and it is still working as expected but I understand that this may not be a viable option for you.
You can try downgrading pandas in the conda environment you created previously and, if necessary, pinning the other packages as well:
numpy==1.13.3
scipy==1.0.0
matplotlib==2.1.0
pandas==0.21.0
and, installed with pip:
bx-python==0.7.3
Jinja2==2.9.6
tqdm==4.19.4
weblogo==3.5.0
@rfarouni do you have the bandwidth in the next few days to pin pandas in the bioconda package and resubmit it, so we can fix this for other users trying the package through bioconda? Of course, this will require creating a separate conda env just for haystack.
@lucapinello I will look into this as soon as I can
Thanks Rick!
I found that the easiest way to deal with this error is to run conda install pandas==0.21 after running conda install haystack_bio. The test runs fine after that.
The expression values of the gene TEST1 are not present. Skipping it.
WARNING @ Thu, 22 Apr 2021 18:10:50:
The expression values of the gene SCIP are not present. Skipping it.
INFO @ Thu, 22 Apr 2021 18:10:50:
Gene:POU3F1 TF z-score:0.73 Targets z-score:1.58 Correlation:0.48
WARNING @ Thu, 22 Apr 2021 18:10:50:
The expression values of the gene TST-1 are not present. Skipping it.
WARNING @ Thu, 22 Apr 2021 18:10:50:
The expression values of the gene OCT6 are not present. Skipping it.
WARNING @ Thu, 22 Apr 2021 18:10:50:
The expression values of the gene OTF-6 are not present. Skipping it.
WARNING @ Thu, 22 Apr 2021 18:10:50:
The expression values of the gene OTF6 are not present. Skipping it.
WARNING @ Thu, 22 Apr 2021 18:10:50:
The expression values of the gene OCT-6 are not present. Skipping it.
WARNING @ Thu, 22 Apr 2021 18:10:50:
The expression values of the gene TST1 are not present. Skipping it.
INFO @ Thu, 22 Apr 2021 18:10:50:
All done! Ciao!
INFO @ Thu, 22 Apr 2021 18:10:50:
Test completed successfully
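To confirm that the pin took effect before running the test, a quick check along these lines could be used. This is only a sketch: satisfies_pin is a hypothetical helper, and treating the pin as a version prefix (so 0.21.0 counts as matching 0.21) is my assumption, not conda's own matching rule.

```python
def satisfies_pin(installed, pinned):
    """Return True if `installed` equals `pinned` or extends it.

    Versions are compared component by component, so "0.21.0"
    matches the pin "0.21" but "0.210.0" does not.
    """
    installed_parts = installed.split(".")
    pinned_parts = pinned.split(".")
    return installed_parts[:len(pinned_parts)] == pinned_parts


# Example with literal version strings:
print(satisfies_pin("0.21.0", "0.21"))
print(satisfies_pin("0.24.2", "0.21"))

# In the haystack environment one could check the real version with:
#   import pandas
#   satisfies_pin(pandas.__version__, "0.21")
```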
Thanks Rick!
I will update the documentation accordingly to propose this fix.
I see you already updated it :)
How about we propose this as a single line?
conda install haystack_bio pandas==0.21
This seems to work as well
Great, I think this is an excellent compromise and we don't need to update the recipe.
Thanks for looking into it!