I'm failing to use this package

Open Hoeze opened this issue 4 years ago • 11 comments

Hi, I'm trying to install and use this package, but it keeps failing. What I did so far:

  1. set up Anaconda with PySpark 2.4.4
  2. pip install bdgenomics.adam
  3. pyadam
['/opt/anaconda/envs/adam/bin/..', '/opt/anaconda/envs/adam/lib/python3.7/site-packages/bdgenomics/adam']
['/opt/anaconda/envs/adam/bin/..', '/opt/anaconda/envs/adam/lib/python3.7/site-packages/bdgenomics/adam']
ls: cannot access /opt/anaconda/envs/adam/lib/python3.7/site-packages/bdgenomics/adam/adam-python/dist: No such file or directory
Failed to find ADAM egg in /opt/anaconda/envs/adam/lib/python3.7/site-packages/bdgenomics/adam/adam-python/dist.
You need to build ADAM before running this program.

When I try to use the Python API I get the following result:

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('abc').getOrCreate()
from bdgenomics.adam.adamContext import ADAMContext
ac = ADAMContext(spark)

Traceback (most recent call last):
  File "/opt/anaconda/envs/adam/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-5-ade8e609ddf7>", line 5, in <module>
    ac = ADAMContext(spark)
  File "/opt/anaconda/envs/adam/lib/python3.7/site-packages/bdgenomics/adam/adamContext.py", line 57, in __init__
    c = self._jvm.org.bdgenomics.adam.rdd.ADAMContext.ADAMContextFromSession(ss._jsparkSession)
TypeError: 'JavaPackage' object is not callable

Hoeze avatar Oct 10 '19 16:10 Hoeze
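
For anyone hitting the same TypeError: py4j resolves dotted JVM names lazily, and a name it cannot load as a class comes back as a `JavaPackage` placeholder, so this error usually means the ADAM classes are not on the Spark driver's classpath. Below is a minimal diagnostic sketch, not verified against 0.29.0. The helper name `is_unresolved_java_name` is made up for illustration, and the check compares type names only so that it runs without py4j installed.

```python
def is_unresolved_java_name(obj) -> bool:
    """Return True if py4j handed back a JavaPackage placeholder.

    A dotted name that py4j cannot load as a class comes back as a
    JavaPackage, and calling a method on it raises
    "TypeError: 'JavaPackage' object is not callable".
    """
    # Compare by type name so this helper does not need to import py4j.
    return type(obj).__name__ == "JavaPackage"


# Intended use in a live session (not run here; assumes a SparkSession `spark`):
#   jvm = spark.sparkContext._jvm
#   if is_unresolved_java_name(jvm.org.bdgenomics.adam.rdd.ADAMContext):
#       print("ADAM classes are not on the Spark classpath")
```

If the check reports an unresolved name, the usual remedy is to start Spark with the ADAM jar available, for example through the `spark.jars` setting or `--jars`. Where that jar lives in a pip or conda install is not established in this thread.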

Hello @Hoeze! Thank you for submitting this issue.

I assume this is with the most recent release, version 0.29.0 of bdgenomics.adam?

@akmorrow13 Could you also take a look?

heuermh avatar Oct 10 '19 16:10 heuermh

Wow, thanks for the quick reaction :+1: Yes, it's version 0.29

Hoeze avatar Oct 10 '19 16:10 Hoeze

I still need to look into this further, which may be difficult as campus is without power at the moment. Meanwhile, one of the following might work:

Try the most recent development version on PyPI, 0.30.0a0

$ pip install bdgenomics.adam==0.30.0a0

https://pypi.org/project/bdgenomics.adam/0.30.0a0/

Try installing ADAM via Bioconda

$ conda install adam

https://bioconda.github.io/recipes/adam/README.html

I believe that a link is not created for pyadam though, so you may need to go looking for where it was installed.

heuermh avatar Oct 10 '19 16:10 heuermh

I began installing with conda install adam, but this resulted in ModuleNotFoundError: bdgenomics.adam not found. That's why I removed it again and installed the pip version.

EDIT: Installing both conda install adam and pip install bdgenomics.adam==0.30.0a0 at the same time still results in a non-working pyadam.

Hoeze avatar Oct 10 '19 16:10 Hoeze

Sorry, I personally don't have that much experience packaging python, and to be honest I get quite confused with regard to conda and pip and virtualenv and such.

I created the conda recipe to install the ADAM command line tools adam-submit and adam-shell from the Maven release distribution tarball, and that works fine. There is still an open issue regarding symlinks due to how conda moves things around when installing (https://github.com/bigdatagenomics/adam/issues/1973), which requires a patch (https://github.com/bioconda/bioconda-recipes/blob/master/recipes/adam/adam-submit.patch).

I am also not sure pyadam works correctly (https://github.com/bigdatagenomics/adam/issues/1973) after this commit to support installation via pip (https://github.com/bigdatagenomics/adam/commit/eba275b2ddc33b072a75a2aa935fcf15865ee1cb).

I've been thinking it might be worth splitting the R and python libraries into separate repositories, so that they can each have their own language-specific build, release process, and packaging. As it is now, our release script doesn't work all the way through; I have to run the JVM and R parts of it piecemeal and ask @akmorrow13 to build and deploy the python library (https://github.com/bigdatagenomics/adam/blob/master/scripts/release/release.sh#L60).

It might also be worth creating a separate conda recipe that depends on the PyPI package rather than the Maven release distribution tarball, although I don't know what this should be called (python-adam, bdgenomics.adam?), see

https://bioconda.github.io/contributor/guidelines.html#python

heuermh avatar Oct 11 '19 16:10 heuermh

After a bit of experimenting, removing the egg-related lines from pyadam may work with the pip-installed version

$ diff pyadam pyadam2
25d24
< ADAM_EGG=$(${SOURCE_DIR}/find-adam-egg.sh)
36d34
<     --py-files ${ADAM_EGG} \

$ ./pyadam2
Using PYSPARK=/usr/local/bin/pyspark
Python 2.7.16 (default, Sep  2 2019, 11:59:44)
[GCC 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.46.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
2019-10-11 11:25:11 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.4
      /_/

Using Python version 2.7.16 (default, Sep  2 2019 11:59:44)
SparkSession available as 'spark'.
>>>

heuermh avatar Oct 11 '19 16:10 heuermh
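
The same edit can be applied programmatically. The sketch below drops every line that mentions `ADAM_EGG`, which covers both lines removed in the diff above. The function name `strip_egg_lines` and the sample script are hypothetical; only the two `ADAM_EGG` lines are taken from the diff, and the rest of the real pyadam script is not reproduced here.

```python
def strip_egg_lines(script_text: str) -> str:
    """Drop every line that mentions ADAM_EGG, mirroring the diff above."""
    kept = [
        line
        for line in script_text.splitlines(keepends=True)
        if "ADAM_EGG" not in line
    ]
    return "".join(kept)


# Hypothetical excerpt; only the two ADAM_EGG lines come from the diff.
sample = (
    "ADAM_EGG=$(${SOURCE_DIR}/find-adam-egg.sh)\n"
    "${PYSPARK} \\\n"
    "    --py-files ${ADAM_EGG} \\\n"
    '    "$@"\n'
)
print(strip_egg_lines(sample))
```

Filtering on the variable name is deliberately crude: it would also remove any other line that happens to reference `ADAM_EGG`, so it is worth diffing the result against the original script before using it.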

Hello @akmorrow13, might you be able to weigh in on this issue?

heuermh avatar Jan 06 '20 18:01 heuermh

@heuermh I am unsure how, or if, the pip installation deals with the jars; that was set up before my time. I am not sure whether pip installs the jars at all. I can update the conda recipe to work more like Mango's, as the conda recipe currently does not install python modules at all. However, I will have to think more about pip.

akmorrow13 avatar Jan 06 '20 18:01 akmorrow13

Thanks for the quick reply! Before updating the Conda recipe, I would like to try to resolve the problem(s) with the pyadam script. I think #2041 might be a similar issue. Is there any reason to keep this script around? Is there any reason to continue supporting pip?

heuermh avatar Jan 06 '20 18:01 heuermh

Hi all, the ADAM 1.0 release has the same issue. I have to manually comment out the two lines in the pyadam script that find and set the egg file. Could you please explain the consequences of commenting out these two lines? If everything works without them, why were they included in previous releases, and what was their purpose? Thanks!

alartin avatar Nov 07 '22 03:11 alartin

Hello @alartin, unfortunately I am not much more informed on python packaging than in 2019 😉

If you have a workaround that works for you, keep going with it! I'm typically using the adam python libraries from a jupyter or quarto notebook and thus don't use the pyadam shell script.

heuermh avatar Nov 09 '22 17:11 heuermh