🐛 Bug: eos3ae7 repeatedly fails to fetch

Open Cee-tech21 opened this issue 2 years ago • 8 comments

Describe the bug.

Fetching eos3ae7 repeatedly fails, with the following error message logged:

"Model API eos3ae7:predict did not produce an output"

Describe the steps to reproduce the behavior

Run the following command: ersilia -v fetch eos3ae7 | tee -a eos3ae7_fetch.log 2>&1

Expected behavior.

After running the "fetch" command, the model eos3ae7 is meant to be downloaded from the remote repository to the local computer.

Screenshots.

eos3ae7_fetch.log

Operating environment

Linux Mint 19

Additional context

No response

Cee-tech21 avatar Oct 15 '22 01:10 Cee-tech21

@Cee-tech21 try it again while the internet is connected. Sometimes this can be caused by a dropped internet connection.

Zainab-ik avatar Oct 15 '22 05:10 Zainab-ik

Hi @Cee-tech21 !

The model is working on my Linux machine. I think the | tee command is not saving the full error log as we need it, so I can't see what's going on. Please try again and save the log directly, without | tee -a. I have seen this in #355 and #344.
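A note on why the log may be incomplete: in `ersilia -v fetch eos3ae7 | tee -a eos3ae7_fetch.log 2>&1`, the `2>&1` applies to `tee`, not to `ersilia`, so anything ersilia writes to stderr never reaches the file. For anyone scripting the fetch instead, here is a minimal Python sketch that captures both streams into one log. The helper name `run_and_log` is hypothetical and not part of Ersilia; only the standard library is used.

```python
import subprocess


def run_and_log(cmd, log_path):
    """Run cmd, append its stdout AND stderr to log_path, return the text."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout, like `2>&1`
        universal_newlines=True,   # text mode; also works on Python 3.7
    )
    with open(log_path, "a") as handle:  # append, like `tee -a`
        handle.write(result.stdout)
    return result.stdout
```

Calling `run_and_log(["ersilia", "-v", "fetch", "eos3ae7"], "eos3ae7_fetch.log")` should then leave the full verbose output, errors included, in the log file.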

GemmaTuron avatar Oct 16 '22 16:10 GemmaTuron

Hi @Cee-tech21! Was the error corrected? I have similar issues with my models.

Jona-Bvunza avatar Oct 17 '22 07:10 Jona-Bvunza

Hello @Jona-Bvunza, please check the Slack channel for a more in-depth explanation. The "empty output error" you get might come from a very different issue, so please open your own issue and paste the log file.

GemmaTuron avatar Oct 17 '22 10:10 GemmaTuron

I am seeing that this model is also using the sqlalchemy package, maybe linked to what we are seeing in #338. @Cee-tech21 can you do the same test as @femme-js? (check the version in the conda environment of the model, and try to run the model in colab)
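One way to do the version check is sketched below. It is meant to be run with the Python interpreter of the model's conda environment (for example after `conda activate eos3ae7`). The helper name `installed_version` is hypothetical; the fallback branch is there because the model environments in this thread use Python 3.7, where `importlib.metadata` is not available.

```python
def installed_version(package):
    """Return the installed version of a distribution, or None if absent."""
    try:
        from importlib import metadata  # Python 3.8+
    except ImportError:
        metadata = None
    if metadata is not None:
        try:
            return metadata.version(package)
        except metadata.PackageNotFoundError:
            return None
    import pkg_resources  # fallback for Python 3.7; ships with setuptools
    try:
        return pkg_resources.get_distribution(package).version
    except pkg_resources.DistributionNotFound:
        return None


print("sqlalchemy:", installed_version("sqlalchemy"))
```

A result of `None` means the package is not visible to the interpreter that ran the snippet, which is itself useful information here.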

@miquelduranfrigola do you think the problem might be in the sqlalchemy versions?

GemmaTuron avatar Oct 17 '22 10:10 GemmaTuron

The model has now been run on Google Colab, and the same error noted in this issue occurs there too. chizi_e_cee-tech

Cee-tech21 avatar Oct 18 '22 02:10 Cee-tech21

Since this model fails to fetch both on my local computer and on Colab, I intend to close this issue, on the presumption that there is a problem preventing the model from being fetched.

Cee-tech21 avatar Oct 18 '22 04:10 Cee-tech21

@Cee-tech21 I can reproduce the same error; I need some time to check what the issue can be. Can you please leave the issue open but change the title to "eos3ae7 fails at fetching time"? I will add some tags to help us locate it. Mark the issue on excel and move on!

GemmaTuron avatar Oct 18 '22 08:10 GemmaTuron

@GemmaTuron what is the current status of this?

miquelduranfrigola avatar Nov 21 '22 19:11 miquelduranfrigola

Hi all! Fetching this model here. Fetch still fails. Will update this post once fetch is successful.

Cee-tech21 avatar Nov 21 '22 19:11 Cee-tech21

Hello @miquelduranfrigola

The issues tagged with "help wanted" and "model-bug" are models that consistently encountered problems at fetch time. We will work with the Outreachy interns during the internship period on making sure they run consistently.

@Cee-tech21 let us know if you are trying again, thanks

GemmaTuron avatar Nov 21 '22 19:11 GemmaTuron

Hi @GemmaTuron, I have just tried to fetch model "eos3ae7" again. Fetching of eos3ae7 still fails.

Cee-tech21 avatar Nov 21 '22 19:11 Cee-tech21

Thanks @Cee-tech21 - we are compiling a list of problematic models and we will address them in one batch before Christmas. Will keep you posted.

miquelduranfrigola avatar Dec 03 '22 16:12 miquelduranfrigola

Hi @GemmaTuron @miquelduranfrigola

This model fails to fetch using the CLI and colab. It returns an EmptyOutputError

System

Windows 10

Conda version

conda 22.9.0

Pip version

pip 22.3.1

Python version

Python 3.7.13

SQLAlchemy version

Version: 1.3.24

Steps to reproduce the behavior

ersilia -v fetch eos3ae7 > eos3ae7.log 2>&1

error on CLI

error log - eos3ae7.log

error on colab

Attempts to resolve the error

Based on similar errors:

  • [x] Reinstalled git LFS
  • [x] Reinstalled git CLI
  • [x] Updated pip
  • [x] Currently updating conda

paulinebanye avatar Dec 07 '22 10:12 paulinebanye

Just a quick update regarding the status of this model. I continued working on #343 started on #369 as they both return the same EmptyOutputError but I came across issues with the dependencies.

  • [x] Cleared the tmp file.
  • [x] Updated Ubuntu
  • [x] Reinstalled & updated conda
  • [x] Cloned the isaura repo again
  • [x] Cloned the Ersilia repo again
  • [x] Reinstalled dependencies but encountered errors with the versions of sqlalchemy and bentoml. However, updating or downgrading the versions leads to dependency conflicts with other installs, e.g. isaura

Sqlalchemy sqlalchemy error

Bentoml bentoml error

paulinebanye avatar Dec 08 '22 20:12 paulinebanye

Hi @pauline-banye If you can paste the full error logs here it would be helpful. I assume other models work fine on your system? Are you on a WSL or a Ubuntu machine? Thanks!

GemmaTuron avatar Dec 09 '22 09:12 GemmaTuron

Hi @GemmaTuron, I am on a WSL machine. I have tested 3 of the models with issues: eos3ae7, eos4tcc and eos1579. eos3ae7.log eos4tcc.log

I'm in the process of testing the models on colab as well. So far I have tested eos4tcc on colab and it returns an EmptyOutputError as well.

I will update you once I have tested the other models on Colab.

paulinebanye avatar Dec 09 '22 10:12 paulinebanye

Update @GemmaTuron @miquelduranfrigola. I was able to resolve the issue with my system not fetching any model.

Steps I took were:

  • [x] Uninstalled and reinstalled ubuntu 20.04
  • [x] Installed pip 22.3.1
  • [x] Downgraded to python 3.7.15
  • [x] Installed anaconda 4.14.
  • [x] Cloned the Ersilia repo again
  • [x] Did not clone Isaura

I fetched the model multiple times and encountered dependency errors on different occasions: "no module named pandas", "no module named keras", "no module named tensorflow". These were resolved by running:

  • "pip install pandas"
  • "pip install keras"
  • "pip install tensorflow"

The current error returned is ModuleNotFoundError: No module named 'keras.layers.recurrent' which I tried to resolve with pip install keras.layers.recurrent.
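As far as I can tell, `keras.layers.recurrent` is an import path inside the keras package rather than the name of a pip distribution, so whether it exists depends on the installed keras version. A small sketch for checking which parts of a dotted path the current interpreter can find is given below. The helper name `module_available` is hypothetical; note that looking up a submodule imports its parent package as a side effect.

```python
import importlib.util


def module_available(dotted_name):
    """True if dotted_name can be located by the current interpreter."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. `keras`) is itself missing.
        return False


for name in ("keras", "keras.layers", "keras.layers.recurrent"):
    print(name, "->", module_available(name))
```

If `keras` and `keras.layers` are found but `keras.layers.recurrent` is not, the installed keras version probably differs from the one the model code was written against.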

keras

eos3ae7.log

paulinebanye avatar Dec 11 '22 13:12 paulinebanye

Many thanks, @pauline-banye. This is extremely helpful and I really appreciate the great reporting. This looks like an issue related to Isaura, which now uses poetry to manage dependencies. I am testing it today and will keep you updated.

miquelduranfrigola avatar Dec 11 '22 16:12 miquelduranfrigola

Thank you @miquelduranfrigola 😊. I will be updating the reports on the other two models I tested as well.

paulinebanye avatar Dec 11 '22 22:12 paulinebanye

Hi,

Hoping to bring some extra information on this issue. I have installed WSL on my Windows machine to make sure I can reproduce @pauline-banye's settings. I have taken special care to ensure that the Python path is set to the Anaconda Python, so conda environments should be directed to the right place. Just to be clear, there is no Python installed outside Conda in the WSL system -- this could be a source of problems, though it shouldn't be.

When I run $ echo -e ${PATH//:/\\n} the first lines are:

/home/gturon/anaconda3/condabin
/home/gturon/.vscode-server/bin/5235c6bb189b60b01b1f49062f4ffa42384f8c91/bin/remote-cli
/usr/local/sbin
/usr/local/bin

When fetching the model eos3ae7, I get the following error:

Detailed error:
Model API eos3ae7:predict did not produce an outputTraceback (most recent call last):
  File "/home/gturon/eos/repository/eos3ae7/20221212224955_5D39E0/eos3ae7/artifacts/framework/code/main.py", line 7, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'

So, pandas is not found, but when I do:

$ conda activate eos3ae7
$ conda list

I find pandas installed (version 1.3.5). The package is imported without problems, so it IS in the environment. For eos4tcc, it is basically the same, but the module not found is joblib (which, again, IS in the conda environment, version 1.1.0). This is suspiciously similar to the issue we were encountering in Google Colab when the PYTHONPATH was not properly set, as @carcablop identified.
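A package that is listed by conda but missing at run time usually means a different interpreter is executing the code. A rough diagnostic sketch is given below: run from the same place the model code runs, it reports which interpreter is active and where each module resolves. The helper name `interpreter_report` is hypothetical, and this is only a debugging aid, not something Ersilia provides.

```python
import importlib.util
import shutil
import sys


def interpreter_report(modules):
    """Collect which interpreter is running and where each module resolves."""
    report = {
        "executable": sys.executable,              # interpreter running this code
        "python_on_path": shutil.which("python"),  # first `python` found on PATH
        "modules": {},
    }
    for name in modules:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None
        # origin is the file the module would be loaded from, if any
        report["modules"][name] = spec.origin if spec else None
    return report


print(interpreter_report(["pandas", "joblib"]))
```

If `executable` does not point inside the eos3ae7 environment, the pandas seen by `conda list` is simply not the one this interpreter searches.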

GemmaTuron avatar Dec 12 '22 22:12 GemmaTuron

Hello everyone! Great job!!! Quick update here!!

I tried again to fetch model eos3ae7 using google colab but I'm getting the error message below after the fetch code executes for around 10 minutes:

Detailed error:
Model API eos3ae7:predict did not produce an outputTraceback (most recent call last):
  File "/root/eos/repository/eos3ae7/20221215160804_62CE4B/eos3ae7/artifacts/framework/code/main.py", line 7, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'

The pandas-related error message should not appear, since pandas was successfully imported and called before issuing the fetch command. Have a look at the Colab link:

https://colab.research.google.com/drive/1I4pmrDjXS_XXwRRWyTSI-Kf5m76SXPR9?usp=sharing

Cee-tech21 avatar Dec 15 '22 16:12 Cee-tech21

I've been checking if the latest updates on the pythonpaths https://github.com/ersilia-os/ersilia/commit/70bcf5469d912b86b469a3db9e2978f34ff7a1fe would solve this issue but it seems we still lack some packages, in this latest test (in colab): "yaml"

GemmaTuron avatar Jan 16 '23 16:01 GemmaTuron

And the latest updates we did to the pythonpaths seem to be breaking the code somewhere else on the CLI (see log file attached) eos3ae7.txt

GemmaTuron avatar Jan 16 '23 16:01 GemmaTuron

I ran the model in WSL2 (using Ubuntu 20.04.5) and I get the same package-not-found error, but in this case it is "yaml". I have confirmed that 'yaml' is not installed in the eos3ae7 env, but pandas is. I tried to install it manually, but the model still didn't work.

Model API eos3ae7:predict did not produce an outputTraceback (most recent call last):
  File "/home/samuelmayna/eos/repository/eos3ae7/20230328090007_475C16/eos3ae7/artifacts/framework/code/main.py", line 10, in <module>
    from chemvae.vae_utils import VAEUtils
  File "/home/samuelmayna/eos/repository/eos3ae7/20230328090007_475C16/eos3ae7/artifacts/framework/code/chemvae/vae_utils.py", line 4, in <module>
    import yaml
ModuleNotFoundError: No module named 'yaml'

More Error logs can be found at eos3ae7_fetch.log

samuelmaina avatar Mar 28 '23 13:03 samuelmaina

Hi @samuelmaina

If you clone the repository to your local system, and modify their installation requirements to add the yaml package, does it work? You then need to call the model using the --repo_path <path_to_cloned_repo> flag at the end of the fetch command
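For anyone scripting this test, the command can be assembled as in the sketch below. The helper name `build_fetch_command` and the example path are hypothetical; the only flags used are `-v` and `--repo_path`, both taken from this thread.

```python
def build_fetch_command(model_id, repo_path=None, verbose=True):
    """Build the argument list for an `ersilia fetch` call."""
    cmd = ["ersilia"]
    if verbose:
        cmd.append("-v")
    cmd += ["fetch", model_id]
    if repo_path is not None:
        # Fetch from a local clone instead of the remote repository.
        cmd += ["--repo_path", repo_path]
    return cmd


print(" ".join(build_fetch_command("eos3ae7", "/path/to/cloned/eos3ae7")))
```

The returned list can be passed directly to `subprocess.run`.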

GemmaTuron avatar Mar 28 '23 15:03 GemmaTuron

@GemmaTuron Pandas is not detected in the remote repo. I added pandas and pyyaml (also tried with PyYAML) to the Dockerfile so that they are installed. dockerfile_change I still got the pandas-not-installed error. pandas_error

Pandas was not in the eos3ae7 env but ruamel-yaml was. I looked at the script.sh generated to run the installation commands, taken from the line Running bash /tmp/ersilia-1k_bwc4b/script.sh > /tmp/ersilia-_wtlkhjr/command_outputs.log 2>&1. After running all the installation commands, the script downloads code from https://github.com/ersilia-os/bentoml-ersilia. I looked at its setup.py and found that the YAML package listed under "required include" is "ruamel.yaml", which is incompatible with import yaml. It is used as

    from ruamel.yaml import YAML

    yaml = YAML(typ='safe')   # default, if not specified, is 'rt' (round-trip)
    yaml.load(doc)            # doc: the YAML text (string or stream) to parse

as seen from here. The required YAML package is pyyaml. My guess is that some automated workflow is uninstalling pandas.

samuelmaina avatar Mar 29 '23 09:03 samuelmaina

I have tested the model with a single conda-forge entry (I had two in the Dockerfile in the previous comment) and the results are the same.

samuelmaina avatar Mar 29 '23 10:03 samuelmaina

Hi @samuelmaina !

Thanks, that is a very good catch! I'll need to see why we are using ruamel.yaml in bentoml. Maybe it will be easier to change pyyaml to ruamel.yaml in the model itself, since the bentoml package is used by all Ersilia models? What do you think? I need some time to think about it, but your work has been great in pointing us in the right direction, many thanks.
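To make the option concrete, here is a sketch of what a tolerant loader inside the model code might look like: it uses PyYAML when `import yaml` works and falls back to ruamel.yaml otherwise. The function name `load_yaml` is hypothetical and this is not taken from the eos3ae7 code. The two libraries also do not parse every document identically (ruamel.yaml targets YAML 1.2, PyYAML targets YAML 1.1), so it would need testing against the model's actual config files.

```python
def load_yaml(text):
    """Parse YAML text with PyYAML if present, else with ruamel.yaml."""
    try:
        import yaml  # provided by the PyYAML distribution
    except ImportError:
        from ruamel.yaml import YAML  # provided by the ruamel.yaml distribution
        return YAML(typ="safe").load(text)
    return yaml.safe_load(text)
```

With either library installed, `load_yaml("a: 1")` returns a plain dictionary; with neither, the `ImportError` from the fallback import is raised.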

GemmaTuron avatar Mar 29 '23 12:03 GemmaTuron

I am really grateful. I think it's a good idea to install pyyaml for the local model; no need to break the others. Migrating to pyyaml would be hectic, but you can consult.

samuelmaina avatar Mar 29 '23 16:03 samuelmaina