
🐛 Bug: Unable to predict eos9ei3 due to OSError: Unable to open file (file signature not found)

Open ting96haha opened this issue 2 years ago • 6 comments

Describe the bug.

The model "eos9ei3" fails to make predictions because of OSError: Unable to open file (file signature not found). I searched the error online, and this message probably means that the file is either corrupted or not in HDF5 format.

Describe the steps to reproduce the behavior

The error occurs when I run the following command: ersilia -v api predict -i "eml_canonical.csv" -o "eos9ei3.csv" > error.log 2>&1

Expected behavior.

The model runs and returns predictions successfully.

Screenshots.

(screenshot: error_h51)

Operating environment

Arch Linux

Additional context

Please see the attached log file for further information: error.log

ting96haha, Oct 20 '22 03:10

@ting96haha I suggest you check that the input file is in the directory you are working in, and that it is the correct input file.

Zainab-ik, Oct 20 '22 08:10

Yes, I confirmed it was in the same directory. I tested the same input with other models and they worked perfectly.

ting96haha, Oct 20 '22 08:10

Hi @ting96haha !

Thanks! Can I ask you to do a check? There is a command that, in principle, generates an example input file for the model. It is still a work in progress, but it's worth a try in this case. It will give you an example file that you can use to check the input format, and you can also pass it to the predict command to see if it runs ;)

ersilia example eos9ei3 -n 10 -f myexample.csv will give you 10 example molecules in the "right" format.

GemmaTuron, Oct 20 '22 10:10

Hi @GemmaTuron, thank you for the advice. I ran ersilia example eos9ei3 -n 10 -f myexample.csv, and it generated an example file with 10 input molecules:

myexample.csv

After that, I used the example file as input with the command ersilia -v api predict -i "myexample.csv" -o "eos9ei3.csv" > example_error.log 2>&1, which produced the error log attached below (example_error.log):

example_error.log

If I test a single molecule (the first entry in myexample.csv) with the command ersilia -v api predict -i "NCCNS(=O)(=O)c1ccc(Cl)c2ccncc12" -o "eos9ei3.csv" > example_error2.log 2>&1, I get the error log attached below (example_error2.log):

example_error2.log

The error message is the same as before: OSError: Unable to open file (file signature not found). Is there any other way to troubleshoot the model? Thanks for this useful information; I really appreciate it.
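One way to test the "corrupted or not HDF5" hypothesis directly: every HDF5 file carries a fixed 8-byte signature, either at byte 0 or at offset 512, 1024, 2048, and so on, and "file signature not found" is what the HDF5 library reports when it cannot find one. The sketch below checks for that signature. It assumes you can locate the .h5 file the model is trying to open; looks_like_hdf5 is a hypothetical helper, not part of Ersilia.

```python
from pathlib import Path

# Fixed 8-byte signature at the start of every HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path: str) -> bool:
    """Return True if the HDF5 signature appears at offset 0, 512, 1024, 2048, ...

    HDF5 allows an optional user block before the superblock, which is why the
    signature is searched for at these offsets and not only at byte 0.
    """
    size = Path(path).stat().st_size
    with open(path, "rb") as fh:
        offset = 0
        while offset + len(HDF5_SIGNATURE) <= size:
            fh.seek(offset)
            if fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
            # Next candidate offset: 0 -> 512 -> 1024 -> 2048 -> ...
            offset = 512 if offset == 0 else offset * 2
    return False
```

Usage would be something like looks_like_hdf5("path/to/model_file.h5"), where the path is a placeholder. A False result points to a file that is truncated, corrupted, or not HDF5 at all.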

ting96haha, Oct 20 '22 11:10

Thanks @ting96haha, very detailed! I think this model only accepts .h5 files, a format used to store large data that is difficult for a human to read, so we need to improve the input adapter for this. I'm going to mark it for further checking; you can mark it in purple on the Excel sheet!

GemmaTuron, Oct 20 '22 12:10

Thank you for the clarification! I will mark it in purple on the Excel sheet.

ting96haha, Oct 20 '22 15:10

Update:

I am able to fetch and run predictions for eos9ei3 when I use a single molecule as input, ersilia -v api predict -i "CCCC", and I get the following output:

{
  "input": {
    "key": "IJDNQMDRQITEOD-UHFFFAOYSA-N",
    "input": "CCCC",
    "text": "CCCC"
  },
  "output": {
    "outcome": [ 1.605723 ]
  }
}
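The single-molecule result is a nested JSON record. If you want to collect such records into flat rows, for example to compare them with the .csv output, a minimal sketch is below. flatten_prediction is a hypothetical helper, and the field names are taken only from the output shown above.

```python
import json
from typing import Dict


def flatten_prediction(record: Dict) -> Dict:
    """Flatten one prediction record into a single row: key, input SMILES, outcome."""
    return {
        "key": record["input"]["key"],
        "input": record["input"]["input"],
        "outcome": record["output"]["outcome"],
    }


# The record printed by `ersilia -v api predict -i "CCCC"` above.
raw = (
    '{"input": {"key": "IJDNQMDRQITEOD-UHFFFAOYSA-N", "input": "CCCC", "text": "CCCC"},'
    ' "output": {"outcome": [1.605723]}}'
)
row = flatten_prediction(json.loads(raw))
print(row)  # {'key': 'IJDNQMDRQITEOD-UHFFFAOYSA-N', 'input': 'CCCC', 'outcome': [1.605723]}
```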

Passing a .csv file with a list of SMILES works as well, but when I try to pass a list of molecules directly, the model cannot process them: ersilia -v api predict -i ["CCCC","CCCOC"]
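One possible factor, not confirmed in this thread, is that the shell rewrites the argument before Ersilia sees it: Bash removes the double quotes, so the CLI receives [CCCC,CCCOC] rather than a JSON-style list. Python's shlex module follows POSIX shell quoting rules and can illustrate the difference. It does not reproduce Bash's glob expansion of the brackets.

```python
import shlex

# The command from the comment above, and the same command with the list single-quoted.
unquoted = 'ersilia -v api predict -i ["CCCC","CCCOC"]'
quoted = """ersilia -v api predict -i '["CCCC","CCCOC"]'"""

# The last argument is what the CLI would receive for -i.
print(shlex.split(unquoted)[-1])  # [CCCC,CCCOC]
print(shlex.split(quoted)[-1])    # ["CCCC","CCCOC"]
```

Whether the Ersilia CLI accepts the single-quoted form is not established here. The sketch only shows which string reaches the program.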

GemmaTuron, Dec 13 '22 11:12

Operating system: WSL on Windows 10
Conda version: conda 4.12.0
Python version: Python 3.7.13

Model tested by passing the eml_canonical.csv file.

Model fetched successfully. Links: Colab, eos9ei3.log

Model predicted successfully. Links: Colab eos9ei3_output.csv, CLI eos9ei3.csv

paulinebanye, Dec 15 '22 02:12

Hi @pauline-banye !

I am reopening this issue because I am unsure whether you passed a .csv file (which also worked for me) or a list of SMILES (which did not). Can you clarify? Something like: ersilia -v api predict -i ["CCCC","CCCOC"]

GemmaTuron, Dec 15 '22 07:12

Hi @GemmaTuron, good morning from Nigeria! I passed the eml_canonical.csv file.

This is the command I ran to make predictions with the model: ersilia -v api predict -i "eml_canonical.csv" -o "eos9ei3.csv" > eos9ei3-CLI.log 2>&1

I was able to fetch, serve and predict successfully in both Colab and the CLI. Would you like me to test it with a list of smiles?

paulinebanye, Dec 15 '22 07:12

yes please ;)

GemmaTuron, Dec 15 '22 07:12

Sure, I'm on it! @GemmaTuron

paulinebanye, Dec 15 '22 08:12

Hi @GemmaTuron, on Colab I was able to run predictions successfully by passing a list of SMILES as well. I tested it with ['CCCC', 'CCCOC'] and with the first 3 SMILES from eml_canonical.csv.

Steps to pass a list of SMILES on Colab:

A list of SMILES could not be passed directly, so I had to assign it to a variable.

  • [x] Assigned the list to a variable named smile2: smile2 = ['CCCC', 'CCCOC']
  • [x] Substituted the variable into the predict code block: output = model.api(input=smile2, output="pandas")

eos9ei3_smile-list.csv eos9ei3_list.csv
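The Colab steps above pass a Python list to model.api(). To build that list from the first few rows of a file such as eml_canonical.csv, a small sketch using only the standard library csv module is below. It assumes the SMILES sit in a single column (index 0 by default) and that the file may have a header row; neither detail about eml_canonical.csv is confirmed in this thread, and read_smiles is a hypothetical helper.

```python
import csv
from typing import List


def read_smiles(path: str, n: int, column: int = 0, has_header: bool = True) -> List[str]:
    """Return up to n SMILES from one column of a CSV file, skipping blank cells.

    Hypothetical helper: the column index and the presence of a header row are
    assumptions about the input file, so both are parameters.
    """
    smiles: List[str] = []
    with open(path, newline="") as fh:
        reader = csv.reader(fh)
        if has_header:
            next(reader, None)  # skip the header row
        for row in reader:
            if len(smiles) >= n:
                break
            if len(row) > column and row[column].strip():
                smiles.append(row[column].strip())
    return smiles
```

The resulting list could then be used as in the steps above, e.g. output = model.api(input=read_smiles("eml_canonical.csv", 3), output="pandas").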

Steps to pass a list of SMILES on the CLI:

  • [x] I declared a Bash array: arr=("CCCC" "CCCOC")
  • [x] I checked that the items in the array were accessible using echo "${arr[@]}"
  • [ ] I attempted to pass the array to the predict command, both directly and with a for loop:
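#Passing the variable directly
#(unquoted, ${arr[@]} expands to two separate words: -i receives CCCC and CCCOC is left over as an extra argument)
ersilia -v api predict -i ${arr[@]} -o "eos9ei3-testing.csv"

#for loop to iterate through the elements in the list
#(as written, "for i in arr" iterates once over the literal word "arr", and the body uses ${arr[@]} instead of $i, so it hits the same error)
for i in arr; do ersilia -v api predict -i ${arr[@]} -o "eos9ei3-testing.csv"; done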

However, both attempts returned the error Error: Got unexpected extra argument (CCCOC) (screenshot: list)

  • [ ] I attempted other commands, such as ersilia -v api predict -i ${arr=("CCCC" "CCCOC")} -o "eos9ei3-testing.csv", but they only predicted the first SMILES, "CCCC" (in Bash, ${arr} without an index expands only to the first element of the array).
    eos9ei3-testing.csv
  • [x] Eventually, I created a CSV file with the SMILES:
#create the csv file (press Enter after the command, then type one SMILES per line)
cat > arr.csv
CCCC
CCCOC
#press Ctrl+D (or Ctrl+C after the last line) to finish and save the file
  • [x] Then I passed the CSV file to the predict command, and it ran successfully: ersilia -v api predict -i "arr.csv" -o "eos9ei3-test.csv" (attachments: list2, eos9ei3-test.csv)
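The CSV workaround can also be scripted rather than typed into cat. A minimal sketch, assuming the header-less, one-SMILES-per-line format that worked above; write_smiles_csv and predict_command are hypothetical helpers, and the command uses only the flags that appear in this thread.

```python
import csv
from typing import List, Sequence


def write_smiles_csv(smiles: Sequence[str], path: str) -> None:
    """Write one SMILES per line with no header, mirroring the arr.csv file above."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, lineterminator="\n")
        for smi in smiles:
            writer.writerow([smi])


def predict_command(input_csv: str, output_csv: str) -> List[str]:
    """Build the argument list for the predict call used in this thread."""
    return ["ersilia", "-v", "api", "predict", "-i", input_csv, "-o", output_csv]
```

Running it would then be subprocess.run(predict_command("arr.csv", "eos9ei3-test.csv"), check=True) after write_smiles_csv(["CCCC", "CCCOC"], "arr.csv"). Passing the arguments as a list sidesteps shell quoting and word splitting entirely.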

Conclusion

The only way I was able to pass a list of SMILES in the CLI was by creating a CSV file. While going through the code, I noticed that the input type specified in the Ersilia API is TEXT. I suspect this might be why only one output is returned when the SMILES are passed as a variable or directly in brackets, but I'm uncertain, because I was able to pass a list successfully in Colab.

paulinebanye, Dec 15 '22 08:12

Oh, this is great work, thanks @pauline-banye! We definitely need to improve the input adapters for the CLI. The Google Colab command model.api() runs on the Python package version of Ersilia, which might differ a bit from what is used in the CLI. Can you share the link to the file you were looking at for the Ersilia API?

Thanks
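As a rough illustration of what an improved CLI input adapter might accept, here is a hypothetical normalizer that maps a single -i value to a list of SMILES: an existing .csv path, a JSON-style list, or a single SMILES string. This is not Ersilia's actual adapter, and CSV header handling is omitted for brevity.

```python
import csv
import json
import os
from typing import List


def normalize_input(value: str) -> List[str]:
    """Hypothetical adapter: turn one -i argument into a list of SMILES.

    - path to an existing .csv file -> first column of every non-empty row
    - JSON list such as '["CCCC","CCCOC"]' -> the list items
    - anything else -> a single SMILES string
    """
    if value.lower().endswith(".csv") and os.path.isfile(value):
        with open(value, newline="") as fh:
            return [row[0].strip() for row in csv.reader(fh) if row and row[0].strip()]
    if value.startswith("["):
        try:
            parsed = json.loads(value)
        except json.JSONDecodeError:
            parsed = None  # not JSON, e.g. a bracketed SMILES like [Na+].[Cl-]
        if isinstance(parsed, list) and all(isinstance(x, str) for x in parsed):
            return parsed
    return [value]
```

Values that start with [ but are not valid JSON, such as [Na+].[Cl-] or the quote-stripped [CCCC,CCCOC], fall through to the single-string case, which is one reason a quoted JSON list or a CSV file is the less ambiguous way to pass several molecules.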

GemmaTuron, Dec 15 '22 14:12

@GemmaTuron I'm so sorry, but I have been searching for the file and can't find it anywhere 🤦‍♀️. I can't remember whether it was from a file or from an error in Colab or the CLI. I will try to replicate it and will provide an update if I can.

paulinebanye, Dec 15 '22 20:12

Hello @miquelduranfrigola

As you mentioned in our last meeting, following the instructions at https://ersilia.gitbook.io/ersilia-book/ersilia-model-hub/inputs we cannot pass a list of SMILES on the CLI, so if the only way to predict more than one molecule is by inputting a .csv file, we should make this clear in the documentation.

GemmaTuron, Dec 19 '22 09:12

This model is working; see its GitHub Actions run and the model testing at https://github.com/ersilia-os/eos9ei3/issues/6

GemmaTuron, Jun 21 '23 07:06