ersilia
🐛 Bug: Unable to predict eos9ei3 due to OSError: Unable to open file (file signature not found)
Describe the bug.
Predictions with model "eos9ei3" fail with OSError: Unable to open file (file signature not found). I searched for the error online, and this message suggests that the file is either corrupted or not in HDF5 format.
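Because the message points to a file that is not valid HDF5, one quick check is whether the file in question begins with the 8-byte HDF5 signature. This is a minimal sketch and not part of Ersilia. The helper name is_hdf5 is hypothetical, and the thread does not say which file fails to open, so the path in the usage line is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a file starts with the 8-byte HDF5 signature
# (0x89 'H' 'D' 'F' '\r' '\n' 0x1a '\n'). Only offset 0 is checked; HDF5 also
# allows the signature at 512, 1024, 2048, ... (after a user block), which this
# simplified check does not handle.
is_hdf5() {
  local file="$1" sig
  [ -f "$file" ] || return 1
  # Dump the first 8 bytes as hex and strip od's spaces and newlines
  sig=$(head -c 8 "$file" | od -An -tx1 | tr -d ' \n')
  [ "$sig" = "894844460d0a1a0a" ]
}
```

Usage: `is_hdf5 path/to/file.h5 && echo "HDF5 signature found" || echo "no HDF5 signature at offset 0"`. A failing check is consistent with a truncated, corrupted, or non-HDF5 file.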
Describe the steps to reproduce the behavior
The error occurs when running the following command:
ersilia -v api predict -i "eml_canonical.csv" -o "eos9ei3.csv" > error.log 2>&1
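When attaching a long log such as error.log, it can help to also quote just the final traceback in the issue. A minimal sketch, assuming the log contains standard Python tracebacks; the helper name last_traceback is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last Python traceback in a log file, i.e.
# everything from the final "Traceback (most recent call last):" line to EOF.
last_traceback() {
  awk '
    /Traceback \(most recent call last\):/ { buf = ""; found = 1 }  # restart at each traceback
    found { buf = buf $0 "\n" }                                     # collect lines after it
    END { printf "%s", buf }
  ' "$1"
}
```

Usage: `last_traceback error.log` prints only the last traceback, which should end with the OSError line.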
Expected behavior.
The model returns predictions for the input file.
Screenshots.
Operating environment
Arch Linux
Additional context
Please see the attached log file (error.log) for further information.
@ting96haha I suggest you check if you have the input file in the same directory you are working and if the input file is the correct one.
Yes, I confirmed it was in the same directory. I tested with the other models and they worked perfectly.
Hi @ting96haha !
Thanks, can I ask you to do a check? There is a command that will, in principle, generate an example input file for the model. It is still a work in progress, but it's worth a try in this case. It will give you an example file that you can use to check the input format, and also pass to the predict command to see if it runs ;)
ersilia example eos9ei3 -n 10 -f myexample.csv
This will give you 10 example molecules in the "right" format.
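Besides running the example file through the model, a quick sanity check is to compare the first line of your own input with the example's. This sketch assumes both files start with a header row; the thread does not show what header ersilia example writes, and same_header is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the first line (header) of two CSV files.
# Carriage returns are stripped so Windows line endings still compare equal.
same_header() {
  local a b
  a=$(head -n 1 "$1" | tr -d '\r')
  b=$(head -n 1 "$2" | tr -d '\r')
  if [ "$a" = "$b" ]; then
    echo "headers match: $a"
  else
    echo "headers differ: '$a' vs '$b'" >&2
    return 1
  fi
}
```

Usage: `same_header eml_canonical.csv myexample.csv`.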
Hi @GemmaTuron, thank you for the advice. I generated the example file with
ersilia example eos9ei3 -n 10 -f myexample.csv
and it gave me the example file below, with 10 molecules as example input.
After that, I used the example file as input:
ersilia -v api predict -i "myexample.csv" -o "eos9ei3.csv" > example_error.log 2>&1
This produces the error log attached below (example_error.log):
If I instead run a single molecule (the first entry in myexample.csv):
ersilia -v api predict -i "NCCNS(=O)(=O)c1ccc(Cl)c2ccncc12" -o "eos9ei3.csv" > example_error2.log 2>&1
the corresponding error log is attached below (example_error2.log):
The error message is the same as before: OSError: Unable to open file (file signature not found). Is there any other way to troubleshoot the model? Thanks for this useful information, I really appreciate it.
Thanks @ting96haha, very detailed! I think this model only accepts .h5 files, a format used to store large datasets that is difficult for a human to read, so we need to improve the input adapter for it. I'm going to mark it for further checking; you can mark it in purple in Excel!
Thank you for the clarification! I will mark it in purple in Excel.
Update:
I am able to fetch and run predictions for eos9ei3 when I use a single molecule as input:
ersilia -v api predict -i "CCCC"
which gives the following output:
{
  "input": {
    "key": "IJDNQMDRQITEOD-UHFFFAOYSA-N",
    "input": "CCCC",
    "text": "CCCC"
  },
  "output": {
    "outcome": [
      1.605723
    ]
  }
}
Passing a .csv file with a list of SMILES works as well, but when I try to pass a list of molecules directly, the model cannot process them:
ersilia -v api predict -i ["CCCC","CCCOC"]
Operating system - WSL on Windows 10
Conda version - conda 4.12.0
Python version - Python 3.7.13
Model tested by passing the eml_canonical.csv file
Model fetched successfully (links: Colab, eos9ei3.log)
Model predicted successfully (links: Colab eos9ei3_output.csv, CLI eos9ei3.csv)
Hi @pauline-banye !
I am reopening this issue because I am unsure whether you passed a .csv file (which also worked for me) or a list of SMILES (which did not). Can you clarify? Something like:
ersilia -v api predict -i ["CCCC","CCCOC"]
Hi @GemmaTuron, good morning from Nigeria! I passed the eml_canonical.csv file.
This is the command I ran to get predictions from the model:
ersilia -v api predict -i "eml_canonical.csv" -o "eos9ei3.csv" > eos9ei3-CLI.log 2>&1
I was able to fetch, serve and predict successfully in both Colab and the CLI. Would you like me to test it with a list of smiles?
yes please ;)
Sure, I'm on it! @GemmaTuron
Hi @GemmaTuron, I was also able to predict successfully by passing a list of SMILES. I tested it with ['CCCC', 'CCCOC'] and with the first 3 SMILES from eml_canonical.csv.
Steps to pass a list of SMILES on Colab:
A list of SMILES could not be passed directly; I had to assign it to a variable.
- [x] Assigned smile2 as the variable name
smile2 = ['CCCC', 'CCCOC']
- [x] Substituted in the predict code block
output = model.api(input=smile2, output="pandas")
eos9ei3_smile-list.csv eos9ei3_list.csv
Steps to pass a list of SMILES on the CLI:
- [x] I declared a list and passed it to the predict function
arr=("CCCC" "CCCOC")
- [x] I was able to check that the items declared in the array were accessible using
echo "${arr[@]}"
- [ ] I attempted to pass the list into the predict function directly and by using a for loop
#Passing the variable directly
ersilia -v api predict -i ${arr[@]} -o "eos9ei3-testing.csv"
#for loop to iterate through the elements in the list
for i in arr; do ersilia -v api predict -i ${arr[@]} -o "eos9ei3-testing.csv"; done
However, it returned an error: Error: Got unexpected extra argument (CCCOC)
- [ ] I attempted other commands, such as the one below, but they only predicted the first SMILES, "CCCC":
ersilia -v api predict -i ${arr=("CCCC" "CCCOC")} -o "eos9ei3-testing.csv"
eos9ei3-testing.csv
- [x] Eventually I created a CSV file with the SMILES.
#create the csv file
cat > arr.csv #press Enter, then type one SMILES per line
CCCC
CCCOC
#press Ctrl-D (end of input) to finish and save the file
- [x] Then passed the csv to the predict function and it ran successfully.
ersilia -v api predict -i arr.csv -o "eos9ei3-test.csv"
eos9ei3-test.csv
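In the attempts above, the unquoted ${arr[@]} expands to one word per element, so -i receives "CCCC" and "CCCOC" is left over as an unexpected extra argument; and for i in arr iterates once over the literal word "arr", not over the array. A minimal sketch of a loop that makes one predict call per SMILES, assuming a separate output file per molecule is acceptable (the function name predict_each and the eos9ei3-N.csv file names are hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: call ersilia once per SMILES given as arguments.
# "$smi" is quoted so a SMILES containing brackets or parentheses is
# passed to -i as exactly one argument (no word splitting or globbing).
predict_each() {
  local i=0 smi
  for smi in "$@"; do
    i=$((i + 1))
    ersilia -v api predict -i "$smi" -o "eos9ei3-${i}.csv"
  done
}
```

Usage: `arr=("CCCC" "CCCOC"); predict_each "${arr[@]}"`. Each molecule triggers its own CLI call, so for long lists the CSV route is likely simpler.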
Conclusion
The only way I was able to pass a list of SMILES on the CLI was by creating a CSV file. While going through the code, I noticed that the input type specified in the Ersilia API is TEXT. This might be why only one output is returned when the SMILES are passed as a variable or directly in brackets, but I'm uncertain, because I was able to pass a list successfully in Colab.
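Since a CSV file is the route that works on the CLI, the file can also be written from a shell array without typing it interactively. A minimal sketch that writes one SMILES per line with no header row, like the hand-made arr.csv; the thread does not say whether a header is required, and smiles_to_csv is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the remaining arguments to a one-column,
# header-less CSV, one SMILES per line.
smiles_to_csv() {
  local out="$1"
  shift
  printf '%s\n' "$@" > "$out"
}
```

Usage: `arr=("CCCC" "CCCOC"); smiles_to_csv arr.csv "${arr[@]}"`, then `ersilia -v api predict -i arr.csv -o eos9ei3-test.csv`.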
Oh, this is great work, thanks @pauline-banye!
We definitely need to improve the input adapters for the CLI. The Google Colab command model.api() runs on the Python package version of Ersilia, which might differ a bit from what is used in the CLI.
Can you share the link to the file you were looking at for the Ersilia API?
Thanks
@GemmaTuron I'm so sorry, but I have been searching for the file and I can't seem to find it anywhere 🤦‍♀️. I can't remember whether it was from a file or from an error in Colab/the CLI. I'll try to replicate it and provide an update if or when I can.
Hello @miquelduranfrigola,
As you mentioned in our last meeting, following the instructions at https://ersilia.gitbook.io/ersilia-book/ersilia-model-hub/inputs we cannot pass a list of SMILES on the CLI. If the only way to predict more than one molecule is to input a .csv file, we should make this clear in the documentation.
This model is working: see its GitHub Actions run and the model testing in https://github.com/ersilia-os/eos9ei3/issues/6