
Feature request: eos9be7 needs input adapter for two molecules.

Open Jona-Bvunza opened this issue 2 years ago • 4 comments

Describe the bug.

eos9be7 fetches successfully in Google Colab, but it does not predict. It reports the following message:

AttributeError Traceback (most recent call last) in
      3 model = ErsiliaModel("eos9be7")
      4 model.serve()
----> 5 output = model.predict(input=smiles, output="pandas")
      6 model.close()

AttributeError: 'ErsiliaModel' object has no attribute 'predict'

Describe the steps to reproduce the behavior

No response

Expected behavior.

No response

Screenshots.

No response

Operating environment

google colab

Additional context

No response

Jona-Bvunza avatar Oct 17 '22 14:10 Jona-Bvunza

Hi @Jona-Bvunza

Quick check: if you run !ersilia serve eos9be7, does it show predict as the API for the model?
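For readers hitting the same AttributeError: the traceback further down this thread shows the API methods being attached by name with `setattr(self, api_name, _method)`, so a model object only has methods for the API names it declares. The following is a minimal sketch of that pattern, not Ersilia's code; `StandInModel` and `call_first_available` are hypothetical names introduced here for illustration.

```python
class StandInModel:
    """Hypothetical stand-in (NOT Ersilia's class): registers API methods by name."""

    def __init__(self, api_names):
        for name in api_names:
            self._set_api(name)

    def _set_api(self, api_name):
        # Same idea as the setattr call visible in the traceback:
        # the method only exists if its name was registered.
        def _method(input=None, output=None):
            return {"api": api_name, "input": input, "output": output}

        setattr(self, api_name, _method)


def call_first_available(model, candidates, **kwargs):
    """Hypothetical helper: call the first API name that exists on the model."""
    for name in candidates:
        method = getattr(model, name, None)
        if callable(method):
            return method(**kwargs)
    raise AttributeError(f"none of {candidates} is available on this model")


model = StandInModel(["calculate"])
print(hasattr(model, "predict"))    # False
print(hasattr(model, "calculate"))  # True

result = call_first_available(
    model, ["predict", "calculate"], input=["C1=CN=CC=C1C(=O)NN"], output="pandas"
)
print(result["api"])  # calculate
```

Checking with `hasattr` or `getattr` before calling avoids hard-coding `predict` for models that expose a different API name.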

GemmaTuron avatar Oct 17 '22 16:10 GemmaTuron

No, eos9be7 has "calculate" as its API. However, I have tried replacing output = model.predict(input=smiles, output="pandas") with output = model.calculate(input=smiles, output="pandas"), but I am getting an AssertionError with the message below:


AssertionError Traceback (most recent call last) in
      3 model = ErsiliaModel("eos9be7")
      4 model.serve()
----> 5 output = model.calculate (input=smiles, output="pandas")
      6 model.close()

12 frames
/usr/local/lib/python3.7/site-packages/ersilia/core/model.py in _method(input, output, batch_size)
    124     def _set_api(self, api_name):
    125         def _method(input=None, output=None, batch_size=DEFAULT_BATCH_SIZE):
--> 126             return self.api(api_name, input, output, batch_size)
    127
    128         setattr(self, api_name, _method)

/usr/local/lib/python3.7/site-packages/ersilia/core/model.py in api(self, api_name, input, output, batch_size)
    327         else:
    328             return self.api_task(
--> 329                 api_name=api_name, input=input, output=output, batch_size=batch_size
    330             )
    331

/usr/local/lib/python3.7/site-packages/ersilia/core/model.py in api_task(self, api_name, input, output, batch_size)
    334         api_runner = self._get_api_runner(output=output)
    335         result = api_runner(
--> 336             api=api_instance, input=input, output=output, batch_size=batch_size
    337         )
    338         if output is None:

/usr/local/lib/python3.7/site-packages/ersilia/core/model.py in _api_runner_return(self, api, input, output, batch_size)
    206             )
    207         for r in self._api_runner_iter(
--> 208             api=api, input=input, output=tmp_output, batch_size=batch_size
    209         ):
    210             continue

/usr/local/lib/python3.7/site-packages/ersilia/core/model.py in _api_runner_iter(self, api, input, output, batch_size)
    173
    174     def _api_runner_iter(self, api, input, output, batch_size):
--> 175         for result in api.post(input=input, output=output, batch_size=batch_size):
    176             assert (
    177                 result is not None

/usr/local/lib/python3.7/site-packages/ersilia/serve/api.py in post(self, input, output, batch_size)
    298         self.logger.debug("Posting to {0}".format(self.api_name))
    299         self.logger.debug("Batch size {0}".format(batch_size))
--> 300         unique_input, mapping = self.unique_input(input)
    301         results = {}
    302         for res in self.post_unique_input(

/usr/local/lib/python3.7/site-packages/ersilia/serve/api.py in _unique_input(self, input)
    275         mapping = collections.defaultdict(list)
    276         unique_input = []
--> 277         for i, inp in enumerate(self.input_adapter.adapt_one_by_one(input)):
    278             key = inp["key"]
    279             if key not in mapping:

/usr/local/lib/python3.7/site-packages/ersilia/io/input.py in adapt_one_by_one(self, inp)
    167
    168     def adapt_one_by_one(self, inp):
--> 169         data = self.adapter.adapt(inp)
    170         for d in data:
    171             yield d

/usr/local/lib/python3.7/site-packages/ersilia/io/input.py in adapt(self, inp)
    139
    140     def adapt(self, inp):
--> 141         data = self._adapt(inp)
    142         data = [self.IO.parse(d) for d in data]
    143         return data

/usr/local/lib/python3.7/site-packages/ersilia/io/input.py in _adapt(self, inp)
    136             inp = self._try_to_eval(inp)
    137         if self._is_python_instance(inp):
--> 138             return self._py_input_reader(inp)
    139
    140     def adapt(self, inp):

/usr/local/lib/python3.7/site-packages/ersilia/io/input.py in _py_input_reader(self, inp)
    119     def _py_input_reader(self, inp):
    120         reader = PyInputReader(input=inp, IO=self.IO)
--> 121         data = reader.read()
    122         return data
    123

/usr/local/lib/python3.7/site-packages/ersilia/io/readers/pyinput.py in read(self)
     58
     59     def read(self):
---> 60         if self.is_single_input():
     61             return [self._data]
     62         else:

/usr/local/lib/python3.7/site-packages/ersilia/io/readers/pyinput.py in is_single_input(self)
     43         if type(one_element) is tuple:
     44             one_element = list(one_element)
---> 45         assert type(one_element) is list
     46         one_inner_element = one_element[0]
     47         if type(one_inner_element) is tuple:

AssertionError:
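The final frame fails on `assert type(one_element) is list`, which suggests the reader for this model expects each input element to be a list (or tuple) rather than a plain SMILES string. A minimal sketch of that distinction follows; `describe_input_shape` is a hypothetical helper written for this illustration, not a function in the Ersilia package, and the labels it returns are assumptions.

```python
def describe_input_shape(data):
    """Hypothetical helper (not part of Ersilia): classify a Python input by shape."""
    if isinstance(data, str):
        return "single molecule"
    if all(isinstance(item, str) for item in data):
        return "flat list of molecules"
    if all(isinstance(item, (list, tuple)) and len(item) == 2 for item in data):
        return "list of molecule pairs"
    return "unknown"


# A flat list of SMILES strings: each element is a str, not a list,
# so an assertion like the one in the traceback would fail on it.
flat = ["C1=CN=CC=C1C(=O)NN", "CC(=O)OC1=CC=CC=C1C(=O)O"]

# A list whose elements are themselves two-item lists.
pairs = [["C1=CN=CC=C1C(=O)NN", "CC(=O)OC1=CC=CC=C1C(=O)O"]]

print(describe_input_shape(flat))   # flat list of molecules
print(describe_input_shape(pairs))  # list of molecule pairs
```

Running a check like this before calling the model makes it easier to tell a shape problem from a problem in the model itself.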

Jona-Bvunza avatar Oct 18 '22 07:10 Jona-Bvunza

Hi @Jona-Bvunza

If you go to the Ersilia Model Hub website and search for the model eos9be7, you will see the following explanation: "This model calculates a novel distance measure between two molecules, the Fréchet ChemNet distance (FCD)." Therefore, you need TWO molecules as input. This means we will also need to improve the input adapters in the Python package so that they accept two lists of molecules, not one.

Please leave this issue open and change the title to "eos9be7 needs input adapter for two molecules". Mark this task in the Excel as complete and move on ;)

GemmaTuron avatar Oct 18 '22 08:10 GemmaTuron

Ok, noted.

Jona-Bvunza avatar Oct 18 '22 08:10 Jona-Bvunza

Hi @miquelduranfrigola and @brosular

I am unable to pass molecules in the CLI for that model in any of the formats specified in https://ersilia.gitbook.io/ersilia-book/ersilia-model-hub/inputs. Can you please have a look and get back to me?

GemmaTuron avatar Dec 19 '22 09:12 GemmaTuron

Status of the implementation of model eos9be7:

On CLI: When I tested this model earlier, I was unable to fetch it because of recurring errors caused by Python version incompatibilities on my macOS system. I have now resolved this and can fetch the model successfully from the CLI. To do so, I updated my system, reinstalled a few packages, and installed Python 3.7 in a new virtual environment, following a few articles and documentation pages. I still saw some connection errors caused by network issues, but the main problem that prevented this model from running on my system is resolved.

Regarding the above issue:

I ran this model again on Google Colab and was able to fetch and serve it successfully:

100% 8/8 [08:53<00:00, 66.63s/it]
Fetching eos9be7 done in time: 0:08:53.035870s
👍 Model eos9be7 fetched successfully!
Time taken: 534.71 seconds

Upon serving the model, it shows calculate as the model API. When I try to run the model itself, I get the following error:


AssertionError Traceback (most recent call last) in
      7 model = ErsiliaModel(model_name)
      8 begin = time.time()
----> 9 output = model.api(input=smiles, output="pandas")
     10 end = time.time()
     11

11 frames
/usr/local/lib/python3.7/site-packages/ersilia/io/readers/pyinput.py in is_single_input(self)
     43         if type(one_element) is tuple:
     44             one_element = list(one_element)
---> 45         assert type(one_element) is list
     46         one_inner_element = one_element[0]
     47         if type(one_inner_element) is tuple:


Running the model through the calculate API gives the same error as before. While working on this issue earlier, I had looked the model up on the Ersilia Model Hub website, and I have now gone through it again. The model's characteristics (as listed in the model's repository) require the input to be a Compound and the input shape to be a Pair of Lists of molecules, in order to measure the ChemNet distance between two sets of molecules. Since this model takes two sets of molecules and returns a single overall output (one float) for the two sets, this error should be resolved if we pass a pair of lists of molecules in CSV format as one single input, such as the example given for model inputs in the Ersilia GitBook.

Testing this model requires the input to be in the following format (for CSV) for it to run successfully:

smiles_1,smiles_2
CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O,CC(=O)OC1=CC=CC=C1C(=O)O
C1=CN=CC=C1C(=O)NN,CC(C)CC1=CC=C(C=C1)C(C)C(=O)O
CC(CN1C=NC2=C(N=CN=C21)N)OCP(=O)(O)O,CC1(OC2C(OC(C2O1)(C#N)C3=CC=C4N3N=CN=C4N)CO)C
,COC1=CC23CCCN2CCC4=CC5=C(C=C4C3C1O)OCO5
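As an illustration of how a file in this layout could be split back into the two sets of molecules, here is a minimal sketch using only the Python standard library. The helper name `read_two_smiles_columns` is hypothetical, and this is not how Ersilia's input adapter is implemented; it assumes the header names `smiles_1` and `smiles_2` from the example above and skips empty cells.

```python
import csv
import io

# The example rows from this comment, one record per line.
CSV_TEXT = """smiles_1,smiles_2
CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O,CC(=O)OC1=CC=CC=C1C(=O)O
C1=CN=CC=C1C(=O)NN,CC(C)CC1=CC=C(C=C1)C(C)C(=O)O
CC(CN1C=NC2=C(N=CN=C21)N)OCP(=O)(O)O,CC1(OC2C(OC(C2O1)(C#N)C3=CC=C4N3N=CN=C4N)CO)C
,COC1=CC23CCCN2CCC4=CC5=C(C=C4C3C1O)OCO5
"""


def read_two_smiles_columns(handle):
    """Hypothetical helper: return (list_1, list_2), ignoring empty cells."""
    first, second = [], []
    for row in csv.DictReader(handle):
        smiles_1 = (row.get("smiles_1") or "").strip()
        smiles_2 = (row.get("smiles_2") or "").strip()
        if smiles_1:
            first.append(smiles_1)
        if smiles_2:
            second.append(smiles_2)
    return first, second


set_1, set_2 = read_two_smiles_columns(io.StringIO(CSV_TEXT))
print(len(set_1), len(set_2))  # 3 4
```

Because the last row has an empty first cell, the two lists end up with different lengths, which is consistent with the model comparing two sets rather than row-wise pairs.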

The main source of the error, and the reason the model cannot run successfully on the CLI or on Colab, is the input shape and the number of inputs being provided to the model.

girishatechie avatar Mar 27 '23 19:03 girishatechie

Thanks @girishatechie for this thorough comment and review, very helpful! If you pass the CSV file that you give as an example, does the model work, or does it throw the same error?

GemmaTuron avatar Mar 28 '23 11:03 GemmaTuron

Thank you! :) @GemmaTuron

I tried passing the CSV file from my example above, which contains a pair of lists of molecules given as one single input to the model. Here's the file: smiles_testing.csv

However, when running predictions, the model throws a similar AssertionError at the same line, indicating that the asserted expression evaluates to false. I believe this is because the code needs to be updated to accept two SMILES columns as a single input. The current template does not work for this: after minor code changes, the new file was accepted and the SMILES were extracted from both columns (smiles_1 and smiles_2), but the prediction step still passes only a single SMILES column as input, so it cannot run and produce the output. The new file contains two SMILES columns that need to be passed as a single input, and this is not compatible with the current format.

Hence, the expression evaluates to false at this particular step: output = model.api(input=smiles, output="pandas")
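For reference, a two-column file of this kind could be produced from two Python lists of different lengths as sketched below. This uses only the standard library; `write_two_smiles_columns` is a hypothetical helper, and the column names `smiles_1` and `smiles_2` are taken from the file described above. It only prepares the file and does not change how the model reads its input.

```python
import csv
import io
from itertools import zip_longest


def write_two_smiles_columns(first, second, handle):
    """Hypothetical helper: write two SMILES lists side by side, padding with empty cells."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["smiles_1", "smiles_2"])
    for smiles_1, smiles_2 in zip_longest(first, second, fillvalue=""):
        writer.writerow([smiles_1, smiles_2])


list_1 = ["C1=CN=CC=C1C(=O)NN"]
list_2 = ["CC(=O)OC1=CC=CC=C1C(=O)O", "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O"]

buffer = io.StringIO()  # stands in for an open file such as smiles_testing.csv
write_two_smiles_columns(list_1, list_2, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

`zip_longest` is used instead of `zip` so that the longer list is not truncated; the shorter column is padded with empty cells.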

girishatechie avatar Mar 28 '23 17:03 girishatechie

Can you paste a log with the whole error you are getting?

GemmaTuron avatar Mar 28 '23 17:03 GemmaTuron

The logging module generally doesn't work properly in Colab, so it can't redirect everything to a log file when I import and use it. Is it fine if I paste the whole error along with the input commands?

girishatechie avatar Mar 28 '23 18:03 girishatechie

Yeah, you can copy the printed error and paste it into a .txt file manually, for example.

GemmaTuron avatar Mar 28 '23 18:03 GemmaTuron

Fine, I'll just paste it manually into a .txt file and attach it here

girishatechie avatar Mar 28 '23 18:03 girishatechie

thanks @girishatechie very helpful!

GemmaTuron avatar Mar 28 '23 18:03 GemmaTuron

Thank you! :) Here's the error file for the above step (upon running predictions): @GemmaTuron model_testing.txt

girishatechie avatar Mar 28 '23 18:03 girishatechie

Thanks, indeed we need to modify the input function to accept the pair of molecules. I can't tackle it right now, but I'll leave this issue open.

GemmaTuron avatar Mar 28 '23 18:03 GemmaTuron

This has been solved by @miquelduranfrigola and @ZakiaYahya

GemmaTuron avatar Aug 10 '23 18:08 GemmaTuron