Collect all the steps required to run the model

Open honeyankit opened this issue 2 years ago • 9 comments

Collect the steps for executing the Ersilia model

Exit Criteria: Capture all the steps required for Ersilia to execute the model and generate the prediction values.

honeyankit avatar Oct 11 '22 05:10 honeyankit

Based on the demo video, below are the steps used by Ersilia to run the model:

  1. Schedule a job that will pick a model. This is accomplished by scheduling a job with GitHub Actions.
  2. The ersilia fetch command will fetch the model from the git repository.
  3. ersilia serve <model id> will pick a model.
  4. ersilia sample <model id> -f input.csv -n 100 will run the model for the 100 molecules provided in the input file. (For now, the molecules will be given in a sample file, but in the future they will be passed through a DynamoDB interface.)
  5. ersilia api -i <input.csv> -o <output.csv>. (For now, the model results will be stored in output.csv, but in the future they will be passed to a DynamoDB interface to store the data.)

Note

  • Molecules are the input to the models.
  • Each molecule and its prediction values are the output from the table.

honeyankit avatar Oct 11 '22 06:10 honeyankit

@miquelduranfrigola Could you confirm the above steps are correct? Also could you upload a very small input.csv file and output.csv here as well?

Schedule a job that will pick a model. This is accomplished by scheduling a job with GitHub Actions.

Q. Where will GitHub Actions pick the models from? Is there a list of models stored in the repository?

honeyankit avatar Oct 11 '22 06:10 honeyankit

Hi @honeyankit,

The api command (step 5) depends on the model. Most models use a "predict" API, so: ersilia api predict -i <input.csv> -o <output.csv>

I am attaching an example I got with the following commands:

ersilia fetch eos2ta5
ersilia serve eos2ta5
ersilia api predict -i input_test.csv -o output_test.csv

If you do not specify the output file, it will just print the result on the screen in JSON format.

input_test.csv output_test.csv

GemmaTuron avatar Oct 11 '22 06:10 GemmaTuron

Thanks for all of this.

The only command that is missing at the moment in ersilia is ersilia sample. We do have a command called ersilia example, which works similarly. The idea of the sample command is that it would query the existing precalculated data and then provide a sample of molecules that have not yet been calculated. We will be able to provide this functionality as soon as we have a database in AWS, most likely DynamoDB.

As for the APIs - indeed, the API names may vary. The command ersilia api -i input_test.csv (without calculate or predict) is more general, since it automatically selects the available API, without specifying the name.
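To make the two call styles concrete, here is a minimal sketch of a wrapper that uses a named API when one is given and otherwise falls back to the general form. The function name run_ersilia_api and its argument order are made up for illustration; only the ersilia api invocations themselves come from this thread.

```shell
# run_ersilia_api INPUT [OUTPUT] [API_NAME]
# Hypothetical helper, not part of the ersilia CLI.
# - API_NAME empty: "ersilia api -i INPUT ..." (the API is selected automatically)
# - OUTPUT empty:   no -o flag, so the result is printed on the screen
run_ersilia_api() {
    input=$1
    output=${2:-}
    api_name=${3:-}

    # Build the argument list in the positional parameters.
    set -- api
    if [ -n "$api_name" ]; then
        set -- "$@" "$api_name"
    fi
    set -- "$@" -i "$input"
    if [ -n "$output" ]; then
        set -- "$@" -o "$output"
    fi

    ersilia "$@"
}
```

With a model already served, run_ersilia_api input_test.csv output_test.csv predict would run the named API, while run_ersilia_api input_test.csv would use the general form and print to the screen.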

miquelduranfrigola avatar Oct 11 '22 19:10 miquelduranfrigola

@miquelduranfrigola @GemmaTuron: Are these the model repos that will be fetched?

honeyankit avatar Oct 11 '22 23:10 honeyankit

Hi @honeyankit !

The model repos are identified by their names, so every repo named eosxxxx is a model that will be fetched.
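As a rough illustration of that naming rule, a scheduled job could filter a list of repository names like this. The exact pattern is an assumption (eos followed by four lowercase letters or digits, as in eos2ta5), and both function names are hypothetical.

```shell
# Succeed when the argument looks like a model identifier.
# Assumption: "eos" plus exactly four lowercase letters or digits (e.g. eos2ta5).
is_model_id() {
    case "$1" in
        eos[0-9a-z][0-9a-z][0-9a-z][0-9a-z]) return 0 ;;
        *) return 1 ;;
    esac
}

# Read repository names from stdin, one per line, and print only
# those that look like model identifiers.
filter_model_repos() {
    while IFS= read -r name; do
        if is_model_id "$name"; then
            printf '%s\n' "$name"
        fi
    done
}
```

For example, piping the names ersilia and eos2ta5 through filter_model_repos would keep only eos2ta5.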

GemmaTuron avatar Oct 12 '22 16:10 GemmaTuron

ersilia api -i <input.csv> -o <output.csv>.

Most models have only one API.

honeyankit avatar Oct 13 '22 16:10 honeyankit

Hi @honeyankit. I am now working on the sample command. Currently, it is not available. I will keep you updated.

The sample command should work like this:

ersilia sample --model will return a model identifier. To start with, it will just be random sampling. In the future, we will connect this to DynamoDB so that "orphan" models are prioritized. Prioritizing "latest" models will also be an option.

ersilia sample -n 1000 -f input.csv will return 1000 molecules. In the beginning, it will be random sampling. In the future, we will connect this to DynamoDB so that only molecules that have not been precalculated previously are returned.
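The intended molecule sampling (random, but skipping anything already precalculated) can be approximated with standard tools. This is only a sketch of the idea, not how ersilia sample is implemented: the two plain-text files stand in for the molecule library and the future DynamoDB lookup, the function name is hypothetical, and shuf assumes GNU coreutils is available.

```shell
# sample_uncalculated N CANDIDATES PRECALCULATED
# Print up to N random lines from CANDIDATES that do not appear in PRECALCULATED.
# Both files are hypothetical and hold one molecule (e.g. a SMILES string) per line.
sample_uncalculated() {
    n=$1
    candidates=$2
    precalculated=$3
    # grep: -v inverts the match, -x matches whole lines only,
    #       -F treats patterns as fixed strings, -f reads them from a file.
    # shuf -n picks at most N of the remaining lines at random.
    grep -vxFf "$precalculated" "$candidates" | shuf -n "$n"
}
```

Calling sample_uncalculated 1000 library.csv done.csv > input.csv would then produce an input file that contains no molecule listed in done.csv.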

miquelduranfrigola avatar Oct 13 '22 22:10 miquelduranfrigola

Hi @honeyankit, the sample command is now finished. Based on this command, I suggest the following workflow:

MODEL_ID=$(ersilia sample --model)
ersilia fetch $MODEL_ID
ersilia serve $MODEL_ID
ersilia sample -n 100 -f input.csv
ersilia api -i input.csv -o output.csv
ersilia close

The workflow selects a model id. Then, it fetches and serves this model. Once the model is served, it first looks for 100 inputs, and stores them in an input.csv file. Then we run the main API of the model and store calculations in output.csv.

As discussed, the output.csv will eventually be replaced by a push to DynamoDB.
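For a scheduled job, the same steps could be wrapped in a function so that the model is closed even when sampling or the API call fails. This is a sketch under assumptions: the name run_model_batch and its three parameters are hypothetical, and the ersilia commands are used exactly as listed in the workflow above.

```shell
# run_model_batch N INPUT OUTPUT
# Hypothetical wrapper: pick a model, fetch and serve it, sample N inputs
# into INPUT, run the main API into OUTPUT, and always close the model.
run_model_batch() {
    n=$1
    input=$2
    output=$3

    # Nothing is served yet, so a failure here needs no cleanup.
    model_id=$(ersilia sample --model) || return 1
    ersilia fetch "$model_id" || return 1
    ersilia serve "$model_id" || return 1

    # Remember the exit status of the first failing step, if any.
    status=0
    ersilia sample -n "$n" -f "$input" &&
        ersilia api -i "$input" -o "$output" || status=$?

    # Release the served model regardless of the outcome above.
    ersilia close
    return "$status"
}
```

A job step would then call run_model_batch 100 input.csv output.csv and fail if the function returns a non-zero status.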

I hope this helps!

miquelduranfrigola avatar Oct 15 '22 08:10 miquelduranfrigola

This task is complete.

honeyankit avatar Oct 31 '22 17:10 honeyankit