kepler-model-server Find spec/node_type of Kepler node for model selection

Open sunya-ch opened this issue 1 year ago • 1 comments

What would you like to be added?

Flow to link Kepler-deploying node specification to model selection from Kepler model DB.

Why is this needed?

Problem description

Previously, the pipeline had only a single node_type, so we always appended _1 to the trainer name to get the model name. However, with SPECPower and AWS instances, we can now train models for multiple node_type values.
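To make the naming convention concrete, here is a minimal sketch of how a model name could be derived once node_type is no longer fixed to 1. The function name `model_name` and the trainer name `ExampleTrainer` are hypothetical and only for illustration; the actual naming code in kepler-model-server may differ.

```python
def model_name(trainer_name: str, node_type: int = 1) -> str:
    """Build a model name as <trainer_name>_<node_type>.

    Hypothetical helper: the default of 1 mirrors the old behaviour of
    always appending _1 to the trainer name.
    """
    return f"{trainer_name}_{node_type}"


# Old behaviour (single node_type) and new behaviour (explicit node_type).
legacy_name = model_name("ExampleTrainer")
typed_name = model_name("ExampleTrainer", node_type=3)
```

With more than one node_type, the caller has to know which suffix applies to the node Kepler runs on, which is the gap this issue describes.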

Currently, kepler-model-server has a Python function, generate_spec, that generates the machine spec.

Idea

The goal is to let Kepler know its own node_type. The generate_spec logic does not necessarily need to be merged into Kepler itself. It can run in an init container that generates the spec and saves it to a file, which is then mounted into the Kepler container. The server API may need to be updated so that the machine spec can be included in the model selection request.
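The flow described here could look roughly like the sketch below: one step writes the spec to a file (the init container side), and another reads it back and attaches it to the request (the Kepler side). Everything in it is an assumption for illustration: `collect_spec` is a stand-in for generate_spec, whose real signature and output fields are not shown in this issue, and the `machine_spec` request key, the file name, and the `metrics` field are hypothetical.

```python
import json
import os
import platform
import tempfile
from pathlib import Path


def collect_spec() -> dict:
    """Hypothetical stand-in for generate_spec; the real fields may differ."""
    return {
        "processor": platform.processor() or platform.machine(),
        "cores": os.cpu_count(),
    }


def write_spec(path: Path, spec: dict) -> None:
    """Init container side: save the spec to a file on a shared volume."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(spec))


def build_request(base_request: dict, spec_path: Path) -> dict:
    """Kepler side: attach the spec to the request if the file exists."""
    request = dict(base_request)  # copy so the caller's dict is untouched
    if spec_path.exists():
        # "machine_spec" is an assumed key, not a confirmed API field.
        request["machine_spec"] = json.loads(spec_path.read_text())
    return request


# Demo using a temporary directory in place of the mounted volume.
with tempfile.TemporaryDirectory() as tmp:
    spec_file = Path(tmp) / "machine_spec.json"
    write_spec(spec_file, collect_spec())
    request = build_request({"metrics": ["cpu_time"]}, spec_file)
```

Keeping the spec in a file means Kepler only needs to read JSON, so the Python logic can stay in kepler-model-server. If the file is missing, the request is sent unchanged and the server can fall back to its default model.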

Note that:

node_type is defined per pipeline and is determined by node_type_index.json inside the pipeline folder.
We can set the default pipeline to spec_benchmark for the acpi value and to aws_instance_pipeline for the rapl value.
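The two notes above can be sketched as a pair of lookups. The default pipeline mapping comes directly from this issue; the layout of node_type_index.json is an assumption (a JSON object mapping each node_type index to the spec attributes it covers), and the helper names `default_pipeline`, `load_index`, and `find_node_type` are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional

# Default pipeline per power source value, as proposed in this issue.
DEFAULT_PIPELINES = {
    "acpi": "spec_benchmark",
    "rapl": "aws_instance_pipeline",
}


def default_pipeline(power_source: str) -> str:
    """Return the default pipeline name for a power source value."""
    return DEFAULT_PIPELINES[power_source]


def load_index(pipeline_dir: Path) -> dict:
    """Read node_type_index.json from the pipeline folder."""
    return json.loads((pipeline_dir / "node_type_index.json").read_text())


def find_node_type(index: dict, spec: dict) -> Optional[int]:
    """Return the first node_type whose attributes all match the spec.

    Assumed index layout: {"1": {"processor": "...", "cores": 32}, ...}.
    The real file format may differ. Returns None when nothing matches.
    """
    for node_type, known in index.items():
        if all(spec.get(key) == value for key, value in known.items()):
            return int(node_type)
    return None
```

A caller would first pick the pipeline from the power source, load that pipeline's index, and then match the node's spec against it. Exact matching is the simplest possible rule; a real implementation might need fuzzy matching when no entry fits exactly.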

Feb 22 '24 07:02 sunya-ch
