
Give users the ability to access the YAML file from a generator

Open · doutriaux1 opened this issue 4 years ago · 1 comment

When a user creates a custom parameter generator, the generator currently has no way to access data from the original YAML file.

Also, the variables in the env section are parsed first, which means we cannot put anything there for the generator to read: doing so disables the generator parameters.

As a workaround, I currently use sys.argv to locate the original YAML file. There are two reasons for this:

  1. There is no other way to know the name of the original YAML file.
  2. Even if the name is stored in an env variable, the original YAML file has not yet been copied into OUTPUT_PATH when the generator function is called.

Another issue is that the specification verification on global.parameters requires all parameters to have the same number of values. I was able to bypass this by padding the shorter lists with duplicates and removing them again with set in my generator.
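The padding workaround can be illustrated with a small, Maestro-independent sketch. It assumes padding is done by repeating each list's last value until every list matches the longest one (so a same-length check would pass), and that the generator then removes the duplicates. The helpers pad_to_same_length and dedupe are hypothetical names for illustration, not part of Maestro.

```python
def pad_to_same_length(params):
    """Pad each value list by repeating its last entry up to the longest length."""
    longest = max(len(values) for values in params.values())
    return {
        name: list(values) + [values[-1]] * (longest - len(values))
        for name, values in params.items()
    }


def dedupe(values):
    """Remove duplicates while keeping first-occurrence order (unlike set())."""
    return list(dict.fromkeys(values))


params = {"SIZE": [1, 2, 3, 4], "TRIAL": [5, 4], "ITERATION": [5, 3, 6]}

# Every list now has length 4, e.g. TRIAL becomes [5, 4, 4, 4].
padded = pad_to_same_length(params)

# Inside the generator, the padding is stripped off again.
restored = {name: dedupe(values) for name, values in padded.items()}
print(restored == params)  # True, since the original lists had no duplicates
```

Using dict.fromkeys instead of set keeps the values in the order they were written in the spec, which makes the resulting parameter order predictable.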

To keep development simple and disturb Maestro as little as possible, I went with the following approach, which I propose as a suggestion:

I created a generator.parameters section in the YAML file. Currently I use sys.argv to re-parse the YAML file and reach this section, but I think making this section available to the generator (via env or some other way) would make things much easier for the end user.

Making the entire parsed content of the YAML file available could be an even better solution, since the generator might need other information as well (such as scheduling settings).
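If Maestro handed the parsed specification to the generator (for example through a hypothetical spec argument; Maestro does not provide one today), the generator would no longer need to re-read the file from sys.argv. Below is a minimal, Maestro-independent sketch of the expansion step, assuming the spec arrives as a plain dict and using itertools.product in place of scikit-learn's ParameterGrid. The function name expand_generator_parameters is hypothetical, and the values come from the example spec posted in this thread.

```python
from itertools import product


def expand_generator_parameters(spec):
    """Turn spec["generator.parameters"] into equal-length value columns.

    Assumes `spec` is the already-parsed study YAML as a dict. Returns
    {name: (values, label)}, with one entry per combination in the
    Cartesian product of all (deduplicated) parameter values.
    """
    section = spec["generator.parameters"]
    names = list(section)
    choices = []
    for name in names:
        raw = section[name]["values"]
        values = raw if isinstance(raw, (list, tuple)) else [raw]
        choices.append(list(dict.fromkeys(values)))  # dedupe, keep order

    combos = list(product(*choices))
    return {
        name: ([combo[i] for combo in combos], section[name]["label"])
        for i, name in enumerate(names)
    }


spec = {
    "generator.parameters": {
        "SIZE": {"values": [1, 2, 3, 4], "label": "SIZE.%%"},
        "TRIAL": {"values": [5, 4], "label": "TRIAL.%%"},
        "ITERATION": {"values": [5, 3, 6], "label": "ITERATION.%%"},
    }
}
columns = expand_generator_parameters(spec)
print(len(columns["SIZE"][0]))  # 24, i.e. 4 * 2 * 3 combinations
```

In an actual pgen, each (values, label) pair would then be passed to ParameterGenerator.add_parameter(name, values, label), as in the generator code later in this thread.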

doutriaux1, Feb 13 '20 16:02

Here is my current solution, as an example/starting point:

YAML FILE

```yaml
description:
    name: param grid sample test
    description: A sample parameter grid search study

study:
    - name: test_gen
      description: Build the serial version of LULESH.
      run:
          cmd: |
            echo $(TRIAL)_$(SIZE)_$(ITERATION)
          depends: []
generator.parameters:
  SIZE:
    values: [1,2,3,4]
    label: SIZE.%%
  TRIAL:
    values: [5,4]
    label: TRIAL.%%
  ITERATION:
    values: [5,3,6]
    label: ITERATION.%%
```

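PYTHON FILE FOR GENERATOR

```python
import sys

import yaml
from maestrowf.datastructures.core import ParameterGenerator
from sklearn.model_selection import ParameterGrid

# Use the faster libyaml-backed safe loader when it is available.
try:
    from yaml import CSafeLoader as SafeLoader
except ImportError:
    from yaml import SafeLoader


def _find_spec_path(argv):
    """Workaround: guess the study specification path from the command line.

    Returns the last argument ending in .yaml or .yml, because the pgen
    file itself (a .py file) can also appear in argv.
    """
    for arg in reversed(argv):
        if arg.endswith((".yaml", ".yml")):
            return arg
    raise RuntimeError("No YAML study specification found in sys.argv")


def get_custom_generator(env, **kwargs):
    """
    Create a custom populated ParameterGenerator.

    Reads the generator.parameters section of the study specification and
    expands it into every combination of the listed parameter values.

    :returns: A ParameterGenerator populated with parameters.
    """
    p_gen = ParameterGenerator()
    with open(_find_spec_path(sys.argv)) as spec_file:
        yml = yaml.load(spec_file, Loader=SafeLoader)

    grid_values = {}
    labels = {}
    for key, val in yml["generator.parameters"].items():
        values = val["values"]
        if isinstance(values, (list, tuple)):
            # Drop duplicates but keep an ordered list: newer scikit-learn
            # versions require a list or array (not a set) in ParameterGrid.
            grid_values[key] = list(dict.fromkeys(values))
        else:
            grid_values[key] = [values]
        labels[key] = val["label"]

    # ParameterGrid yields one dict per combination; transpose them into
    # equal-length per-parameter lists for ParameterGenerator.
    columns = {key: [] for key in grid_values}
    for combo in ParameterGrid(grid_values):
        for key, value in combo.items():
            columns[key].append(value)

    for key, values in columns.items():
        p_gen.add_parameter(key, values, labels[key])
    return p_gen
```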
