sagemaker-python-sdk icon indicating copy to clipboard operation
sagemaker-python-sdk copied to clipboard

Hyperparameter values forcefully converted to strings, thus unable to pass a list

Open mmamaev opened this issue 6 years ago • 23 comments

I want ro pass a list as a hyperparameter to an estimator. For example, hyperparameter key = 'a' hyperparameter value = ['def', 'xyz']

This type of value is accepted and get verified:

>>> estimator.set_hyperparameters(a=['def', 'xyz'])
>>> estimator.hyperparameters()['a']
['def', 'xyz']
>>> type(estimator.hyperparameters()['a'])
<class 'list'>

I expect that sagemaker renders hyperparameters to the following json to be found inside /opt/ml/input/config/hyperparameters.json:

>>> print(json.dumps(estimator.hyperparameters()))
{"a": ["def", "xyz"]}

Instead, the contents of /opt/ml/input/config/hyperparameters.json is:

{"a": "['def', 'xyz']"}

The list type of the value is lost, being forcefully converted to a string by this code:

https://github.com/aws/sagemaker-python-sdk/blob/1c94079ddf1395d2edd82b5be38e989aabf378d8/src/sagemaker/estimator.py#L553-L554

mmamaev avatar Jan 25 '19 16:01 mmamaev

Hi @mmamaev ,

Although the list is encoded to string, you can decode them back to list when using it. For example, if you are using SageMaker's framework container, you can add the decode logic in the user script before using the list.

So my question is, could you provide more details on this issue:

  1. What are you trying to do? Are you using your own container, SageMaker's framework container or SageMaker's first party algorithm container?
  2. Do you find decoding the hyperparameter from string to list hard? If so, what's the problem?

More details will help us understand the issue and may impact our decision whether to provide instruction for this case or make new feature as what you suggested (make the list automatically decoded in the hyperparameter json file).

Thanks!

yangaws avatar Jan 25 '19 22:01 yangaws

Good morning, @yangaws,

Yes, I am using my own container. No, I do not find it difficult to decode a string back to list... if I know that I have to do it.

The problem is that sagemaker changes the type of the passed value when it is not expected to do so. The sagemaker doc reads:

hyperparameters()
    Return the hyperparameters as a dictionary to use for training.
    
    The fit() method, which trains the model, calls this method to find the hyperparameters.

        Returns:        The hyperparameters.
        Return type:  dict[str, str]

Indeed, it mentions that the return type is dict[str, str]. However, this notice of the return type is present only for EstimatorBase.hyperparameters() and is absent for Estimator.hyperparameters().

Moreover, the Estimator.hyperparameters() method returns the dict with value types as they've been passed to Estimator such as lists, numbers, etc.

>>> hp = {'a': ['bb', 'cc'], 'x': 1.5, 125: False}
>>> estimator = sm.estimator.Estimator(image,
...                        role, 1, 'ml.c4.2xlarge',
...                        output_path="s3://{}/output".format(session.default_bucket()),
...                        sagemaker_session=session,
...                        hyperparameters=hp)
>>>
>>> estimator.hyperparameters()
{'a': ['bb', 'cc'], 'x': 1.5, 125: False}

So one is ok to assume that this is a "dictionary to use for training".

Errors that can be introduced could be quite nasty. For example, the list ['bb', 'cc'] and the string "['bb', 'cc']" respond to the same calls, such as len() or __getitem__(), but with quite different results.

My understanding is that you're being compliant to the following specification:

   "HyperParameters": { 
      "string" : "string" 
   }

(AWS Documentation - Amazon SageMaker - Developer Guide - API Reference - Actions - Amazon SageMaker Service - CreateTrainingJob)

Forcing hyperparameter values to be strings looks rather unwarranted. I am interested what is the reason behind it. To work around, I have to implement additional checks to differentiate between reading "conventional" json and "sagemaker-style" json, doing additional decoding for the latter. I see this as an unnecessary complication.

If this restriction cannot be lifted, I suggest that

  • the sagemaker doc explicitly and consistently states that all hyperparameter values (as well as keys) will be converted to strings;
  • Estimator.hyperparameters() returns the dictionary with keys and values converted to strings (as it's probably supposed to do).

mmamaev avatar Jan 26 '19 09:01 mmamaev

Hi @mmamaev ,

Got what you mean. We need to talk whether this can be a feature change in our backlog (we make the hyperparameter as list in the json). But anyway our doc for now needs to be updated, let me add a doc label for now.

Thanks for the explanation.

yangaws avatar Jan 29 '19 19:01 yangaws

Hey!

This is something that has being a pain for us too.

Do you find decoding the hyperparameter from string to list hard? If so, what's the problem?

Yes. The right solution to adapt ourselves to sagemaker's contract with hyperparameters is: keeping track of all possible key/values and their types in a custom image, just to parse it back to their expected format. And undoing the str() applied by sagemaker, casting every hyperparameter that is not a string is quite time consuming.

This is not easy to do when we have to send dozens of hyperparameters. Also, it is quite strange to send something like

{
    "objective": "binary",
    "boosting_type": "gbdt",
    "num_leaves": 31,
    "metric": ["binary_logloss", "auc"],
    "seed": None
} 

and getting inside of the container something like:

{
    "objective": "binary",
    "boosting_type": "gbdt",
    "num_leaves": "31",
    "metric": "[\"binary_logloss\", \"auc\"]",
    "seed": "None"
} 

Is there any chance of changing how sagemaker deals with hyperparameters?

I imagine that the decision of forcing an str() conversion was taken to prevent having to deal with json encoders or it's more related to some concern with safety. Therefore, letting us control the json encoding and decoding and expecting just a json string as an argument of Estimator would help both sides?

roelschr avatar Apr 10 '19 17:04 roelschr

we are facing this issue as well. And in our case it is nested dict (we use our customer docker images). its very hard to reparse this to json because we have different types of values.

you guys also convert double quotes to single quotes, so to effectively get this to work, i have to do this

param_grid = {
            "svd__n_components": [ json.loads(training_params["svd"].replace("\'", "\""))["n_components"]],
            "xgb__eval_set": [(X_test, y_test)],
        }

sandys avatar Jun 16 '19 18:06 sandys

Hi team, for those still having trouble here, I've been using these two functions to parse out the hyperparameters in my own estimator containers (it also strips leading 0's, just in case you work with coordinates in your day to day, which can cause trouble when trying to cast to an int or float):

def destringify_dict_values(d):
    return {k: destringify(v) for k, v in d.items()}

def destringify(s):

    if s == len(s) * "0":
        return 0
    else:
        s = s.lstrip("0")

    if isinstance(s, str):
        try:
            val = literal_eval(s)
        except ValueError:
            val = s
    else:
        val = s

    if isinstance(val, float):
        if val.is_integer():
            return int(val)
        return val

    return val

It uses literal_eval from ast, which is not ideal, but still much safer than eval , since it raises a ValueError if it can't cast to a basic Python type.

nhirons avatar Jul 09 '19 06:07 nhirons

+1 I wasted a whole day of my life trying to debug this...training the docker container worked locally, but on sagemaker it did not! Truly an insidious issue.

ARDivekar avatar Jan 07 '20 17:01 ARDivekar

This issue is also being a pain for us too. Hopefully you can provide soon a way to handle lists as hyperparams

vcodina avatar Jan 20 '20 15:01 vcodina

I'm having the issue too. From what I gathered, there has been no official response to this thread for more than a year now, quite baffling.

Furthermore, trying to find a workaround to @mmamaev 's situation, I found something maybe more insidious. This is the script, named sandbox.py, that I try to run:

import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-a', nargs='*')
    args, _ = parser.parse_known_args()
    print(args)

On my laptop, when I run python sandbox.py -a def xyz I get the expected print: Namespace(a=['def', 'xyz']). So args.a is the correct expected list: ['def', 'xyz'].

When I run the following on sagemaker it fails:

role = get_execution_role()
image_name = ...
hyperparameters={'a': 'def xyz'
                }
output_path = ...
estimator = Estimator(image_name=image_name,
                    role=role,
                    train_instance_count=1,
                    output_path=output_path,
                    train_instance_type='local',
                    hyperparameters=hyperparameters)

estimator.fit()

where the image image_name has been built using this Dockerfile:

FROM tensorflow/tensorflow:2.0.0-py3

RUN pip install --no-cache-dir --upgrade pip
RUN pip install --no-cache-dir sagemaker-containers

COPY sandbox.py /opt/ml/code/
ENV SAGEMAKER_PROGRAM sandbox.py

Here is the end of the logs of the execution:

algo-1-7blkd_1  | SM_USER_ARGS=["-a","def xyz"]
algo-1-7blkd_1  | SM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate
algo-1-7blkd_1  | SM_HP_A=def xyz
algo-1-7blkd_1  | PYTHONPATH=/opt/ml/code:/usr/local/bin:/usr/lib/python36.zip:/usr/lib/python3.6:/usr/lib/python3.6/lib-dynload:/usr/local/lib/python3.6/dist-packages:/usr/lib/python3/dist-packages
algo-1-7blkd_1  | 
algo-1-7blkd_1  | Invoking script with the following command:
algo-1-7blkd_1  | 
algo-1-7blkd_1  | /usr/bin/python3 sandbox.py -a def xyz
algo-1-7blkd_1  | 
algo-1-7blkd_1  | 
algo-1-7blkd_1  | Namespace(a=['def xyz'])
algo-1-7blkd_1  | 2020-02-21 13:08:04,897 sagemaker-containers INFO     Reporting training SUCCESS
tmpzd7en6gx_algo-1-7blkd_1 exited with code 0
Aborting on container exit...
===== Job Complete =====

As you see, here args.a is a list of 1 element, that is args.a=['def xyz']. Whereas sagemaker says that it called the exact same line that I did on my laptop, that is, /usr/bin/python3 sandbox.py -a def xyz. Did sagemaker straight up lie to me and somehow escaped the space between def and xyz without saying ?

So my workaround does not work and I have to rely on a painful hyperparameter decoding. @ihoelscher did a great job explaining why this should not be.

Any news @yangaws ?

Edit: in OP's case the following parameter decoding makes the code works (with 'a': 'def xyz') both on my laptop and on sagemaker: just add these two lines after args, _ = parser.parse_known_args()

    args.a = [s.split() for s in args.a]
    args.a = list(itertools.chain.from_iterable(args.a))

itertools needs to be imported.

durandg12 avatar Feb 21 '20 13:02 durandg12

What is the status on this issue? I believe the relevant changes were not included on v2.0.0. @laurenyu

diegodebrito avatar Dec 01 '20 14:12 diegodebrito

@diegodebrito no, it wasn't included in v2.0.0 unfortunately

cc @ajaykarpur, who might know about the current status of this issue

laurenyu avatar Dec 01 '20 22:12 laurenyu

Glad to create a PR for this immediately. I can't believe this has not been addressed in nearly a year.

theoldfather avatar Dec 15 '20 22:12 theoldfather

Hi

Currently I'm doing

https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-train-model.html

I'm in: "Step 4: Train a Model"

And in the "3. Set the hyperparameters for the XGBoost algorithm by calling the set_hyperparameters method of the estimator."

I'm having a problem after executing: xgb_model.fit({"train": train_input, "validation": validation_input}, wait=True)

After going through some documentation: https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/

https://xgboost.readthedocs.io/en/latest/parameter.html

I see that the list of available options in objective

This is how I have it right now 1:

xgb_model.set_hyperparameters(
    max_depth = 5,
    eta = 0.2,
    gamma = 4,
    min_child_weight = 6,
    subsample = 0.7,
    objective = "binary:logistic",
    num_round = 1000
)

I've tried to use

2:

...
 objective = {"binary": "logistic"},
 ...

3:

...
 objective = ['binary: logistic'],
 ...

The error output varies but it's always like this:

  1. Failed to parse hyperparameter objective value binary:logistic to Json.

hyperparameter_validation.py", line 47, in validate_range raise exc.UserError("Hyperparameter {}: {} is not in {}".format(self.name, value, self.range)) sagemaker_algorithm_toolkit.exceptions.UserError: Hyperparameter objective: {'binary:logistic'} is not in ['aft_loss_distribution', 'binary:logistic', 'binary:logitraw', 'binary:hinge', 'count:poisson', 'multi:softmax', 'multi:softprob', 'rank:pairwise', 'rank:ndcg', 'rank:map', 'reg:linear', 'reg:squarederror', 'reg:logistic', 'reg:gamma', 'reg:pseudohubererror', 'reg:squaredlogerror', 'reg:tweedie', 'survival:aft', 'survival:cox']

Do you guys now if I'm doing something wrong? I would think the tutorial is a no brainer but maybe I'm missing something

xjwalker avatar Mar 04 '21 02:03 xjwalker

Hi

Currently I'm doing

https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-train-model.html

I'm in: "Step 4: Train a Model"

And in the "3. Set the hyperparameters for the XGBoost algorithm by calling the set_hyperparameters method of the estimator."

I'm having a problem after executing: xgb_model.fit({"train": train_input, "validation": validation_input}, wait=True) ...

@xjwalker I had the same error message. Mine was caused by a wrong train-val data format. I forgot to concatenate the label column to the dataframe. IMDB Sentiment Analysis XGBoost (Batch Transform).ipynb

# TODO: Save the training and validation data to train.csv and validation.csv in the data_dir directory.
#       Make sure that the files you create are in the correct format.
# Note: the first column is the label
pd.concat([train_y, train_X], axis=1).to_csv(os.path.join(data_dir, 'train.csv'), header=False, index=False)
pd.concat([val_y, val_X], axis=1).to_csv(os.path.join(data_dir, 'validation.csv'), header=False, index=False)```

nov05 avatar Apr 01 '21 02:04 nov05

Nothing? I am currently trying to pass a list of str (or int even) to an estimator and getting my list cut at the comma. Any developments on this issue?

clausagerskov avatar Aug 12 '22 12:08 clausagerskov

@yangaws

clausagerskov avatar Nov 03 '22 20:11 clausagerskov

As a workaround the environment variable SM_TRAINING_ENV has the hyperparameters in a properly formatted JSON string. This includes the proper JavaScript style bools (false/true) and null for None.

This works much nicer than trying to parse the hyperparams from strings:

hyperparameters = json.loads(os.environ["SM_TRAINING_ENV"])["hyperparameters"]

tarik-affinda avatar Jan 04 '23 01:01 tarik-affinda

Would be great if this could get sorted or if the docs were more expressive

alessandro-mrd avatar Jan 25 '23 17:01 alessandro-mrd

bump

clausagerskov avatar Feb 21 '23 17:02 clausagerskov

any updates? parsing values can be a pain if it's all nested and has different data types. This adds a lot of extra workload. If user simply passes in a python dict, they would expect to get the same thru a simple load using the json module.

zhenliu2012 avatar Mar 19 '23 18:03 zhenliu2012

I found a workaround when set_hyperparameters() adds escaped quotes to string (seems like some people in this thread have this issue).

Instead of using estimator.set_hyperparameters(my_param="some_string"), you can directly modify the hyperparameter using: estimator._hyperparameters.update(dict(my_param="some_string"))`

Hope that helps!

nbeuchat avatar May 12 '23 11:05 nbeuchat

Parsing this was particularly annoying when passing a nested dict as a hyperparameter. One workaround I found that bypasses any manual parsing was to base64 encode the dict as a json string, and pass that to my Estimator and then decode in my training script:

def b64encoded_str(j):
    return base64.b64encode(json.dumps(j).encode('utf-8')).decode('utf-8')

config = {...}  #Nest dict of lists, float, and dict
hyperparameters = {'b64config': b64encoded_str(config)}
estimator = PyTorch(
    image_uri=image_uri
    ...,
    hyperparamters=hyperparameters
)
estimator.fit(inputs=inputs)

and then in my training script:

def train(config):
     ....

def b64decode_str(s):
    return json.loads(base64.b64decode(s).decode('utf-8'))

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--b64config', type=str, required=True)
    args = parser.parse_args()

    cfg = b64decode_str(args.b64config)
    train(cfg)

Unideal that the hyperparameters are not legible on the Sagemaker console, but makes up for it in that it doesn't require any parsing!

kevinpmcgee14 avatar Aug 10 '23 19:08 kevinpmcgee14

Have been 3 years. Since this came up. Nothing?

RaydelMiranda avatar Jul 04 '24 01:07 RaydelMiranda