
reward/instructor/trainer.py needs to handle default values of config vars in training_conf (loaded from .yml file)

henryj18 opened this issue 2 years ago · 4 comments

I tried to run the training with "python trainer.py configs/electra-base-dis-webgpt.yml" and ran into "KeyError: 'scheduler'" at line 140, lr_scheduler_type=training_conf["scheduler"]. Further inspection revealed that "scheduler" is missing from "configs/electra-base-dis-webgpt.yml", so the key used to index the training_conf data structure ("scheduler" in this case) does not exist. The current version of trainer.py simply throws an error in that case. I suggest we modify trainer.py so that it falls back to a default value for any config var that is not defined in the .yml file. This should apply to all such config vars; "scheduler" just happens to be the one I spotted.
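For illustration, here is a minimal sketch of what that fallback behaviour could look like, assuming the config is loaded with PyYAML into a plain dict. The default values shown (e.g. a "linear" scheduler) are placeholders for illustration, not the project's actual defaults:

```python
import yaml

# Hypothetical defaults -- the actual values would need to be agreed on;
# these are placeholders for illustration only.
DEFAULT_TRAINING_CONF = {
    "scheduler": "linear",
    "learning_rate": 1e-5,
    "warmup_steps": 0,
}


def load_training_conf(path):
    """Load a .yml config and fill in defaults for any missing keys."""
    with open(path) as f:
        training_conf = yaml.safe_load(f) or {}
    # Keys present in the .yml file win; missing ones fall back to defaults.
    return {**DEFAULT_TRAINING_CONF, **training_conf}


training_conf = load_training_conf("configs/electra-base-dis-webgpt.yml")
# No KeyError even if the .yml omits "scheduler":
lr_scheduler_type = training_conf["scheduler"]
```

Alternatively, training_conf.get("scheduler", "linear") works for a one-off key, but a central defaults dict keeps all fallbacks in one place.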

henryj18 avatar Jan 30 '23 02:01 henryj18

More thoughts: if trainer.py will not be changed to handle default values of config vars, there are other options, such as filling every .yml file with the complete set of config var/value pairs so that the trigger for this kind of error in trainer.py is removed. In that case, we should add a README.md under the configs folder to document that.

We need to discuss the pros and cons of each option and then decide.

These config vars are the hyperparameters that affect the outcome of training, so we should also do our best to document the impact of different value settings.

henryj18 avatar Jan 30 '23 17:01 henryj18

I can take a shot at this issue and would like to be part of the team working on it.

henryj18 avatar Jan 30 '23 18:01 henryj18

I think these scripts are just being used for internal development at the moment, so it's not a big problem if config changes are required for them to run. That said, if there is an appropriate default value for the options, I don't see a problem with adding code to use that default value when none is specified. Feel free to take a look at adding this.

olliestanley avatar Feb 05 '23 15:02 olliestanley

Perhaps we could just populate the .yml file with the default values of all the possible parameters and add some statements to the README.md explaining that these are the defaults. It should also mention that you can modify the values to see how the changes impact the outcome of training, but warn against deleting any parameter/value pair from the .yml file, since that may break the training.

That way, we do not need to change the .py code for that purpose.

This approach seems cleaner than changing the .py code to handle default values.
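If we go this route, a small standalone check script could still catch a deleted key before training starts, without touching trainer.py. This is only a sketch: the script name and the REQUIRED_KEYS list below are illustrative assumptions, not the project's actual parameter list:

```python
import sys

import yaml

# Illustrative only -- the real list of required hyperparameters would come
# from whatever trainer.py actually reads out of training_conf.
REQUIRED_KEYS = ["model_name", "learning_rate", "scheduler", "epochs", "batch_size"]


def check_config(path):
    """Report any required keys missing from a .yml config file."""
    with open(path) as f:
        conf = yaml.safe_load(f) or {}
    missing = [key for key in REQUIRED_KEYS if key not in conf]
    if missing:
        print(f"{path} is missing required keys: {', '.join(missing)}")
        return 1
    print(f"{path} defines all required keys.")
    return 0


if __name__ == "__main__":
    sys.exit(check_config(sys.argv[1]))
```

Running something like "python check_config.py configs/electra-base-dis-webgpt.yml" before training would then produce a clear message instead of a bare KeyError deep inside trainer.py.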

henryj18 avatar Feb 05 '23 23:02 henryj18

Nice job and kudos to you guys; appreciate your hard work!

In retrospect, could we streamline the process by not opening a new issue, and instead adding the people working on #1438 to the team fixing #1008? That would reduce the complexity of managing multiple GitHub issues for the exact same problem and help avoid duplicated work. I was assigned #1008 earlier and was only notified that #1438 fixed #1008 after #1438 was done and merged. As a result, the work I had done on #1008, which I was just about to push to GitHub, became redundant. It would be great if we could put a process in place to avoid this kind of duplicated work in the future.

henryj18 avatar Feb 12 '23 14:02 henryj18