MUSE_OS
Scripts to generate tutorial models
Description
The documentation contains a number of tutorials for customising models (adding new technologies, regions, agents etc.). Each tutorial documents the steps the user has to carry out to achieve the desired modification, and shows results for a model that has previously been built. I had a go at following along with the tutorials, starting from the default model and making the required changes step by step, and found that my results never matched the figures in the notebooks. I think the problem is two-fold:
- The models contained in the repo were originally set up starting from an old version of the default model (I think).
- Several additional modifications have been made to the models that are not documented in the tutorials.
To fix this, we need to:
- Re-build the example models in the repo, starting with the current version of the default model
- Make sure that all steps used to set up the models are documented in the tutorial notebooks
This isn't so easy, as the current way of generating models is to copy the default model and then manually edit a load of csv files, which I really don't want to do (this would also be a problem if the default model changes again). I think the best solution would be, rather than hardcoding the models, to generate them programmatically using a generate_model.py file which would be run every time the documentation is built (along with regression tests to check whether the outputs of the notebooks have changed).
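To make the regression-test idea concrete, here's a minimal sketch of the sort of check I have in mind; the directory names and file layout are hypothetical, and the real test would need to point at wherever the generated and reference results actually live:

```python
# test_tutorial_regression.py (sketch) -- paths and layout are hypothetical
from pathlib import Path

import pandas as pd
import pytest

GENERATED_DIR = Path("docs/tutorial-code")  # freshly generated results
REFERENCE_DIR = Path("docs/tutorial-code-reference")  # committed reference results

RESULT_FILES = sorted(REFERENCE_DIR.glob("**/Results/*.csv"))


@pytest.mark.parametrize("reference", RESULT_FILES, ids=str)
def test_results_unchanged(reference):
    """Freshly generated results should match the committed reference files."""
    generated = GENERATED_DIR / reference.relative_to(REFERENCE_DIR)
    expected = pd.read_csv(reference)
    actual = pd.read_csv(generated)
    pd.testing.assert_frame_equal(actual, expected, check_exact=False)
```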
Initial plan (April 23rd)
Towards this goal, I've started by documenting, in pseudocode, the processes that would be required to generate the models (still a work in progress). I don't intend to turn this into real code just yet; it's just to demonstrate the steps that have to be carried out to customise models. I've created 'functions' for:
- Adding a new commodity
- Adding a new process
- Adding price data for a new year
- Adding an agent
- Adding a region
- Adding a timeslice
All of these could be a good starting point for creating some kind of 'wizard' to carry out these functions for the user. The idea is that adding a new feature (agent, region etc.) could consist of copying an existing feature (which could be automated), followed by a series of manual modifications to the csv files (the steps beginning with >>>). We could also set up the wizard so it can create a 'blank' feature (with everything set to zero), which the user would then manually modify.
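As a very rough sketch of what the automated "copy an existing feature" step might look like, here's one of these functions made concrete in Python. The column name and file layout are assumptions based on the general shape of the csv inputs, not actual MUSE code:

```python
# Sketch of a wizard 'function': duplicate an existing region and relabel the copy.
# The user would then work through the manual '>>>' steps on the copied rows.
from pathlib import Path

import pandas as pd


def copy_region(model_path: Path, existing: str, new: str) -> None:
    """Copy every row for `existing` in each csv file that has a RegionName column."""
    for csv_file in model_path.glob("**/*.csv"):
        df = pd.read_csv(csv_file)
        if "RegionName" not in df.columns:
            continue
        copied = df[df["RegionName"] == existing].assign(RegionName=new)
        pd.concat([df, copied]).to_csv(csv_file, index=False)
```

A "blank" variant would be much the same, except the copied rows' values would be zeroed out before writing.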
That said, I also think that current csv structure of the models could be significantly improved (and possibly even replaced by a relational database), in which case all of this would change, so I don't want to go too far down this route just yet without further conversation.
Update (May 17th)
I've gone through and written the necessary scripts for all the tutorials. Within the tutorial-code folder you'll find two new files: generate_models.py and run_models.py, both of which do what they say on the tin: the first runs in a loop to generate the model input files, and the second runs the models in parallel to generate the results files. I'm currently committing all model input files and results files to the repo (as was the case before), rather than running the scripts to generate the files during the documentation build. This is for two reasons:
- Many of the tutorial notebooks provide links to the input files on github. This is actually quite useful as it allows users to see exactly what input files were used to generate the results in the notebook
- The results files are used for regression tests, which requires them to be in the repo
I wonder if there's a better way of doing these things that doesn't require all these files to be committed?
There is still a lot of work to be done on the contents of the notebooks, as there are still many inconsistencies in the text. This PR is more about having a framework in place to generate the model input files programmatically; I think I'll tackle the contents of the notebooks separately.
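For reference, the parallel step in run_models.py amounts to something like the sketch below. The directory layout and the exact way MUSE is invoked are assumptions here (I'm assuming each model directory contains a settings.toml and can be run with `python -m muse settings.toml`):

```python
# run_models.py (sketch) -- directory layout and MUSE invocation are assumptions
import subprocess
import sys
from multiprocessing import Pool
from pathlib import Path

MODEL_DIRS = sorted(p.parent for p in Path("tutorial-code").glob("**/settings.toml"))


def run_model(model_dir: Path) -> None:
    """Run a single model; results are written alongside the input files."""
    subprocess.run(
        [sys.executable, "-m", "muse", "settings.toml"],
        cwd=model_dir,
        check=True,
    )


if __name__ == "__main__":
    with Pool() as pool:
        pool.map(run_model, MODEL_DIRS)
```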
- Original documentation
- New documentation
Fixes #99 Fixes #291
Type of change
Please add a line in the relevant section of CHANGELOG.md to document the change (include PR #) - note reverse order of PR #s.
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Optimization (non-breaking, back-end change that speeds up the code)
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] Breaking change (whatever its nature)
Key checklist
- [ ] All tests pass:
$ python -m pytest
- [ ] The documentation builds and looks OK:
$ python -m sphinx -b html docs docs/build
Further checks
- [ ] Code is commented, particularly in hard-to-understand areas
- [ ] Tests added that prove fix is effective or that feature works
I'd suggest you code the first tutorial only, for now, to see what things would look like in practice and then we can decide on the best way forward.
I love it too! I think a wizard along the lines you're suggesting is eminently sensible. As we talked about in the meeting with Adam, I think it would be nice to have a tool for setting input parameters in general, even when not copying from an existing template -- as someone who understands MUSE fairly poorly at the moment, I'm not sure whether it makes sense for this to be a separate task?
Another thing is: do we want the option of having a graphical interface for this? Or just a terminal one? A graphical interface may be more user-friendly (particularly for non-technical MUSE users), but it isn't necessarily something to do now.
I guess our options are:
1. Make a terminal-only interface now
2. Make a graphical-only interface now
3. Make a terminal-only interface, but make the code flexible enough that we could stick a (simple) GUI on top of it later
I'm leaning towards 3.
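In practice, option 3 could just mean keeping all the model-editing logic in plain functions (like the copy_region sketch above) and putting only a thin command-line layer on top, so a GUI could later call the same functions. A rough sketch, with all module and function names hypothetical:

```python
# cli.py (sketch) -- module and function names are hypothetical
import argparse
from pathlib import Path

from wizard import copy_region  # plain function; a future GUI would call this directly


def main() -> None:
    parser = argparse.ArgumentParser(description="MUSE model wizard (terminal front end)")
    parser.add_argument("model_path", type=Path, help="path to the model to modify")
    parser.add_argument("--copy-region", nargs=2, metavar=("EXISTING", "NEW"))
    args = parser.parse_args()

    if args.copy_region:
        copy_region(args.model_path, *args.copy_region)


if __name__ == "__main__":
    main()
```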
I've had a go at this for the first tutorial. See the folder 1-add-new-technology. The generate_models.py file is used to generate the relevant input files for the models described in the notebook, starting from the default model. My aim was to match the models as closely as possible to how they were before, which took a bit of detective work as many of the necessary steps aren't documented in the notebook. There are a few differences remaining, as you can see from the diffs, but I think these are all insignificant, and the new version of the notebook looks identical to how it was before. I'm leaving the new input and results files here for now so we can look at the diffs, but I think eventually we'll want to remove them from the repo and generate them automatically when the documentation is built.
I've marked all the steps in generate_models.py that are undocumented in the notebook. We'll need to make sure these all get added as steps in the notebook so that users can fully reproduce the models (or remove any steps from the script that aren't necessary).
I've created a file wizard.py which contains the functions used to manipulate the input files. It's pretty minimal at the moment and only designed with this particular use case in mind. It's possible that this could be turned into a tool to help users build their models, however with all the possible edge cases I think making something robust enough to be useful would take a lot of work, so we'd need to decide whether that's worth it (especially since we may be changing the structure of the input files soon).
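To give a flavour of how generate_models.py composes these helpers, the overall shape is roughly as below. The paths, helper names and the technology names passed in are illustrative only, not the actual contents of the script:

```python
# generate_models.py for tutorial 1 (sketch) -- paths, helpers and parameters are illustrative
import shutil
from pathlib import Path

import wizard  # the helper module described above

DEFAULT_MODEL = Path("default-model")  # a copy of the current default model
MODEL_DIR = Path("1-add-new-technology/1-introduction")

# Start from a fresh copy of the default model...
shutil.rmtree(MODEL_DIR, ignore_errors=True)
shutil.copytree(DEFAULT_MODEL, MODEL_DIR)

# ...then apply the modifications described in the notebook, e.g. adding a new
# technology by copying an existing one and tweaking its parameters.
wizard.copy_technology(MODEL_DIR, existing="gasCCGT", new="solarPV")
```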
For now though, do you think it's worth going ahead with this for the other tutorials?
- If another tutorial builds on the model generated by, let's say, scenario 1, would it need a generate file that builds everything from scratch, or could it take the files for scenario 1 as its starting point?
I think this should be possible, but we would just have to make sure the scripts are run in the correct order in a loop (and not in parallel). Should be straightforward, although I haven't looked into this yet.
- When running the generate tutorial script, I get something slightly different to what was already in the repo. See attached. It's merely a formatting thing - it does not affect functionality - but it will freak out the version control system because that file will look modified. It happens whenever there's a list. I guess that when the toml file is saved, it follows some convention which is different to the convention it originally had. Or it might be just my computer...
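For what it's worth, this kind of diff can come purely from round-tripping the file through a TOML library, which rewrites arrays in its own style. A minimal illustration, assuming the script uses the plain `toml` package (a style-preserving library such as tomlkit would avoid it); the key and values here are just for show:

```python
import toml

original = """\
excluded_commodities = [
    "wind",
    "CO2f",
]
"""

# Loading and dumping gives back the same data, but the array is rewritten on a
# single line with different spacing, so the file shows up as modified in git.
data = toml.loads(original)
print(toml.dumps(data))
```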
Have you tried running pre-commit? I think this should get rid of the diffs, but I guess it's still quite annoying. In the long run, though, we'll probably add these files to .gitignore, so it shouldn't be a problem.
I've not run pre-commit as I had nothing to commit. But you're right - if these files are generated on the fly, there's no reason to keep them in the repo, so we could just gitignore them.
I've made some comments about lists and generators, but it's more of a note for the future. Only change it if you can be bothered.
Thanks @alexdewar - useful advice!