solutions
solutions copied to clipboard
Initial script for issue #23 health & education electricity cluster
Not intending to merge yet, just making my initial attempt visible to field Denton's feedback on the file structure & approach
Questions on how to structure this solution:
- Should "cluster" sheet have their own subdirectory within healthandeducation? (ex: healthandeducation/electricity_cluster/init.py)
- Unit tests for each cluster can exist within each subdirectory (ex: healthandeducation/electricity_cluster/test_electricity.py)
- Should each class still be called Scenario()? Or should they be named after their cluster?
- Can the population tables or emissions factors be reused from elsewhere in the repo?
Please excuse poor code hygiene & long lines. I plan to format this before the final commit. Which autoformatter do you recommend, if any?
I used many country/region references in the electricity_cluster implementation because I wanted it to be as explicit and similar to the Excel as possible for easy debugging. Shall I plan on changing the column naming convention or using something like .iloc[] which doesn't rely on strings of column names?
I've added an initial stab at a unit testing file. I went the route of adding a huge CSV file with all of the intermediate tables from Chad's Excel file to use as the "expected df" (can turn this into a zip file later). Only added a single unit test so far. Let me know if this setup will work and I'll flesh out the rest of the tests.
I already see that the test_electricity.py file has failed the build. This is because I had to resort to a sys.path.append() to get my electricity_cluster module to import, since I was unable to get it to import naturally like the rest of the modules in this repo. Could you help me work around this?
Shall I plan on changing the column naming convention or using something like .iloc[] which doesn't rely on strings of column names?
iloc tends to be fragile, especially when using it to address rows. For example, if two researchers both add a line at the end of the file and one of them gets a merge conflict, the resolution will move the new row and any iloc[] trying to reference it will be off-by-one.
We've tried to mostly only use iloc[0], meaning "the first data point no matter what the year is."
This is because I had to resort to a sys.path.append() to get my electricity_cluster module to import, since I was unable to get it to import naturally like the rest of the modules in this repo. Could you help me work around this?
This may end up being a reason to add it as electricity_cluster/__init__.py instead of electricity_cluster.py, so it will import as a regular module.
Oops. Didn't intend to close it, reopened.
This may end up being a reason to add it as electricity_cluster/init.py instead of electricity_cluster.py, so it will import as a regular module.
I made the requested change in this commit. However, I opted to name the directory "clusters" rather than "electricity_cluster" so it can house .py files from all the clusters and share the same init. Let me know if this structure is suitable.
However, I am still unable to import upper level modules like model.dd without having a sys.path.append, as you'll see in this commit. Is this okay to keep or is there some other convention we prefer to follow?
iloc tends to be fragile, especially when using it to address rows. For example, if two researchers both add a line at the end of the file and one of them gets a merge conflict, the resolution will move the new row and any iloc[] trying to reference it will be off-by-one.
Good to know. In that case I will keep the verbose strings for column indexing. Point noted about using iloc in case I need to access the first row regardless of year.
@Sunishchal is this PR still valid, or is it obsolete? I see some tests failed, is that something we should try to address?
@Sunishchal is this PR still valid, or is it obsolete? I see some tests failed, is that something we should try to address?
@brodavi It's still valid but not ready to merge for a while until we figure out how to integrate this model with the rest of the codebase. I just had a chat with @denised and it sounds like we'll need to spend some time understanding how to do that since this model is quite different from other solutions.