eskom-calendar
Keep track of historical load shedding
Keeping track of historical loadshedding is technically feasible at the moment, but it isn't easy to accomplish.
Basically all the information is in the git log for the file `manually_specified.yaml`, but extracting and compiling it would be a pain.
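As a rough idea of what "extracting" could involve, here's a minimal Python sketch that walks the git history of `manually_specified.yaml` and yields the file's contents at every commit that touched it. It only relies on the `git` CLI (`git log --format=%H %cI` and `git show <commit>:<path>`). The function names are hypothetical, and it assumes the script runs from the repository root with enough history available.

```python
import subprocess
from datetime import datetime

PATH = "manually_specified.yaml"


def parse_log(output):
    """Parse `git log --format='%H %cI'` output into (commit, commit time) pairs.

    git log lists the newest commit first, so the result is reversed to give
    oldest-first order (the order in which the file evolved).
    """
    entries = []
    for line in output.splitlines():
        if not line.strip():
            continue
        sha, iso_time = line.split(" ", 1)
        entries.append((sha, datetime.fromisoformat(iso_time.strip())))
    entries.reverse()
    return entries


def file_versions(path=PATH):
    """Yield (commit, commit time, file contents) for every commit touching `path`."""
    log = subprocess.run(
        ["git", "log", "--format=%H %cI", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    for sha, when in parse_log(log):
        shown = subprocess.run(
            ["git", "show", f"{sha}:{path}"], capture_output=True, text=True
        )
        if shown.returncode != 0:
            continue  # e.g. the file doesn't exist at this commit; skip it
        yield sha, when, shown.stdout
```

Each yielded contents string still has to be parsed with a YAML library (for example PyYAML's `yaml.safe_load`, which is not part of the standard library).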
Probably the easiest way to make historical loadshedding data available would be to have a CI/CD script that runs every time the calendars get built. This script should calculate the historical loadshedding (either by updating the previously calculated data or by recalculating everything from scratch) and emit a file containing that information.
For parsing, it would be easiest if that file were formatted in the same way as `manually_specified.yaml`:
```yaml
changes:
- stage: 4
  start: 2023-08-31T14:00:00
  finsh: 2023-09-02T05:00:00
  source: https://twitter.com/Eskom_SA/status/1697210092179935262
  exclude: coct
- stage: 2
  start: 2023-09-02T05:00:00
  finsh: 2023-09-02T16:00:00
  source: https://twitter.com/Eskom_SA/status/1697210092179935262
  exclude: coct
...
```
Keeping the format the same would mean the main codebase is equally able to calculate historical loadshedding and future loadshedding. However, it shouldn't be too much work to parse some different format, if that format provided some benefits.
Note that YAML supports JSON-style flow mappings (YAML 1.2 is a superset of JSON), so the snippet below is also valid YAML. It expresses the same two changes while requiring fewer characters:
```yaml
changes:
- { stage: 4, start: 2023-08-31T14:00:00, finsh: 2023-09-02T05:00:00, source: https://twitter.com/Eskom_SA/status/1697210092179935262, exclude: coct }
- { stage: 2, start: 2023-09-02T05:00:00, finsh: 2023-09-02T16:00:00, source: https://twitter.com/Eskom_SA/status/1697210092179935262, exclude: coct }
```
Keeping the format the same is not a hard requirement, but alternatives should be properly motivated.
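If the historical file does keep the `manually_specified.yaml` layout, emitting it doesn't need much machinery. Below is a minimal Python sketch (a hypothetical `dump_changes` helper using plain string formatting rather than a YAML library) that writes a list of changes out in the same block style, assuming `start`/`finsh` are naive `datetime` values:

```python
from datetime import datetime


def dump_changes(changes, key="changes"):
    """Render change dicts in the same block style as manually_specified.yaml.

    Each change needs `stage`, `start`, `finsh` (datetimes) and `source`;
    `exclude`/`include` are written only when present.
    """
    lines = [f"{key}:"]
    for change in changes:
        lines.append(f"- stage: {change['stage']}")
        lines.append(f"  start: {change['start'].isoformat()}")
        lines.append(f"  finsh: {change['finsh'].isoformat()}")
        lines.append(f"  source: {change['source']}")
        for optional in ("exclude", "include"):
            if change.get(optional):
                lines.append(f"  {optional}: {change[optional]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = [{
        "stage": 4,
        "start": datetime(2023, 8, 31, 14),
        "finsh": datetime(2023, 9, 2, 5),
        "source": "https://twitter.com/Eskom_SA/status/1697210092179935262",
        "exclude": "coct",
    }]
    print(dump_changes(example), end="")
```

Because the output round-trips through the same parser as `manually_specified.yaml`, no new parsing code would be needed on the Rust side.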
Here's a high level checklist:
- [ ] Write a script (python/rust) that can run locally and create one file containing the entire history of loadshedding.
- [ ] Add tests for the above script.
- [ ] It should fail gracefully and handle edge cases/date boundaries properly.
- [ ] Ensure that it handles the fact that Cape Town and Eskom often have different schedules.
- [ ] If your script requires network access, ensure it fails gracefully if it doesn't have it.
- [ ] Often the VMs that run the GH actions only download a portion of the repo (since full git history isn't really required). Ensure your script handles this properly by either downloading the full history or otherwise making a plan.
- [ ] Ask @beyarkay for access to the `eskom-calendar-dev` repo. This is a private mirror of `eskom-calendar`, used to test CI/CD things. You can also set up your own, but getting the private GitHub keys set up (which allow GH actions to run faster) can be a pain.
- [ ] Integrate your script into the `publish-calendars` workflow. You'll probably want to add a step after the 'upload to pastebin' step (here). You'll also need to make sure your script writes to the `calendars/` directory, as that's the only directory that gets uploaded to GitHub releases.
- [ ] Manually test out the script a few times, updating `manually_specified.yaml` and asserting that the updated changes get properly integrated.
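For the shallow-clone item, one option is to detect a shallow checkout and fetch the rest of the history before processing. Here's a minimal Python sketch; `git rev-parse --is-shallow-repository` and `git fetch --unshallow` are real git commands, but `ensure_full_history` is a hypothetical helper, and its `run` parameter exists only so the logic can be tested without git. It returns `False` instead of raising when git is missing or there's no network, so the caller can fail gracefully.

```python
import subprocess


def run_git(*args):
    """Run a git subcommand, capturing its output instead of printing it."""
    return subprocess.run(["git", *args], capture_output=True, text=True)


def ensure_full_history(run=run_git):
    """Make sure the full git history is available.

    Returns True if the repo already had (or now has) full history, and False
    if git is unavailable or the unshallow fetch failed (e.g. no network).
    """
    try:
        shallow = run("rev-parse", "--is-shallow-repository")
        if shallow.returncode != 0:
            return False  # not a git repository, or git itself failed
        if shallow.stdout.strip() != "true":
            return True  # already a full clone
        return run("fetch", "--unshallow").returncode == 0
    except OSError:
        return False  # the git executable itself is missing
```

In GitHub Actions, a similar effect can be had up front by configuring `actions/checkout` with `fetch-depth: 0`, which checks out the full history.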
If the above are done, then all should be good! @beyarkay will check things over and merge.
What do you think of the following for the historical data format (following a similar pattern to `manually_specified.yaml`):
```yaml
historical_changes:
- stage: 4
  start: 2023-08-31T14:00:00
  finish: 2023-09-02T05:00:00
  source: https://twitter.com/Eskom_SA/status/1697210092179935262
  exclude: coct
  historical: false
- stage: 2
  start: 2023-09-02T05:00:00
  finish: 2023-09-02T16:00:00
  source: https://twitter.com/Eskom_SA/status/1697210092179935262
  exclude: coct
  historical: true
```
where `historical` is a boolean field that is `true` for entries coming from historical data and `false` for new entries.
I'm actually wondering if there are any disadvantages to keeping the format identical, so future changes look the same as historical changes:
```yaml
historical_changes:
- stage: 4
  start: 2023-08-31T14:00:00
  finish: 2023-09-02T05:00:00
  source: https://twitter.com/Eskom_SA/status/1697210092179935262
  exclude: coct
- stage: 2
  start: 2023-09-02T05:00:00
  finish: 2023-09-02T16:00:00
  source: https://twitter.com/Eskom_SA/status/1697210092179935262
  exclude: coct
```
Very happy to hear feedback/opinions on this, but my reasoning is:
- Keeping the format the same will mean no change is required to parse historical loadshedding. For example, figuring out the loadshedding schedule for `western-cape-stellenbosch` for the upcoming week would be identical to figuring out the loadshedding schedule for `western-cape-stellenbosch` for the past week.
- From a technical perspective, no information is lost (I think?) since historical changes will always be in the past and future changes will always be in the future. It's possible there's an edge case here that makes this not true, but I can't think of one.
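To make the "no information is lost" argument concrete: with a single list, whether an entry is historical or upcoming falls out of comparing its times with the current time. Below is a minimal Python sketch (a hypothetical `split_by_time` helper, with changes as dicts holding `datetime` values). The one case that needs an explicit decision is a change that is in progress at `now`, which is partly past and partly future.

```python
from datetime import datetime


def split_by_time(changes: list, now: datetime):
    """Partition changes into (past, ongoing, upcoming) relative to `now`.

    Assumes every change has start < finsh. A change ending exactly at `now`
    counts as past; one starting exactly at `now` counts as upcoming.
    """
    past = [c for c in changes if c["finsh"] <= now]
    upcoming = [c for c in changes if c["start"] >= now]
    ongoing = [c for c in changes if c["start"] < now < c["finsh"]]
    return past, ongoing, upcoming
```

Since the three groups are disjoint and together cover every change, nothing needs to be stored beyond the change itself to tell which group it belongs to.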
Here's a link to the struct that defines the loadshedding change. Removing the Rust-specific details, it looks like:
```rust
struct Change {
    start: String,
    // Note: `finsh` (sic) matches the key used in manually_specified.yaml.
    finsh: String,
    stage: u8,
    source: String,
    include_regex: Option<String>,
    exclude_regex: Option<String>,
    include: Option<String>,
    exclude: Option<String>,
}
```
`include` and `exclude` are really just syntactic sugar that get converted into explicit regexes: an area is affected by the relevant `Change` only if its name matches `include_regex` (when set) and doesn't match `exclude_regex` (when set). `include` and `exclude` get converted to `include_regex` and `exclude_regex` by this function, which basically just converts shorthand like `cape-town` into regex like `city-of-cape-town-area-\d{1,2}`. The regex matching hasn't been as useful as I thought it would be (and I don't think I've ever actually used it in `manually_specified.yaml`), so I don't think it's worth your time trying to deal with it. Just assume `include_regex` and `exclude_regex` don't exist.
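For anyone porting this to a script, the `include`/`exclude` behaviour boils down to a filter on area names. Here's a minimal Python sketch; the `SHORTHANDS` table is hypothetical (only the `city-of-cape-town-area-\d{1,2}` pattern comes from the description above), and unknown shorthands are assumed to match an area name literally.

```python
import re

# Hypothetical shorthand table: the real conversion function in eskom-calendar
# may use different keys and patterns.
SHORTHANDS = {
    "coct": r"city-of-cape-town-area-\d{1,2}",
    "cape-town": r"city-of-cape-town-area-\d{1,2}",
}


def to_regex(shorthand):
    """Expand a shorthand into a regex; unknown values match literally (assumption)."""
    return SHORTHANDS.get(shorthand, re.escape(shorthand))


def applies_to(area, include=None, exclude=None):
    """True if a change with these include/exclude values affects `area`."""
    if include is not None and not re.fullmatch(to_regex(include), area):
        return False
    if exclude is not None and re.fullmatch(to_regex(exclude), area):
        return False
    return True
```

For example, an Eskom change with `exclude: coct` applies to `western-cape-stellenbosch` but not to `city-of-cape-town-area-3`.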
Agreed. I don't see a disadvantage in keeping the format the same.
To explain my thought process for dealing with two files that have overlapping times or conflicting stages like this:
```yaml
- stage: 3
  start: 2023-09-04T10:00:00
  finsh: 2023-09-04T22:00:00
  source: https://twitter.com/CityofCT/status/1698744757000831345
  include: coct
- stage: 5
  start: 2023-09-04T22:00:00
  finsh: 2023-09-05T05:00:00
  source: https://twitter.com/CityofCT/status/1698744757000831345
  include: coct
```
and a newer file:
```yaml
- stage: 5
  start: 2023-09-04T18:00:00
  finsh: 2023-09-04T22:00:00
  source: https://twitter.com/CityofCT/status/1698744757000831345
  include: coct
- stage: 6
  start: 2023-09-04T22:00:00
  finsh: 2023-09-05T05:00:00
  source: https://twitter.com/CityofCT/status/1698744757000831345
  include: coct
```
The differences are that the first entry's stage 3 becomes stage 5 for the overlapping portion (18:00 to 22:00), and the second entry changes from stage 5 to stage 6 for its whole time frame.
Then the output in the historical data would be:
```yaml
- stage: 6
  start: 2023-09-04T22:00:00
  finsh: 2023-09-05T05:00:00
  source: https://twitter.com/CityofCT/status/1698744757000831345
  include: coct
- stage: 5
  start: 2023-09-04T18:00:00
  finsh: 2023-09-04T22:00:00
  source: https://twitter.com/CityofCT/status/1698744757000831345
  include: coct
- stage: 3
  start: 2023-09-04T10:00:00
  finsh: 2023-09-04T18:00:00
  source: https://twitter.com/CityofCT/status/1698744757000831345
  include: coct
```
The rule for these changes can be summarised as "new entries replace older entries for overlapping times". Essentially, we delete the incorrect entry from the file completely (or trim it, like the stage 3 entry above, when only part of it is incorrect)...
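Here is a minimal Python sketch of that rule as stated, using a hypothetical `overlay` helper with changes as dicts holding `datetime` values for `start`/`finsh`. It assumes both lists apply to the same areas, so Cape Town (`include: coct`) and Eskom (`exclude: coct`) changes would need to be merged separately.

```python
from datetime import datetime


def overlay(old, new):
    """Merge two lists of changes, letting `new` win wherever times overlap.

    Older changes are trimmed (or split) around every newer change, and dropped
    entirely if nothing of them remains. Returns changes sorted by start time.
    """
    merged = []
    for old_change in old:
        pieces = [(old_change["start"], old_change["finsh"])]
        for new_change in new:
            remaining = []
            for start, finsh in pieces:
                # Keep the part of the piece before the newer change starts...
                if start < new_change["start"]:
                    remaining.append((start, min(finsh, new_change["start"])))
                # ...and the part after the newer change finishes.
                if finsh > new_change["finsh"]:
                    remaining.append((max(start, new_change["finsh"]), finsh))
            pieces = remaining
        # The `s < f` check guards against zero-length input changes.
        merged.extend({**old_change, "start": s, "finsh": f} for s, f in pieces if s < f)
    merged.extend(new)
    return sorted(merged, key=lambda change: change["start"])


if __name__ == "__main__":
    at = datetime.fromisoformat
    older = [
        {"stage": 3, "start": at("2023-09-04T10:00:00"), "finsh": at("2023-09-04T22:00:00")},
        {"stage": 5, "start": at("2023-09-04T22:00:00"), "finsh": at("2023-09-05T05:00:00")},
    ]
    newer = [
        {"stage": 5, "start": at("2023-09-04T18:00:00"), "finsh": at("2023-09-04T22:00:00")},
        {"stage": 6, "start": at("2023-09-04T22:00:00"), "finsh": at("2023-09-05T05:00:00")},
    ]
    for change in overlay(older, newer):
        print(change["stage"], change["start"], change["finsh"])
```

With the two files from the example, this yields stage 3 from 10:00 to 18:00, stage 5 from 18:00 to 22:00 and stage 6 from 22:00 to 05:00, i.e. the expected output in chronological rather than reverse order.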
Yes, that looks correct to me, although trying to read these reminds me why I tried to make a schedule visualiser a while back (it's trickier than it might seem at first glance). If I get a chance later on today, I'll write up some test cases (probably formatted as a multi-document YAML file) so that we can get the computer verifying these things for us.
I'll write out some high-level test examples below:
It'll be useful to have a little custom syntax: `file1` is older than `file2`, the caret `^` indicates the current time, and a series of numbers like `_ _ 4 4 0 2 2 2` indicates several stages over some unit of time:
- `_ _`: 2 units where we don't know what loadshedding stage it is,
- `4 4`: 2 units of stage four,
- `0`: followed by no loadshedding for one unit of time,
- `2 2 2`: followed by stage two for 3 units of time.
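To turn these diagrams into test fixtures, each row can be collapsed into change-like entries. Here's a minimal Python sketch (a hypothetical `row_to_changes` helper) that uses unit offsets in place of real timestamps for `start`/`finsh`, treats `_` as "no entry", and writes "no loadshedding" as stage 0 (an assumption about how that would be represented):

```python
def row_to_changes(row):
    """Collapse a diagram row like '_ _ 4 4 0 2 2 2' into change-like dicts.

    `start` and `finsh` are unit offsets (finsh is exclusive). Consecutive
    units with the same stage become one change; `_` units produce nothing.
    """
    changes = []
    for i, token in enumerate(row.split()):
        if token == "_":
            continue
        stage = int(token)
        last = changes[-1] if changes else None
        if last is not None and last["stage"] == stage and last["finsh"] == i:
            last["finsh"] = i + 1  # extend the current run
        else:
            changes.append({"stage": stage, "start": i, "finsh": i + 1})
    return changes
```

So `_ _ 4 4 0 2 2 2` becomes stage 4 over units 2 to 4, stage 0 over units 4 to 5 and stage 2 over units 5 to 8.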
With the above, we can define some mini-test examples like:
If there are conflicts in the future, the newer file should take precedence:
```
file1:  2 2 2 2
file2:  4 4 2 2
now:    ^
result: 4 4 2 2
```
If there are conflicts in the past, the newer file should still take precedence (sometimes loadshedding will be bumped to stage 6 at 2am, but the announcement will only be made public at 7am, so we want to catch this edge case):
```
file1:  2 2 2 2
file2:  4 4 2 2
now:          ^
result: 4 4 2 2
```
If the `start`/`finsh` boundaries don't align nicely across different files, then the result should properly figure out the new boundaries:
```
file1:  2 2 2 2 3 3 3 3
file2:  _ _ 4 4 4 4 2 2
now:    ^
result: 2 2 4 4 4 4 2 2
```
If an old file says "stage 6 for the rest of time" but a newer file updates to say "stage 3 for the next week", then there should only be stage 3 for the next week (it should not be followed by stage 6 for the rest of time):
```
file1:  6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
file2:  _ _ _ _ 3 3 3 3 _ _ _ _ _ _ _
now:    ^
result: 6 6 6 6 3 3 3 3 _ _ _ _ _ _ _
```
The above example also shows that there must be an option to specify "unknown loadshedding". Unfortunately this does happen sometimes and it's unavoidable.
Finally, here's one big example, just to stress test things a bit:
```
file1:  1 2 2 2 2 _ _ _ _ _ _ _ _ _ _ _ _ _
file2:  _ _ 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
file3:  _ _ _ _ _ 3 2 3 2 _ _ _ _ _ _ _ _ _
file4:  _ _ _ _ _ _ _ _ _ _ _ 1 1 1 1 1 1 1
file5:  _ _ _ _ _ _ _ _ _ _ _ _ _ 0 0 1 1 _
file6:  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 2 2
now:    ^
result: 1 2 6 6 6 3 2 3 2 _ _ 1 1 0 0 1 2 2
```
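One reading of these rules that reproduces all of the examples above: process the files from oldest to newest, and let each file overwrite everything from its first known unit onward, including the units it leaves as `_`. That is stricter than overwriting only the units a newer file explicitly covers, and it's what makes the "stage 6 for the rest of time" example end in unknowns. Here's a minimal Python sketch on the diagram rows (hypothetical `parse_row`/`merge_rows`/`format_row` helpers). It ignores `now`, since the examples say the newer file wins in both the past and the future.

```python
def parse_row(row):
    """'_ _ 4 4' -> [None, None, 4, 4]; `_` means the stage is unknown."""
    return [None if token == "_" else int(token) for token in row.split()]


def merge_rows(rows):
    """Merge diagram rows, ordered oldest to newest, into a single result row.

    Assumes at least one row is given.
    """
    parsed = [parse_row(row) for row in rows]
    result = [None] * max(len(row) for row in parsed)
    for row in parsed:
        known = [i for i, stage in enumerate(row) if stage is not None]
        if not known:
            continue  # a file with no information changes nothing
        first = known[0]
        # Everything from this file's first known unit onward is replaced, so an
        # older "stage 6 forever" can't leak past the end of a newer, shorter file.
        result[first:] = row[first:] + [None] * (len(result) - len(row))
    return result


def format_row(stages):
    """Inverse of parse_row."""
    return " ".join("_" if stage is None else str(stage) for stage in stages)
```

For instance, `format_row(merge_rows(["2 2 2 2 3 3 3 3", "_ _ 4 4 4 4 2 2"]))` gives `2 2 4 4 4 4 2 2`, matching the misaligned-boundaries example.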
I'll try to write these up as YAML files tonight, but this should give you a good idea. Please do bug me if it looks like I haven't been consistent with the rules.
I will have a look at this and get back to you, but one issue (maybe something I missed?) is that the formatting of `manually_specified.yaml` differs between commits. I can get you the commit hash if necessary. It's easy enough to skip the files that are misbehaving and mark the period between two successful reads as unknown.
For example:
```yaml
# How to edit this file:
# You should add items to `changes`. For example, here's a template that you
# can copy and paste just below the line `changes:`:
# ```
# - stage: <STAGE NUMBER HERE>
#   start: <START TIME HERE>
#   finsh: <FINISH TIME HERE>
#   source: <URL TO INFORMATION SOURCE HERE>
#   exclude: <coct if this schedule doesn't apply to cape town>
#   include: <coct if this schedule only applies to cape town>
# ```
# See the README.md for more details
---
changes:
- stage: 4
  start: 2023-08-31T14:00:00
  finsh: 2023-09-02T05:00:00
  source: https://twitter.com/Eskom_SA/status/1697210092179935262
  exclude: coct
- stage: 2
  start: 2023-09-02T05:00:00
  finsh: 2023-09-02T16:00:00
  source: https://twitter.com/Eskom_SA/status/1697210092179935262
  exclude: coct
- stage: 4
  start: 2023-08-31T10:00:00
  finsh: 2023-08-31T17:00:00
  source: https://twitter.com/CityofCT/status/1697259196931252229
  include: coct
- stage: 2
  start: 2023-08-31T17:00:00
  finsh: 2023-08-31T22:00:00
  source: https://twitter.com/CityofCT/status/1697259196931252229
  include: coct
- stage: 4
  start: 2023-08-31T22:00:00
  finsh: 2023-09-01T05:00:00
  source: https://twitter.com/CityofCT/status/1697259196931252229
  include: coct
- stage: 2
  start: 2023-09-01T05:00:00
  finsh: 2023-09-01T22:00:00
  source: https://twitter.com/CityofCT/status/1697259196931252229
  include: coct
- stage: 4
  start: 2023-09-01T22:00:00
  finsh: 2023-09-03T05:00:00
  source: https://twitter.com/CityofCT/status/1697259196931252229
  include: coct
- stage: 2
  start: 2023-09-03T05:00:00
  finsh: 2023-09-03T17:00:00
  source: https://twitter.com/CityofCT/status/1697259196931252229
  include: coct
historical_changes: []
```
as compared to:
```yaml
# How to edit this file:
# You should add items to `changes`. For example, here's a template that you
# can copy and paste just below the line `changes:`:
# ```
# - stage: <STAGE NUMBER HERE>
#   start: <START TIME HERE>
#   finsh: <FINISH TIME HERE>
#   source: <URL TO INFORMATION SOURCE HERE>
#   exclude: <coct if this schedule doesn't apply to cape town>
#   include: <coct if this schedule only applies to cape town>
# ```
# See the README.md for more details
---
- stage: 3
  start: 2023-08-27T16:00:00
  finsh: 2023-08-28T05:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 1
  start: 2023-08-28T05:00:00
  finsh: 2023-08-28T16:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 3
  start: 2023-08-28T16:00:00
  finsh: 2023-08-29T05:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 1
  start: 2023-08-29T05:00:00
  finsh: 2023-08-29T16:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 3
  start: 2023-08-29T16:00:00
  finsh: 2023-08-30T05:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 1
  start: 2023-08-30T05:00:00
  finsh: 2023-08-30T16:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 3
  start: 2023-08-30T16:00:00
  finsh: 2023-08-31T05:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 1
  start: 2023-08-31T05:00:00
  finsh: 2023-08-31T16:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 3
  start: 2023-08-31T16:00:00
  finsh: 2023-09-01T05:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 1
  start: 2023-09-01T05:00:00
  finsh: 2023-09-01T16:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 3
  start: 2023-09-01T16:00:00
  finsh: 2023-09-02T05:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 1
  start: 2023-09-02T05:00:00
  finsh: 2023-09-02T16:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 3
  start: 2023-08-27T16:00:00
  finsh: 2023-08-28T00:00:00
  source: https://twitter.com/CityofCT/status/1695804610932273188
  include: coct
historical_changes: []
```
@beyarkay have you had any time to generate some test YAML files? (I am also working on some.)
I have started some rudimentary test cases for a data aggregation file I wrote here. I have generated the historical data using this file, but I'm afraid it may be riddled with errors until some proper testing is done.
Hey, sorry for the delay.
Yes, you're correct, the misbehaving file should be omitted. That's the one formatted like this (I'm not sure what happened there):
```yaml
---
- stage: 3
  start: 2023-08-27T16:00:00
  finsh: 2023-08-28T05:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 1
  start: 2023-08-28T05:00:00
...
```
The correctly formatted file should have two keys, `changes` and `historical_changes`, each of which accepts a list of "change" objects (although `historical_changes` is deprecated and not used anymore).
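Skipping the misbehaving versions could be as simple as checking each parsed document's shape before using it. Here's a minimal Python sketch, assuming each version has already been parsed into Python objects by a YAML library (e.g. PyYAML's `yaml.safe_load`); `is_well_formed` and `usable_versions` are hypothetical helpers, and the required keys are taken from the template comment at the top of `manually_specified.yaml`. Versions that fail to parse at all can be passed in as `None`.

```python
REQUIRED_KEYS = {"stage", "start", "finsh", "source"}


def is_well_formed(doc):
    """True if `doc` is a mapping whose `changes` is a list of complete change dicts."""
    if not isinstance(doc, dict) or not isinstance(doc.get("changes"), list):
        return False
    return all(
        isinstance(change, dict) and REQUIRED_KEYS.issubset(change)
        for change in doc["changes"]
    )


def usable_versions(versions):
    """Keep only the (commit, parsed document) pairs that can be processed.

    The caller decides how to treat the gap a skipped version leaves
    (for example, as unknown loadshedding).
    """
    return [(sha, doc) for sha, doc in versions if is_well_formed(doc)]
```

A top-level list like the snippet above fails the first check, because it has no `changes` key to look up.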
Busy working on the test files now, should have them uploaded to your PR in a bit.