Add `--append` feature to IF
What Sub of #764
Add an --append mode to IF that takes a manifest with outputs and, instead of overwriting the outputs, adds new timesteps to them.
Why Enables IF to be run continuously or as batch jobs and still yield a single output manifest.
Context
We want people to be able to have intermittent IF runs that append output data to a file rather than each independent run overwriting the outputs section.
The way this would work is that an importer in the observe pipeline is configured to grab data using a relative time definition such as `latest` or `daily`, meaning the timestamps are not hardcoded into the manifest but are inferred from the time of execution. In this case, the same manifest would return data for a different time range each time it is executed, and each new set of data would overwrite what was there before. What we would like to do instead is add the `--append` flag to the CLI to configure IF to add new inputs and outputs to the manifest instead of overwriting them.
--append can have some firm boundaries at this stage to make the feature simpler to build, for example:
- the first timestep in new batches of `inputs` must be later than the last timestep in the previous batch (i.e. you cannot append overlapping time series)
- do not check that the pipelines are consistent between runs (`if-check` would be able to pick up discrepancies post-hoc anyway as the given inputs would not lead to the given outputs for earlier batches if the pipeline was updated)
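The first boundary above can be sketched roughly as follows. This is only an illustration, not the IF implementation: the `Observation` type and the `appendBatch` function are hypothetical names, and the sketch assumes each batch is already sorted chronologically.

```typescript
// Hypothetical shape of one observation; real IF inputs carry more fields.
type Observation = {timestamp: string; duration: number; [key: string]: unknown};

/**
 * Returns the previous batch followed by the incoming batch.
 * Throws if the incoming batch does not start strictly after the last
 * stored timestep (i.e. the time series would overlap).
 * Assumes both arrays are sorted by timestamp.
 */
function appendBatch(previous: Observation[], incoming: Observation[]): Observation[] {
  // Nothing to compare when either side is empty.
  if (previous.length === 0 || incoming.length === 0) {
    return [...previous, ...incoming];
  }

  const lastStored = Date.parse(previous[previous.length - 1].timestamp);
  const firstNew = Date.parse(incoming[0].timestamp);

  if (Number.isNaN(lastStored) || Number.isNaN(firstNew)) {
    throw new Error('Cannot compare batches: unparseable timestamp');
  }

  if (firstNew <= lastStored) {
    throw new Error(
      `Cannot append: new batch starts at ${incoming[0].timestamp}, ` +
        `which is not later than ${previous[previous.length - 1].timestamp}`
    );
  }

  return [...previous, ...incoming];
}
```

Note that the sketch deliberately does not compare pipelines between runs, in line with the second boundary.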
Prerequisites/resources none
SoW (scope of work)
- [ ] `--append` is added to IF
- [ ] documentation updated
- [ ] test cases added
Acceptance criteria
- [ ] `--append` adds new data to an existing output file
GIVEN I have the following manifest
name: mock-append
description:
initialize:
plugins:
mock-observations:
path: builtin
method: MockObservations
global-config:
timestamp-from: '2024-03-05T00:00:00.000Z'
timestamp-to: '2024-03-05T00:00:03.000Z'
duration: 1
components:
- name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
generators:
common:
cloud/vendor: azure
randint:
cpu/energy:
min: 1
max: 99
mem/energy:
min: 1
max: 99
sum:
path: builtin
method: Sum
global-config:
input-parameters:
- cpu/energy
- mem/energy
output-parameter: energy
execution:
command: >-
/home/user/.npm/_npx/1bf7c3c15bf47d04/node_modules/.bin/ts-node
/home/user/Code/if/src/index.ts -m
manifests/examples/mock-cpu-util-to-carbon.yml -s
environment:
if-version: 0.4.0
os: linux
os-version: 5.15.0-107-generic
node-version: 21.4.0
date-time: 2024-06-18T14:18:44.864Z (UTC)
dependencies:
- '@babel/[email protected]'
- '@babel/[email protected]'
- '@commitlint/[email protected]'
- '@commitlint/[email protected]'
- '@grnsft/[email protected]'
- '@jest/[email protected]'
- '@types/[email protected]'
- '@types/[email protected]'
- '@types/[email protected]'
- '@types/[email protected]'
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
status: success
tree:
pipeline:
- mock-observations
- sum
defaults: null
config:
group-by:
group:
- cloud/region
- name
inputs:
- timestamp: '2024-03-05T00:00:00.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 5
mem/energy: 10
- timestamp: '2024-03-05T00:00:01.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 71
mem/energy: 5
- timestamp: '2024-03-05T00:00:02.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 36
mem/energy: 74
outputs:
- timestamp: '2024-03-05T00:00:00.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 5
mem/energy: 10
energy: 15
- timestamp: '2024-03-05T00:00:01.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 71
mem/energy: 5
energy: 76
- timestamp: '2024-03-05T00:00:02.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 36
mem/energy: 74
energy: 110
AND now I open this file and update the timestamps in the mock observation config so they are more recent, without removing any of the inputs or outputs
name: mock-cpu-util-to-carbon
description: >-
a complete pipeline that starts with mocked CPU utilization data and outputs
operational carbon in gCO2eq
initialize:
plugins:
mock-observations:
path: builtin
method: MockObservations
global-config:
timestamp-from: '2024-03-05T00:00:04.000Z'
timestamp-to: '2024-03-05T00:00:07.000Z'
duration: 1
components:
- name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
generators:
common:
cloud/vendor: azure
randint:
cpu/energy:
min: 1
max: 99
mem/energy:
min: 1
max: 99
sum:
path: builtin
method: Sum
global-config:
input-parameters:
- cpu/energy
- mem/energy
output-parameter: energy
execution:
command: >-
/home/user/.npm/_npx/1bf7c3c15bf47d04/node_modules/.bin/ts-node
/home/user/Code/if/src/index.ts -m
manifests/examples/mock-cpu-util-to-carbon.yml -s
environment:
if-version: 0.4.0
os: linux
os-version: 5.15.0-107-generic
node-version: 21.4.0
date-time: 2024-06-18T14:18:44.864Z (UTC)
dependencies:
- '@babel/[email protected]'
- '@babel/[email protected]'
- '@commitlint/[email protected]'
- '@commitlint/[email protected]'
- '@grnsft/[email protected]'
- '@jest/[email protected]'
- '@types/[email protected]'
- '@types/[email protected]'
- '@types/[email protected]'
- '@types/[email protected]'
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
status: success
tree:
pipeline:
- mock-observations
- sum
defaults: null
config:
group-by:
group:
- cloud/region
- name
inputs:
- timestamp: '2024-03-05T00:00:00.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 5
mem/energy: 10
- timestamp: '2024-03-05T00:00:01.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 71
mem/energy: 5
- timestamp: '2024-03-05T00:00:02.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 36
mem/energy: 74
outputs:
- timestamp: '2024-03-05T00:00:00.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 5
mem/energy: 10
energy: 15
- timestamp: '2024-03-05T00:00:01.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 71
mem/energy: 5
energy: 76
- timestamp: '2024-03-05T00:00:02.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 36
mem/energy: 74
energy: 110
WHEN I run the manifest with `if-run -m manifest.yml --append`
THEN if I open manifest.yml it contains the following:
name: mock-cpu-util-to-carbon
description: >-
a complete pipeline that starts with mocked CPU utilization data and outputs
operational carbon in gCO2eq
initialize:
plugins:
mock-observations:
path: builtin
method: MockObservations
global-config:
timestamp-from: '2024-03-05T00:00:04.000Z'
timestamp-to: '2024-03-05T00:00:07.000Z'
duration: 1
components:
- name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
generators:
common:
cloud/vendor: azure
randint:
cpu/energy:
min: 1
max: 99
mem/energy:
min: 1
max: 99
sum:
path: builtin
method: Sum
global-config:
input-parameters:
- cpu/energy
- mem/energy
output-parameter: energy
execution:
command: >-
/home/user/.npm/_npx/1bf7c3c15bf47d04/node_modules/.bin/ts-node
/home/user/Code/if/src/index.ts -m
manifests/examples/mock-cpu-util-to-carbon.yml -s
environment:
if-version: 0.4.0
os: linux
os-version: 5.15.0-107-generic
node-version: 21.4.0
date-time: 2024-06-18T14:18:44.864Z (UTC)
dependencies:
- '@babel/[email protected]'
- '@babel/[email protected]'
- '@commitlint/[email protected]'
- '@commitlint/[email protected]'
- '@grnsft/[email protected]'
- '@jest/[email protected]'
- '@types/[email protected]'
- '@types/[email protected]'
- '@types/[email protected]'
- '@types/[email protected]'
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
status: success
tree:
pipeline:
- mock-observations
- sum
defaults: null
config:
group-by:
group:
- cloud/region
- name
inputs:
- timestamp: '2024-03-05T00:00:00.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 5
mem/energy: 10
- timestamp: '2024-03-05T00:00:01.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 71
mem/energy: 5
- timestamp: '2024-03-05T00:00:02.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 36
mem/energy: 74
- timestamp: '2024-03-05T00:00:03.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 36
mem/energy: 74
- timestamp: '2024-03-05T00:00:04.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 36
mem/energy: 74
- timestamp: '2024-03-05T00:00:05.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 36
mem/energy: 74
- timestamp: '2024-03-05T00:00:06.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 36
mem/energy: 74
outputs:
- timestamp: '2024-03-05T00:00:00.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 5
mem/energy: 10
energy: 15
- timestamp: '2024-03-05T00:00:01.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 71
mem/energy: 5
energy: 76
- timestamp: '2024-03-05T00:00:02.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 36
mem/energy: 74
energy: 110
- timestamp: '2024-03-05T00:00:03.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 36
mem/energy: 74
energy: 110
- timestamp: '2024-03-05T00:00:04.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 36
mem/energy: 74
energy: 110
- timestamp: '2024-03-05T00:00:05.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 36
mem/energy: 74
energy: 110
- timestamp: '2024-03-05T00:00:06.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 36
mem/energy: 74
energy: 110
@jawache please review the AC
@jawache @zanete I'd be happy to take this one if useful, let me know
@jamescrowley that's great to hear, let me tag @jmcook1186 so he is aware and can comment if there's anything standing in the way
Hi @jamescrowley - yes, please go for it - thanks!
Hi @jamescrowley, I hope you're doing great! I just wanted to check in and see how you're doing with this feature. Please feel free to share any updates or questions you have for us to discuss!
@zanete I'm starting with some integration tests set up to capture the requirement defined above. However, several of the current integration tests do not pass locally. For example
Executing `aggregate.yaml`
β Files do not match!
tree.children.application.children.uk-west.children.server-1.aggregated.cpu/utilization
source: 148
target: 74
Executing `mock-obs-time-sync.yaml`
β Files do not match!
tree.children.child-1.outputs.0.cloud/instance-type
source: NaN
target: A1
Executing `success.yaml`
β Files do not match!
tree.children.child.outputs.0.duration
source: exists
target: missing
Executing `failure-not-matching-with-regex.yaml`
β Files do not match!
execution.status
source: success
target: fail
Executing `success.yml.yaml`
β [2024-07-25 02:35:13.797 PM] error: ENOENT: no such file or directory, open 'if/manifests/outputs/plugins/sci/re-success.yml.yaml'
Error: ENOENT: no such file or directory, open 'if/manifests/outputs/plugins/sci/re-success.yml.yaml'
---------
Check summary:
52 of 61 files are passed.
Are these expected? I noticed that in the CI setup they don't run on every PR, only for a release.
@jamescrowley thanks so much for the update! Let me tag @narekhovhannisyan who can hopefully shed some light on your question.
@zanete @jmcook1186 Two questions:
- In the case of an aggregation and group-bys, would you expect these to be applied over the combined (pre-existing and new) outputs, or only on the new outputs? I'm assuming the former, but let me know your thoughts.
- The example given in the issue has `timestamp-from: '2024-03-05T00:00:04.000Z'` for the 're-run'. I assume that was a typo, and `timestamp-from:` should be `'2024-03-05T00:00:03.000Z'` in order to get the output described in the example, but let me know if I've missed a nuance somewhere.
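To make the arithmetic behind the second question concrete, here is a small sketch. It assumes the time range is half-open (start included, end excluded); that is inferred from the first manifest in the issue, where `00:00:00` to `00:00:03` with `duration: 1` produced three inputs, and is not confirmed behaviour of `MockObservations`. The function name `timestepsInRange` is hypothetical.

```typescript
/**
 * Lists the timestep start times in [from, to), stepping by durationSeconds.
 * The half-open range is an assumption inferred from the issue's first example.
 */
function timestepsInRange(from: string, to: string, durationSeconds: number): string[] {
  const start = Date.parse(from);
  const end = Date.parse(to);
  const stepMs = durationSeconds * 1000;

  if (Number.isNaN(start) || Number.isNaN(end) || stepMs <= 0) {
    throw new Error('Invalid range or duration');
  }

  const steps: string[] = [];
  for (let t = start; t < end; t += stepMs) {
    steps.push(new Date(t).toISOString());
  }
  return steps;
}

// As written in the issue: yields 00:00:04, 00:00:05, 00:00:06 (no 00:00:03 entry).
console.log(timestepsInRange('2024-03-05T00:00:04.000Z', '2024-03-05T00:00:07.000Z', 1));

// With the suggested correction: yields 00:00:03 through 00:00:06,
// matching the four new timesteps in the expected output.
console.log(timestepsInRange('2024-03-05T00:00:03.000Z', '2024-03-05T00:00:07.000Z', 1));
```

Under that assumption, only a `timestamp-from` of `00:00:03` produces the four appended timesteps shown in the acceptance criteria.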
I've pushed a rough draft here: https://github.com/Green-Software-Foundation/if/pull/932 for discussion to ensure it's along the lines of what you had in mind?
Thanks so much @jamescrowley, let's get @jmcook1186 and @narekhovhannisyan to take a look, please