if
if copied to clipboard
feat(src): add support for appending to existing outputs
Types of changes
- [x] Enhancement (project structure, spelling, grammar, formatting)
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] My code follows the code style of this project.
- [x] My change requires a change to the documentation.
- [ ] I have updated the documentation accordingly.
- [x] I have added tests to cover my changes.
- [x] All new and existing tests passed.
A description of the changes proposed in the Pull Request
Closes #845.
A few areas to discuss/agree:
- Behaviour for group by / aggregation - I may not fully understand the implications for these types of transformations yet.
- Does the way the argument is passed around make sense
- Is it ok this living as part of the core compute logic, as opposed to one of the builtins or other extension points?
Hi @jamescrowley - just back from vacation today - looking forward to checking this out tomorrow. Thanks for your efforts!
Hi @jamescrowley great point about aggregation and time-sync. I think the only behaviour that makes sense is to re-run the aggregation / time-sync over the new data and overwrite whatever is already present.
The "right" way to use append would probably be for the user to toggle time-sync and aggregation on some final manifest file after all the append actions have been taken, but in cases where a user is appending to an already-synced-or-aggregated manifest, the time-sync and/or aggregation should be repeated with the new input data, with the results overriding the original.
re groupby - in the latest IF (main branch, will be in the next release today/tomorrow) it's actually called regroup and executed as part of the IF execution flow rather than as a plugin (it's done automatically if you use if-run with no other commands, but you can isolate just that function using if-run --regroup with the necessary config in the manifest). Each time regroup is executed is flattens all the input data and regroups again from that 1D array, so I think that should just work even with --append...
It's ok having this as an amendment to the core execution logic - I don't think it makes sense as a plugin/builtin or other extension.
also tagging in @narekhovhannisyan for visibility
👍 I’ll rebase and add some tests to try and capture that behaviour
@jmcook1186 I'm working through regroup as there's a few things to suss out there, but in the meantime, some questions re documenting the aggregation behaviour w/ tests:
- It appears the only place I can capture the combined compute & aggregate behaviour is in the integration tests, is that right?
appendis currently an argument to if-run, which means it is not loaded from the manifest, and thus I'm unable to write a manifest that tests this behaviour for the integration tests.
As I see it the options are: (a) don't add any tests to explicitly capture this (or express them only in terms of stand-alone compute or aggregate behaviour) (b) adding some tests as part of the jest test suite that hooks up the compute & aggregate steps (c) we figure out a way to pass the append argument through the manifest-based integration tests (d) something else?
Thanks!
Hi @jamescrowley - looping in @narekhovhannisyan who can give better answers to these questions!
@jmcook1186 @narekhovhannisyan also FYI on the regroup, I think the behaviour that makes most sense is to regroup outputs in addition to inputs when in append mode - otherwise we can't sensibly add the new outputs if the existing outputs aren't grouped. I'll push up some commits once I've ironed out the kinks but let me know if you object on principal and think I should go another way!
I don't object on principle - do you have an opinion @narekhovhannisyan?
I've pushed proposed changes.
- Regroup is now fed outputs alongside inputs and merges those, when the append flag is set. I did some refactoring prior to the changes to make those easier.
- I've kept the 'append' preservation of inputs independent within both the compute and regroup steps, as this felt clearer (here and here).
- I've not copied the outputs with a deep clone like inputs - this didn't seem to be necessary but let me know if I've missed something subtle.
- I'm not entirely happy with the tests - in that it's not always clear the specific behaviour we're actually looking for in the expected results (open to suggestions there), but for now just tried to make the test name descriptive.
- There are currently no integration tests covering this (see issues previously flagged).
Hey @jmcook1186, @jamescrowley. sorry for late reply. Will check the threaд and will get back with the results.
Hi @jamescrowley sorry it's taking a while to get this reviewed - we've got some somewhat urgent features to ship to support a demo in a couple of weeks, but I'm keen to get into this asap. I'm planning to check out your branch and start looking into it later today.
Cheers
Hi @jamescrowley - i checked out the branch yesterday and am having trouble making the append feature work - could you please provide an example manifest and run command combo that works for you so I can try to replicate locally?
@jmcook1186 I was just running the second example in the original ticket (with original inputs, and modified mock observations for a later time), with the compute/group updated for the latest syntax
name: append
description: >-
a complete pipeline that starts with mocked CPU utilization data and outputs
operational carbon in gCO2eq
initialize:
plugins:
mock-observations:
path: builtin
method: MockObservations
global-config:
timestamp-from: '2024-03-05T00:00:04.000Z'
timestamp-to: '2024-03-05T00:00:07.000Z'
duration: 1
components:
- name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
generators:
common:
cloud/vendor: azure
randint:
cpu/energy:
min: 1
max: 99
mem/energy:
min: 1
max: 99
sum:
path: builtin
method: Sum
global-config:
input-parameters:
- cpu/energy
- mem/energy
output-parameter: energy
execution:
command: >-
/home/user/.npm/_npx/1bf7c3c15bf47d04/node_modules/.bin/ts-node
/home/user/Code/if/src/index.ts -m
manifests/examples/mock-cpu-util-to-carbon.yml -s
environment:
if-version: 0.4.0
os: linux
os-version: 5.15.0-107-generic
node-version: 21.4.0
date-time: 2024-06-18T14:18:44.864Z (UTC)
dependencies:
- '@babel/[email protected]'
- '@babel/[email protected]'
- '@commitlint/[email protected]'
- '@commitlint/[email protected]'
- '@grnsft/[email protected]'
- '@jest/[email protected]'
- '@types/[email protected]'
- '@types/[email protected]'
- '@types/[email protected]'
- '@types/[email protected]'
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
status: success
tree:
pipeline:
compute:
- mock-observations
- sum
regroup:
- cloud/region
- name
defaults: null
inputs:
- timestamp: '2024-03-05T00:00:00.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 5
mem/energy: 10
- timestamp: '2024-03-05T00:00:01.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 71
mem/energy: 5
- timestamp: '2024-03-05T00:00:02.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 36
mem/energy: 74
outputs:
- timestamp: '2024-03-05T00:00:00.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 5
mem/energy: 10
energy: 15
- timestamp: '2024-03-05T00:00:01.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 71
mem/energy: 5
energy: 76
- timestamp: '2024-03-05T00:00:02.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 36
mem/energy: 74
energy: 110
and running
npm run if-run -- -m manifests/outputs/features/append.yaml -o manifests/outputs/features/re-append --append
you then get output:
name: append
description: >-
a complete pipeline that starts with mocked CPU utilization data and outputs
operational carbon in gCO2eq
initialize:
plugins:
mock-observations:
path: builtin
method: MockObservations
global-config:
timestamp-from: '2024-03-05T00:00:04.000Z'
timestamp-to: '2024-03-05T00:00:07.000Z'
duration: 1
components:
- name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
generators:
common:
cloud/vendor: azure
randint:
cpu/energy:
min: 1
max: 99
mem/energy:
min: 1
max: 99
sum:
path: builtin
method: Sum
global-config:
input-parameters:
- cpu/energy
- mem/energy
output-parameter: energy
execution:
command: >-
/Users/jcrowley/.npm/_npx/1bf7c3c15bf47d04/node_modules/.bin/ts-node
/Users/jcrowley/Development/gsf/if/src/if-run/index.ts -m
manifests/outputs/features/append.yaml -o
manifests/outputs/features/re-append --append
environment:
if-version: 0.6.0
os: macOS
os-version: 14.6.1
node-version: 20.16.0
date-time: 2024-09-04T01:05:58.758Z (UTC)
dependencies:
- '@babel/[email protected]'
- '@babel/[email protected]'
- '@commitlint/[email protected]'
- '@commitlint/[email protected]'
- '@grnsft/[email protected]'
- '@jest/[email protected]'
- '@types/[email protected]'
- '@types/[email protected]'
- '@types/[email protected]'
- '@types/[email protected]'
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
status: success
tree:
pipeline:
compute:
- mock-observations
- sum
regroup:
- cloud/region
- name
defaults: null
children:
westus3:
children:
server-1:
inputs:
- timestamp: '2024-03-05T00:00:00.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 5
mem/energy: 10
- timestamp: '2024-03-05T00:00:01.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 71
mem/energy: 5
- timestamp: '2024-03-05T00:00:02.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 36
mem/energy: 74
outputs:
- timestamp: '2024-03-05T00:00:00.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 5
mem/energy: 10
energy: 15
- timestamp: '2024-03-05T00:00:01.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 71
mem/energy: 5
energy: 76
- timestamp: '2024-03-05T00:00:02.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 36
mem/energy: 74
energy: 110
- timestamp: '2024-03-05T00:00:04.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 2
mem/energy: 26
energy: 28
- timestamp: '2024-03-05T00:00:05.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 67
mem/energy: 27
energy: 94
- timestamp: '2024-03-05T00:00:06.000Z'
duration: 1
name: server-1
cloud/instance-type: Standard_E64_v3
cloud/region: westus3
cloud/vendor: azure
cpu/energy: 88
mem/energy: 6
energy: 94
which includes the additional outputs. Ideally would include this as an integration test if we can codify how we want to pass 'append' into the tests.
Let me know if you get the above, or if I've misunderstood somewhere along the way.
got it - thanks, yes I have consistent results to you now. Will dig in and try to understand your changes a bit more today - all seems to behave as expected so far. Then will pass to @manushak to approve.
Hi @jamescrowley, I've reviewed the PR, and it seems the append works only once. I've recorded a short video. In the video, it is shown that the last output data from 04-07 is missing
@manushak Thanks for taking a look. The current implementation appends to the outputs that already exist in the input manifest, but it does not append to any outputs that exist in a separate output manifest if supplied - that is always overwritten.
So in the example given, if you then supplied append-output.yaml as the input manifest for the second run (having modified the mock parameters) then it would behave as I believe you desired.
If the intent is to append to both outputs on the input manifest and outputs to an existing output manifest (if it exists) I’d need to do some more digging - and probably clarify with you all on other scenarios needing support here.
Hope that makes sense!
Yeah, you're right, @jamescrowley. I'm sorry for the misunderstanding. It works as expected.