
Add `--append` feature to IF

Open jmcook1186 opened this issue 1 year ago • 10 comments

What

Sub-issue of #764. Add an `--append` mode to IF that takes a manifest with outputs and, instead of overwriting the outputs, adds new timesteps to them.

Why

Enables IF to be run continuously or as batch jobs while still yielding a single output manifest.

Context

We want people to be able to have intermittent IF runs that append output data to a file rather than each independent run overwriting the outputs section.

The way this would work is that an importer in the observe pipeline is configured to grab data using a relative time definition such as `latest` or `daily`, meaning the timestamps are not hardcoded into the manifest but are inferred from the time of execution. In this case, the same manifest returns data for a different time range each time it is executed, and each new set of data overwrites what was there before. What we would like to do instead is add an `--append` flag to the CLI that configures IF to add new inputs and outputs to the manifest instead of overwriting them.
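As a rough illustration of the difference between today's behaviour and the proposed flag, here is a minimal TypeScript sketch. The `Row` and `ManifestNode` types and the `mergeNode` helper are hypothetical simplifications, not IF's actual types or internals; the sketch only shows that without `--append` a fresh run replaces a node's data, while with it the new rows are concatenated after the existing ones.

```typescript
// Hypothetical, simplified shapes; IF's real manifest types differ.
type Row = { timestamp: string; duration: number; [key: string]: unknown };

interface ManifestNode {
  inputs?: Row[];
  outputs?: Row[];
}

/**
 * Hypothetical helper: combine a node from the existing manifest with the
 * same node as produced by a fresh run.
 */
function mergeNode(
  existing: ManifestNode,
  fresh: ManifestNode,
  append: boolean
): ManifestNode {
  // Current behaviour: the fresh run overwrites whatever was there before.
  if (!append) {
    return fresh;
  }
  // Proposed --append behaviour: keep the old rows, add the new ones after them.
  return {
    ...existing,
    ...fresh,
    inputs: [...(existing.inputs ?? []), ...(fresh.inputs ?? [])],
    outputs: [...(existing.outputs ?? []), ...(fresh.outputs ?? [])],
  };
}
```

Concatenating rather than re-sorting keeps the sketch simple; it relies on the "no overlapping batches" boundary listed below to keep the series in time order.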

`--append` can have some firm boundaries at this stage to keep the feature simple to build, for example:

  • the first timestep in new batches of inputs must be later than the last timestep in the previous batch (i.e. you cannot append overlapping time series)
  • do not check that the pipelines are consistent between runs (if-check would be able to pick up discrepancies post-hoc anyway as the given inputs would not lead to the given outputs for earlier batches if the pipeline was updated)
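The first boundary could be enforced with a simple guard before any rows are written. This is a minimal sketch assuming timestamps are ISO 8601 strings, as in the manifests below; `assertNoOverlap` is a hypothetical name, and, following the wording of the rule above, it compares timestamps only and ignores each row's `duration`.

```typescript
// Hypothetical, simplified row shape; only the timestamp matters here.
type Row = { timestamp: string; [key: string]: unknown };

/**
 * Hypothetical guard: throws if the first timestamp of the incoming batch is
 * not strictly later than the last timestamp already in the manifest.
 * Durations are deliberately ignored, matching the rule as worded above.
 */
function assertNoOverlap(existing: Row[], incoming: Row[]): void {
  if (existing.length === 0 || incoming.length === 0) {
    return; // nothing to compare against
  }
  // Math.max/Math.min propagate NaN, so an unparseable timestamp is caught below.
  const lastExisting = existing.reduce(
    (max, row) => Math.max(max, Date.parse(row.timestamp)),
    -Infinity
  );
  const firstIncoming = incoming.reduce(
    (min, row) => Math.min(min, Date.parse(row.timestamp)),
    Infinity
  );
  if (Number.isNaN(lastExisting) || Number.isNaN(firstIncoming)) {
    throw new Error('Cannot append: found an unparseable timestamp');
  }
  if (firstIncoming <= lastExisting) {
    throw new Error(
      `Cannot append: new batch starts at ${new Date(firstIncoming).toISOString()}, ` +
        `which is not later than ${new Date(lastExisting).toISOString()}`
    );
  }
}
```

Per the second boundary, no attempt is made to compare pipelines between runs.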

Prerequisites/resources

None

SoW (scope of work)

  • [ ] --append is added to IF
  • [ ] documentation updated
  • [ ] test cases added

Acceptance criteria

  • [ ] --append adds new data to an existing output file

    GIVEN I have the following manifest

name: mock-append
description: 
initialize:
  plugins:
    mock-observations:
      path: builtin
      method: MockObservations
      global-config:
        timestamp-from: '2024-03-05T00:00:00.000Z'
        timestamp-to: '2024-03-05T00:00:03.000Z'
        duration: 1
        components:
          - name: server-1
            cloud/instance-type: Standard_E64_v3
            cloud/region: westus3
        generators:
          common:
            cloud/vendor: azure
          randint:
            cpu/energy:
              min: 1
              max: 99
            mem/energy:
              min: 1
              max: 99
    sum:
      path: builtin
      method: Sum
      global-config:
        input-parameters:
          - cpu/energy
          - mem/energy
        output-parameter: energy
execution:
  command: >-
    /home/user/.npm/_npx/1bf7c3c15bf47d04/node_modules/.bin/ts-node
    /home/user/Code/if/src/index.ts -m
    manifests/examples/mock-cpu-util-to-carbon.yml -s
  environment:
    if-version: 0.4.0
    os: linux
    os-version: 5.15.0-107-generic
    node-version: 21.4.0
    date-time: 2024-06-18T14:18:44.864Z (UTC)
    dependencies:
      - '@babel/[email protected]'
      - '@babel/[email protected]'
      - '@commitlint/[email protected]'
      - '@commitlint/[email protected]'
      - '@grnsft/[email protected]'
      - '@jest/[email protected]'
      - '@types/[email protected]'
      - '@types/[email protected]'
      - '@types/[email protected]'
      - '@types/[email protected]'
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
  status: success
tree:
  pipeline:
    - mock-observations
    - sum
  defaults: null
  config:
    group-by:
      group:
        - cloud/region
        - name
  inputs:
    - timestamp: '2024-03-05T00:00:00.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 5
      mem/energy: 10
    - timestamp: '2024-03-05T00:00:01.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 71
      mem/energy: 5
    - timestamp: '2024-03-05T00:00:02.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 36
      mem/energy: 74
  outputs:
    - timestamp: '2024-03-05T00:00:00.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 5
      mem/energy: 10
      energy: 15
    - timestamp: '2024-03-05T00:00:01.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 71
      mem/energy: 5
      energy: 76
    - timestamp: '2024-03-05T00:00:02.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 36
      mem/energy: 74
      energy: 110

AND now I open this file and update the timestamps in the mock observations config so they are more recent, without removing any of the inputs or outputs

name: mock-cpu-util-to-carbon
description: >-
  a complete pipeline that starts with mocked CPU utilization data and outputs
  operational carbon in gCO2eq
initialize:
  plugins:
    mock-observations:
      path: builtin
      method: MockObservations
      global-config:
        timestamp-from: '2024-03-05T00:00:04.000Z'
        timestamp-to: '2024-03-05T00:00:07.000Z'
        duration: 1
        components:
          - name: server-1
            cloud/instance-type: Standard_E64_v3
            cloud/region: westus3
        generators:
          common:
            cloud/vendor: azure
          randint:
            cpu/energy:
              min: 1
              max: 99
            mem/energy:
              min: 1
              max: 99
    sum:
      path: builtin
      method: Sum
      global-config:
        input-parameters:
          - cpu/energy
          - mem/energy
        output-parameter: energy
execution:
  command: >-
    /home/user/.npm/_npx/1bf7c3c15bf47d04/node_modules/.bin/ts-node
    /home/user/Code/if/src/index.ts -m
    manifests/examples/mock-cpu-util-to-carbon.yml -s
  environment:
    if-version: 0.4.0
    os: linux
    os-version: 5.15.0-107-generic
    node-version: 21.4.0
    date-time: 2024-06-18T14:18:44.864Z (UTC)
    dependencies:
      - '@babel/[email protected]'
      - '@babel/[email protected]'
      - '@commitlint/[email protected]'
      - '@commitlint/[email protected]'
      - '@grnsft/[email protected]'
      - '@jest/[email protected]'
      - '@types/[email protected]'
      - '@types/[email protected]'
      - '@types/[email protected]'
      - '@types/[email protected]'
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
  status: success
tree:
  pipeline:
    - mock-observations
    - sum
  defaults: null
  config:
    group-by:
      group:
        - cloud/region
        - name
  inputs:
    - timestamp: '2024-03-05T00:00:00.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 5
      mem/energy: 10
    - timestamp: '2024-03-05T00:00:01.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 71
      mem/energy: 5
    - timestamp: '2024-03-05T00:00:02.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 36
      mem/energy: 74
  outputs:
    - timestamp: '2024-03-05T00:00:00.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 5
      mem/energy: 10
      energy: 15
    - timestamp: '2024-03-05T00:00:01.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 71
      mem/energy: 5
      energy: 76
    - timestamp: '2024-03-05T00:00:02.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 36
      mem/energy: 74
      energy: 110

WHEN I run the manifest with `if-run -m manifest.yml --append`

THEN if I open manifest.yml it contains the following:

name: mock-cpu-util-to-carbon
description: >-
  a complete pipeline that starts with mocked CPU utilization data and outputs
  operational carbon in gCO2eq
initialize:
  plugins:
    mock-observations:
      path: builtin
      method: MockObservations
      global-config:
        timestamp-from: '2024-03-05T00:00:04.000Z'
        timestamp-to: '2024-03-05T00:00:07.000Z'
        duration: 1
        components:
          - name: server-1
            cloud/instance-type: Standard_E64_v3
            cloud/region: westus3
        generators:
          common:
            cloud/vendor: azure
          randint:
            cpu/energy:
              min: 1
              max: 99
            mem/energy:
              min: 1
              max: 99
    sum:
      path: builtin
      method: Sum
      global-config:
        input-parameters:
          - cpu/energy
          - mem/energy
        output-parameter: energy
execution:
  command: >-
    /home/user/.npm/_npx/1bf7c3c15bf47d04/node_modules/.bin/ts-node
    /home/user/Code/if/src/index.ts -m
    manifests/examples/mock-cpu-util-to-carbon.yml -s
  environment:
    if-version: 0.4.0
    os: linux
    os-version: 5.15.0-107-generic
    node-version: 21.4.0
    date-time: 2024-06-18T14:18:44.864Z (UTC)
    dependencies:
      - '@babel/[email protected]'
      - '@babel/[email protected]'
      - '@commitlint/[email protected]'
      - '@commitlint/[email protected]'
      - '@grnsft/[email protected]'
      - '@jest/[email protected]'
      - '@types/[email protected]'
      - '@types/[email protected]'
      - '@types/[email protected]'
      - '@types/[email protected]'
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
  status: success
tree:
  pipeline:
    - mock-observations
    - sum
  defaults: null
  config:
    group-by:
      group:
        - cloud/region
        - name
  inputs: 
    - timestamp: '2024-03-05T00:00:00.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 5
      mem/energy: 10
    - timestamp: '2024-03-05T00:00:01.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 71
      mem/energy: 5
    - timestamp: '2024-03-05T00:00:02.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 36
      mem/energy: 74
    - timestamp: '2024-03-05T00:00:03.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 36
      mem/energy: 74
    - timestamp: '2024-03-05T00:00:04.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 36
      mem/energy: 74
    - timestamp: '2024-03-05T00:00:05.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 36
      mem/energy: 74
    - timestamp: '2024-03-05T00:00:06.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 36
      mem/energy: 74
  outputs:
    - timestamp: '2024-03-05T00:00:00.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 5
      mem/energy: 10
      energy: 15
    - timestamp: '2024-03-05T00:00:01.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 71
      mem/energy: 5
      energy: 76
    - timestamp: '2024-03-05T00:00:02.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 36
      mem/energy: 74
      energy: 110
    - timestamp: '2024-03-05T00:00:03.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 36
      mem/energy: 74
      energy: 110
    - timestamp: '2024-03-05T00:00:04.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 36
      mem/energy: 74
      energy: 110
    - timestamp: '2024-03-05T00:00:05.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 36
      mem/energy: 74
      energy: 110
    - timestamp: '2024-03-05T00:00:06.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 36
      mem/energy: 74
      energy: 110
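One way to turn this acceptance criterion into a test case (see the SoW) is to compare the outputs before and after the appended run. The sketch below assumes the manifest YAML has already been parsed into plain objects (for example with a YAML library, not shown), and `OutputRow` and `checkAppendResult` are hypothetical names. It checks that the original outputs survive unchanged as a prefix, that timestamps keep increasing, that each `energy` is still `cpu/energy + mem/energy` as configured for the `sum` plugin, and that something was actually appended.

```typescript
// Hypothetical row shape; extra fields (name, duration, ...) are allowed.
type OutputRow = {
  timestamp: string;
  'cpu/energy': number;
  'mem/energy': number;
  energy: number;
  [key: string]: unknown;
};

/** Hypothetical check: returns a list of problems (empty means the append looks right). */
function checkAppendResult(before: OutputRow[], after: OutputRow[]): string[] {
  const problems: string[] = [];

  // 1. Every pre-existing output must survive unchanged and in order.
  //    JSON.stringify is key-order sensitive, which is acceptable for a sketch.
  before.forEach((row, i) => {
    if (JSON.stringify(row) !== JSON.stringify(after[i])) {
      problems.push(`output ${i} was modified or removed`);
    }
  });

  // 2. Timestamps must be strictly increasing across the whole series.
  for (let i = 1; i < after.length; i++) {
    if (Date.parse(after[i].timestamp) <= Date.parse(after[i - 1].timestamp)) {
      problems.push(`output ${i} is not later than output ${i - 1}`);
    }
  }

  // 3. The sum plugin's output-parameter is cpu/energy + mem/energy.
  after.forEach((row, i) => {
    if (row.energy !== row['cpu/energy'] + row['mem/energy']) {
      problems.push(`output ${i}: energy != cpu/energy + mem/energy`);
    }
  });

  // 4. At least one new output must have been appended.
  if (after.length <= before.length) {
    problems.push('no new outputs were appended');
  }
  return problems;
}
```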

jmcook1186, Jun 18 '24 14:06

@jawache please review the AC

zanete, Jun 27 '24 10:06

@jawache @zanete I’d be happy to take this one if useful; let me know.

jamescrowley, Jul 21 '24 08:07

@jamescrowley that's great to hear, let me tag @jmcook1186 so he is aware and can comment if there's anything standing in the way 🙏

zanete, Jul 22 '24 12:07

Hi @jamescrowley - yes, please go for it - thanks!

jmcook1186, Jul 22 '24 14:07

Hi @jamescrowley, I hope you’re doing great! I just wanted to check in and see how you're doing with this feature. Please feel free to share any updates or questions you have for us to discuss! 🙏

zanete, Jul 30 '24 08:07

@zanete I'm starting by setting up some integration tests to capture the requirements defined above. However, several of the existing integration tests do not pass locally. For example:

Executing `aggregate.yaml`
✖ Files do not match!
tree.children.application.children.uk-west.children.server-1.aggregated.cpu/utilization
source: 148
target: 74
Executing `mock-obs-time-sync.yaml`
✖ Files do not match!
tree.children.child-1.outputs.0.cloud/instance-type
source: NaN
target: A1
Executing `success.yaml`
✖ Files do not match!
tree.children.child.outputs.0.duration
source: exists
target: missing
Executing `failure-not-matching-with-regex.yaml`
✖ Files do not match!
execution.status
source: success
target: fail
Executing `success.yml.yaml`
✖ [2024-07-25 02:35:13.797 PM] error:   ENOENT: no such file or directory, open 'if/manifests/outputs/plugins/sci/re-success.yml.yaml'
Error:  ENOENT: no such file or directory, open 'if/manifests/outputs/plugins/sci/re-success.yml.yaml'

---------
Check summary:
52 of 61 files are passed.

Are these failures expected? I noticed that in the CI setup they don't run on every PR, only for a release.

jamescrowley, Aug 01 '24 12:08

@jamescrowley thanks so much for the update! Let me tag @narekhovhannisyan who can hopefully shed some light on your question. 🙏

zanete, Aug 01 '24 15:08

@zanete @jmcook1186 Two questions:

  1. In the case of aggregation and group-bys, would you expect these to be applied over the combined (pre-existing and new) outputs, or only over the new outputs? I'm assuming the former, but let me know your thoughts.

  2. The example given in the issue has

timestamp-from: '2024-03-05T00:00:04.000Z'

for the 're-run'. I assume that was a typo and that `timestamp-from` should be '2024-03-05T00:00:03.000Z' to get the output described in the example, but let me know if I've missed a nuance somewhere.
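The arithmetic behind question 2 can be checked with a small sketch. It assumes, based on the first manifest in the issue (where `timestamp-from` 00:00:00 and `timestamp-to` 00:00:03 with `duration: 1` produced rows at 00, 01 and 02), that mock observations step from `timestamp-from` in increments of `duration` seconds and exclude `timestamp-to`. `mockTimestamps` is a hypothetical helper, not the `MockObservations` implementation.

```typescript
/**
 * Hypothetical helper: list the timestamps a mock-observations run would
 * produce, assuming a half-open window [from, to) stepped by durationSeconds.
 */
function mockTimestamps(from: string, to: string, durationSeconds: number): string[] {
  const start = Date.parse(from);
  const end = Date.parse(to);
  const stepMs = durationSeconds * 1000;
  if (Number.isNaN(start) || Number.isNaN(end) || !(stepMs > 0)) {
    throw new Error('invalid window or duration');
  }
  const result: string[] = [];
  for (let t = start; t < end; t += stepMs) {
    result.push(new Date(t).toISOString());
  }
  return result;
}

// Window as written in the issue: yields 00:00:04, 00:00:05 and 00:00:06 only.
console.log(mockTimestamps('2024-03-05T00:00:04.000Z', '2024-03-05T00:00:07.000Z', 1));
// Window matching the expected output: 00:00:03 through 00:00:06.
console.log(mockTimestamps('2024-03-05T00:00:03.000Z', '2024-03-05T00:00:07.000Z', 1));
```

Under that assumption, only a `timestamp-from` of 00:00:03 produces the extra 00:00:03 row shown in the expected output.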

jamescrowley, Aug 03 '24 07:08

I've pushed a rough draft for discussion here: https://github.com/Green-Software-Foundation/if/pull/932. Is it along the lines of what you had in mind?

jamescrowley, Aug 04 '24 08:08

Thanks so much @jamescrowley, let's get @jmcook1186 and @narekhovhannisyan to take a look, please 🙏

zanete, Aug 05 '24 12:08