firecracker icon indicating copy to clipboard operation
firecracker copied to clipboard

Add --metrics-path option

Open pb8o opened this issue 3 years ago • 4 comments
trafficstars

Signed-off-by: Pablo Barbáchano [email protected]

Reason for This PR

Metrics configuration are not saved in the snapshot since these are host related deployments settings. This means that after resuming, the only way to re-configure the metrics through an API call. This can be costly for some customers since it will add up to the resume latency.

Description of Changes

Add a CLI option to configure metrics

  • [ ] This functionality can be added in rust-vmm.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

PR Checklist

[Author TODO: Meet these criteria.] [Reviewer TODO: Verify that these criteria are met. Request changes if not]

  • [X] All commits in this PR are signed (git commit -s).
  • [X] The issue which led to this PR has a clear conclusion.
  • [X] This PR follows the solution outlined in the related issue.
  • [X] The description of changes is clear and encompassing.
  • [X] Any required documentation changes (code and docs) are included in this PR.
  • [ ] Any newly added unsafe code is properly documented.
  • [ ] Any API changes follow the Runbook for Firecracker API changes.
  • [X] Any user-facing changes are mentioned in CHANGELOG.md.
  • [X] All added/changed functionality is tested.

pb8o avatar Aug 09 '22 16:08 pb8o

Creating a new PR as I somehow messed the previous one.

pb8o avatar Aug 09 '22 16:08 pb8o

Copying @sandreim comments

I would suggest to deprecate the current API and always configure metrics via CLI as per the Firecracker charter:

Minimalist in Features: If it's not clearly required for our mission, we won't build it. We maintain a single implementation per capability, and deprecate obsolete implementations; resolving exceptions is a high priority issue.

Another option is to add the metric configuration as optional on the load snapshot API:

curl --unix-socket /tmp/firecracker.socket -i \
    -X PUT 'http://localhost/snapshot/load' \
    -H  'Accept: application/json' \
    -H  'Content-Type: application/json' \
    -d '{
            "snapshot_path": "./snapshot_file",
            "mem_file_path": "./mem_file",
            "enable_diff_snapshots": true,
            "resume_vm": true,
            "metrics_file": "./here/come/the/metrics"
    }'

This already contains a resume vm field, for the exact same purpose you are adding this CLI param, to avoid startup latency.

pb8o avatar Aug 09 '22 16:08 pb8o

@sandreim Thanks for the comments. Sorry I had to copy the conversation over as I fumbled the previous PR.

I would suggest to deprecate the current API

I think for now, I will leave it as-is. I wouldn't want to break existing customers.

Another option is to add the metric configuration as optional on the load snapshot API:

That's a good option, and it may improve the UX. But I would like to have that in addition to the CLI option. Otherwise we would only start seeing metrics after they are configured, potentially missing early ones. I will talk it with the team and see if there's support for it.

pb8o avatar Aug 09 '22 17:08 pb8o

I would suggest to deprecate the current API

I think for now, I will leave it as-is. I wouldn't want to break existing customers.

Deprecation actually helps not break existing customers by warning them in advance that some breaking change is going to happen.

Also, I think it would be a good idea to move this to the CLI and deprecate the old endpoint. It would be less confusing for all customers to have only one way of setting up metrics in restored VMs, so why not have it be the better way (the purpose of the PR after all 😁). Also, it would be more maintainable for the Firecracker team both in terms of code but also API, since a CLI option falls under the API support policy like any other REST endpoint.

The purpose of the API support policy was that it would be safe to get rid of unnecessary API endpoints without causing customers trouble. My opinion is that it should be used to keep the API as clean and simple as possible.

georgepisaltu avatar Aug 09 '22 20:08 georgepisaltu

@sandreim Thanks for the comments. Sorry I had to copy the conversation over as I fumbled the previous PR.

I would suggest to deprecate the current API

I think for now, I will leave it as-is. I wouldn't want to break existing customers.

Another option is to add the metric configuration as optional on the load snapshot API:

That's a good option, and it may improve the UX. But I would like to have that in addition to the CLI option. Otherwise we would only start seeing metrics after they are configured, potentially missing early ones. I will talk it with the team and see if there's support for it.

I currently have 2 concerns around that:

  • we are targeting to have identical behavior for both logging and metrics (since both of these configurations are host deployment related settings). IF we add metrics to the resume command we also need to add logging. If other such host settings appear in the future, it will not be sustainable to put them all in one API request
  • as Pablo said, we would not be offering functionality parity since we also gather metrics on other components (API requests) which would not be gathered if they are triggered before the resume call

dianpopa avatar Aug 10 '22 11:08 dianpopa

I would suggest to deprecate the current API

I think for now, I will leave it as-is. I wouldn't want to break existing customers.

Deprecation actually helps not break existing customers by warning them in advance that some breaking change is going to happen.

Also, I think it would be a good idea to move this to the CLI and deprecate the old endpoint. It would be less confusing for all customers to have only one way of setting up metrics in restored VMs, so why not have it be the better way (the purpose of the PR after all 😁). Also, it would be more maintainable for the Firecracker team both in terms of code but also API, since a CLI option falls under the API support policy like any other REST endpoint.

The purpose of the API support policy was that it would be safe to get rid of unnecessary API endpoints without causing customers trouble. My opinion is that it should be used to keep the API as clean and simple as possible.

I actually see a nice use-case for keeping the API enabled metrics around . Instead of deprecating the possibility of configuring metrics through the API, I would actually like us to improve on that. I am referring to adding the capability to configure metrics and logging after boot (which we are not offering right now). Enabling this option would give customer the ability to turn them on for misbehaving microVMs or other maybe performance related issues after boot/resume.

dianpopa avatar Aug 10 '22 11:08 dianpopa

🚢

alsrdn avatar Aug 23 '22 12:08 alsrdn