haystack icon indicating copy to clipboard operation
haystack copied to clipboard

test: integrate haystack benchmarks with datadog

Open rjanjua opened this issue 2 years ago • 12 comments

Related Issues

  • fixes #4904

Proposed Changes:

Adds a script for preparing the environment Adds a script for running all of the benchmarks Adds a script for sending the benchmark results to datadog

The benchmark runs will be scheduled via a lambda function which will start an ec2 instance and run the setup_benchmarks.sh script.

How did you test it?

Tested it by running the scripts in an ec2 instance

Notes for the reviewer

Checklist

rjanjua avatar Jun 21 '23 07:06 rjanjua

I don't see any change under .github but https://github.com/deepset-ai/haystack/issues/4904 mentions Github Actions, does it mean this PR will not fix the issue?

masci avatar Jun 30 '23 16:06 masci

Let me answer the first two questions

Why was the work done in branch fix-benchmark-workflow ignored?

How is it possible that the benchmarks work if those changes are not included?

This PR builds on top of the changes related to #4902. I decided to take the chance and rewrite the benchmarks from scratch to make their usage easier and more flexible. You can find some details about how to run them in the README file . Let me know if you have any questions about that.

bogdankostic avatar Jul 04 '23 15:07 bogdankostic

Thanks @bogdankostic, I missed those changes.

Nonetheless why are we using AWS Lambdas and not Github Actions?

silvanocerza avatar Jul 04 '23 15:07 silvanocerza

Thanks @bogdankostic, I missed those changes.

Nonetheless why are we using AWS Lambdas and not Github Actions?

After discussions with @steppi91 we decided to go with AWS lambda over github actions due to security concerns. We don't want to expose a job to the public that could expose some of our internal infrastructure to an arbitrary pull request.

rjanjua avatar Jul 06 '23 09:07 rjanjua

@rjanjua we're not going to have that job trigger from any PR so the security concerns don't stand.

The plan is to have it run on manual trigger and/or a schedule, editing that in a PR will have no direct effect on the workflow.

There are no concerns security wise given those limits.

That can only happen if the harmful workflow is merged in main, and that can happen from any PR and at any moment. If some bad actor manages to get an harmful workflow in that means that there was no review done.

silvanocerza avatar Jul 06 '23 14:07 silvanocerza

If a user adds a pull_request trigger to the workflow, then it will trigger the action in the pull request. I tried that in the sdk repo:

Paul is currently on vacation for a few weeks, I am not sure if he had any further concerns. If we need to we could get Ivan's thoughts on the matter.

rjanjua avatar Jul 06 '23 14:07 rjanjua

That only happens cause you have write permissions in the repo. PRs coming from forks of first time contributors won't trigger workflows automatically.

If we have this kind of concerns we must make workflow runs from forks even more strict and always require manual approval.

Even as of now a second time contributor could open a PR that edits a workflow and steal credentials if what you say is true. Though I'm not sure this could be a problem either, if I remember correctly workflows from forks are never run even if modified since the context used is always the one of the "receiving" repo.

silvanocerza avatar Jul 06 '23 15:07 silvanocerza

Thanks for the knowledge on the first time contributors 🙂

From what I understood of this section of the github docs:

Note: Workflows triggered by pull_request_target events are run in the context of the base branch. Since the base branch is considered trusted, workflows triggered by these events will always run..."

My assumption here is that if the trigger is pull_request and not pull_request_target then it will run based on the fork.

Later it also mentions the following, which indicates that the incoming code is what is run:

Inspect the proposed changes in the pull request and ensure that you are comfortable running your workflows on the pull request branch. You should be especially alert to any proposed changes in the .github/workflows/ directory that affect workflow files.

@askainet do you have any recommendations on which approach to take?

rjanjua avatar Jul 06 '23 15:07 rjanjua

Later it also mentions the following, which indicates that the incoming code is what is run:

I think we should give this a try. I admit I might be wrong but I remember modified workflows from forks not running since the event is not declared in the workflow version living in main.

silvanocerza avatar Jul 06 '23 16:07 silvanocerza

I just gave it a try. I was wrong and adding a pull_request trigger on a workflow in a PR from a fork indeed triggers that version.

I also changed the approval setting, now all external contributors will always require it for each contribution.

silvanocerza avatar Jul 07 '23 08:07 silvanocerza

@bogdankostic made me rightly notice that secrets are not passed to forks. This is from the repos Action's Secret Settings.

image

And the GITHUB_TOKEN can't have write permissions on pull_request events from forks, as per documentation.

pull_request_target can have write permissions but that only run if the workflow is in main.

I reverted the change I made as per previous comment, now only first time contributors need approval.

So please, let's go with Github Actions.

silvanocerza avatar Jul 07 '23 09:07 silvanocerza

@bogdankostic made me rightly notice that secrets are not passed to forks. This is from the repos Action's Secret Settings.

And the GITHUB_TOKEN can't have write permissions on pull_request events from forks, as per documentation.

pull_request_target can have write permissions but that only run if the workflow is in main.

I reverted the change I made as per previous comment, now only first time contributors need approval.

So please, let's go with Github Actions.

I think that seems ok then.

I'll make an update to do this in the next few days, and if @steppi91 has an issue once he is back from vacation then we can reassess

rjanjua avatar Jul 10 '23 16:07 rjanjua

superseded by https://github.com/deepset-ai/haystack/pull/5432

masci avatar Aug 07 '23 08:08 masci