htmlparser-benchmark icon indicating copy to clipboard operation
htmlparser-benchmark copied to clipboard

New Benchmark

Open victornpb opened this issue 4 years ago • 7 comments
trafficstars

This repository started as a fork from this repo (It doesn't share much code anymore, except from the original HTMLs),

I tried to make it more accurate and also added RAM usage as a metric.

If anyone is interested: https://github.com/victornpb/benchmark-html-parser-libraries

victornpb avatar Jan 18 '21 02:01 victornpb

@victornpb Are you interested in adding another parser (see node-html-parser)? I added in my pr for Andreas' repo already. You could see my pr here.

Dj-Polyester avatar Feb 23 '21 19:02 Dj-Polyester

sure, ill take a look

victornpb avatar Feb 24 '21 01:02 victornpb

Hi @victornpb, very cool project! I just revived this one again, before seeing yours. My main goal was to automatically update results whenever a new version is released, which would prevent things from becoming stale again. All the code is here, but unfortunately @AndreasMadsen would have to enable auto-merges to make it actually work 🙈

It would be super cool to have a benchmark that actually updates itself, without manual intervention. Are you interested in making that a reality with your parser? If so, I'm happy to link to it from the htmlparser2 README.

fb55 avatar Aug 20 '21 21:08 fb55

@fb55 Happy to enable anything needed or transfer the repository etc, although automatic merging seems like it could be abused. I'm obviously not maintaining this. Let me know what you would prefer.

AndreasMadsen avatar Aug 21 '21 13:08 AndreasMadsen

@fb55 I would also like to make this happen. Can you explain how the setup works? Are you using GitHub actions / circle code or something similar?

victornpb avatar Aug 21 '21 16:08 victornpb

Thanks for getting back to me, both of you!

The flow I've implemented here is:

  1. Dependabot will open a pull request whenever there is a new version of a dependency of this repo; see https://github.com/AndreasMadsen/htmlparser-benchmark/blob/e78cd8fc6c2adac08deedd4f274c33537451186b/.github/dependabot.yml
  2. Dependabot PRs will automatically be merged (*) into the repo; see https://github.com/AndreasMadsen/htmlparser-benchmark/blob/e78cd8fc6c2adac08deedd4f274c33537451186b/.github/workflows/dependabot-automerge.yml
  3. Whenever a commit is written to the master branch, the benchmark will run and results will be written to the stats.txt file; see https://github.com/AndreasMadsen/htmlparser-benchmark/blob/e78cd8fc6c2adac08deedd4f274c33537451186b/.github/workflows/nodejs-benchmark.yml

(*) This requires branch protections to be enabled for the master branch, and auto-merge to be enabled for the repo.


The one thing I didn't think of, and which will break this flow, is that GitHub Actions cannot trigger other GitHub Actions; merged Dependabot PRs will not actually update results. That means that (3) will have to be changed:

  1. On all PRs (& pushes), run benchmarks and add results to the PR.

fb55 avatar Aug 23 '21 13:08 fb55

@fb55 I would be fine with auto-merging if it could be restricted to only Dependabot. However, that does not appear possible. The problem with allowing auto-merging without "Require pull request reviews before merging", is that it becomes an easy attack vector for arbitrary code execution. Essentially anybody that passes a simple check can add their own .github/workflows, modify the benchmark script, add a malicious dependency, etc.

If this a compromise you are fine with, I suggest we just transfer the repository to one of your accounts.

AndreasMadsen avatar Aug 24 '21 10:08 AndreasMadsen