More Googleable name for this repo
This dataset of benchmarks is an absolute goldmine and I believe it needs to be known more widely. Unfortunately, its name is not very memorable or Googleable.
Is there another name that could be used that would be easier to find/google/talk about? As it stands, "WasmBench" hides this repository's value behind a bland name.
I agree the name might not be perfect, but I'm not sure about renaming, since it is linked to and referenced in the paper. What do you think, @danleh ?
Creating a PR to add WasmBench to this list may be another idea to increase visibility.
Thanks, Ben, for the suggestion. Any ideas for a better name? Would a longer name that includes "WasmBench" help, e.g., "WasmBench-WebAssembly-Benchmark"?
I was thinking something even more unique. E.g. what about "Sola Wasm Binary Dataset [year]"?
As for avoiding link rot, maybe you could add the old repo (this one) as a git submodule? The big zips with the binaries in them are part of a GitHub release, and not "checked in", AFAICT? I am not sure where GitHub physically stores those.
> Creating a PR to add WasmBench to this list may be another idea to increase visibility.
@hilbigan Good idea! I submitted a PR to this list with a link to this repo and our paper.
I agree that the name is a bit generic. "Bench" also evokes "performance testing" a bit too strongly. Maybe something with "dataset" in it?
@michaelpradel What might help discoverability is also adding a repo description. Could you add something like "A large dataset of real-world WebAssembly binaries, collected from the Web, GitHub, NPM and more sources. Useful for test data, for training machine learning models, or just for fun"?
> The big zips with the binaries in them are part of a GitHub release, and not "checked in" AFAICT? I am not sure where GitHub physically stores those.
That's right, they are too large to be under version control. I added two direct links to the dataset (full and filtered) at the beginning of the README, so they are easier to access.
> @michaelpradel What might help discoverability is also adding a repo description. Could you add something like "A large dataset of real-world WebAssembly binaries, collected from the Web, GitHub, NPM and more sources. Useful for test data, for training machine learning models, or just for fun"?
Done.