Benchmark
What does this PR do?
Benchmark go brrrrrr 🔥
Currently, the results are produced in JSON format. This PR doesn't try to implement how the results are displayed.
Users are expected to implement the logic for displaying the results in whatever way they wish.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
A report for running `from_pretrained_benchmark.py`:
{
"result": [
{
"time": 0.12947426349995794
}
],
"init_kwargs": {},
"run_kwargs": {
"measure_kwargs": {
"number": 2,
"repeat": 3
},
"target_kwargs": {
"model_class": "AutoModel",
"repo_id": "bert-base-uncased"
},
"inputs_kwargs": [
{}
],
"report_kwargs": {
"output_path": "benchmark_report.json"
}
}
}
A report for running `cache_benchmark.py`:
[
{
"result": {
"time": 0.5173940999998194
},
"init_kwargs": {},
"run_kwargs": {
"measure_kwargs": {
"number": 2,
"repeat": 3
},
"target_kwargs": {
"batch_size": 1,
"max_cache_length": 16,
"seq_length": 4,
"cache_type": "static",
"mode": "eager"
},
"inputs_kwargs": {},
"report_kwargs": {
"output_path": "benchmark_report.json"
}
}
},
{
"result": {
"time": 0.4013058944999557
},
"init_kwargs": {},
"run_kwargs": {
"measure_kwargs": {
"number": 2,
"repeat": 3
},
"target_kwargs": {
"batch_size": 1,
"max_cache_length": 16,
"seq_length": 4,
"cache_type": "static",
"mode": "compiled"
},
"inputs_kwargs": {},
"report_kwargs": {
"output_path": "benchmark_report.json"
}
}
},
{
"result": {
"time": 0.5117897099999027
},
"init_kwargs": {},
"run_kwargs": {
"measure_kwargs": {
"number": 2,
"repeat": 3
},
"target_kwargs": {
"batch_size": 2,
"max_cache_length": 16,
"seq_length": 4,
"cache_type": "static",
"mode": "eager"
},
"inputs_kwargs": {},
"report_kwargs": {
"output_path": "benchmark_report.json"
}
}
},
{
"result": {
"time": 0.4497902514999623
},
"init_kwargs": {},
"run_kwargs": {
"measure_kwargs": {
"number": 2,
"repeat": 3
},
"target_kwargs": {
"batch_size": 2,
"max_cache_length": 16,
"seq_length": 4,
"cache_type": "static",
"mode": "compiled"
},
"inputs_kwargs": {},
"report_kwargs": {
"output_path": "benchmark_report.json"
}
}
}
]
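For illustration only (not part of this PR), a user could load and display such a `benchmark_report.json` with a few lines of standard-library Python. The field names below follow the two example reports above; everything else is an assumption:

```python
import json

def print_report(path="benchmark_report.json"):
    with open(path) as f:
        report = json.load(f)
    # The cache benchmark writes a list of entries, the from_pretrained one a single dict.
    entries = report if isinstance(report, list) else [report]
    for entry in entries:
        target = entry["run_kwargs"]["target_kwargs"]
        result = entry["result"]
        # "result" is either a dict with a "time" key or a list of such dicts.
        measurements = result if isinstance(result, list) else [result]
        for measurement in measurements:
            print(f"{target}: {measurement['time']:.4f}s")

print_report()
```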
@ArthurZucker The PR's main goal is not to make all the requested features available in one go. They could be added progressively, like what you mentioned:
test with device map, without, with fast init, without.
We need a workflow similar to the ci-important-models one, which will also help you check if the workflow works as expected.
We need to make sure we also test more than 1 model; from_pretrained should test our top 10 most-used models, for example.
Regarding
some kind of config where we store what was used to run the test before even running it (instead of kwargs stored).
The passed arguments (positional and keyword) form the configuration that could be used to re-run the benchmark.
Kwargs are nice but we need explicit configs / explicit arguments, I am not 100% convinced kwargs is the way to go. We could have a PreTrainedConfig for that matter, or a simple json or whatever, I don't know what is the best!
[Update] You are kind of right: so far, only the arguments that are specified explicitly will be saved. Let's discuss it when you are back.
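To illustrate the idea in the update above (only explicitly specified arguments get saved), here is a hedged sketch of one way it can be expressed. It is not the PR's actual code; `target` and `record_explicit_args` are made-up names:

```python
import inspect
import json

def target(batch_size=1, seq_length=4, cache_type="static", mode="eager"):
    """Stand-in for a benchmark target with default arguments."""

def record_explicit_args(func, *args, **kwargs):
    # Bind without apply_defaults(): parameters left at their defaults
    # never show up in the recorded configuration.
    bound = inspect.signature(func).bind(*args, **kwargs)
    return dict(bound.arguments)

config = record_explicit_args(target, batch_size=2, mode="compiled")
print(json.dumps(config))  # {"batch_size": 2, "mode": "compiled"}
```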
Kwargs are nice but we need explicit configs / explicit arguments, I am not 100% convinced kwargs is the way to go. We could have a PreTrainedConfig for that matter, or a simple json or whatever, I don't know what is the best!
I don't use kwargs in the definition of the concrete subclasses' methods, only in the parent class (Benchmark), which is abstract and takes kwargs in its methods; that kind of makes sense, as those methods are meant to be implemented in the concrete subclasses. The run method is a bit special in that it is implemented in the abstract class, but its role is just to dispatch the inputs to the different methods end to end.
Currently the results and configuration are saved in a JSON file; you can see that in the 2 examples I provided in a previous comment.
If you still have doubts, let's talk and discuss in more detail.
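For readers following along, here is a condensed sketch of the structure described above: an abstract parent whose `run` method only dispatches kwargs, and a concrete subclass with explicit arguments. It is inferred from the example reports and this discussion, not copied from the PR; the private method names, the `FromPretrainedBenchMark` class name, and the use of `timeit.repeat` (suggested by the number/repeat fields) are assumptions.

```python
import json
import timeit
from abc import ABC, abstractmethod

class BenchMark(ABC):
    @abstractmethod
    def _target(self, **target_kwargs):
        """Return a zero-argument callable to be timed."""

    def _measure(self, func, number=2, repeat=3):
        timings = timeit.repeat(func, number=number, repeat=repeat)
        return {"time": min(timings) / number}

    def _report(self, result, run_kwargs, output_path="benchmark_report.json"):
        with open(output_path, "w") as f:
            json.dump({"result": result, "run_kwargs": run_kwargs}, f, indent=4)

    def run(self, measure_kwargs=None, target_kwargs=None, report_kwargs=None):
        # The abstract class only dispatches the inputs end to end:
        # build the target, time it, then write the report.
        run_kwargs = {
            "measure_kwargs": measure_kwargs or {},
            "target_kwargs": target_kwargs or {},
            "report_kwargs": report_kwargs or {},
        }
        func = self._target(**run_kwargs["target_kwargs"])
        result = self._measure(func, **run_kwargs["measure_kwargs"])
        self._report(result, run_kwargs, **run_kwargs["report_kwargs"])
        return result

class FromPretrainedBenchMark(BenchMark):
    # A concrete subclass uses explicit arguments rather than **kwargs.
    def _target(self, model_class="AutoModel", repo_id="bert-base-uncased"):
        import transformers
        cls = getattr(transformers, model_class)
        return lambda: cls.from_pretrained(repo_id)
```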
With these kwargs, how are we wanting to test, i.e. turning each individual feature on and then off, or should there be some sort of combination? e.g. device_map with fast init
We might need to define some way of combining test features
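One possible way to enumerate such combinations, purely as an illustration and not as part of the PR (the flag names and repo ids below are assumptions), is a small cartesian product over the features being discussed:

```python
import itertools

# Each key is a candidate benchmark dimension; the values are the settings to sweep.
feature_grid = {
    "repo_id": ["bert-base-uncased", "gpt2"],   # e.g. a list of top-used models
    "device_map": [None, "auto"],               # with / without device_map
    "low_cpu_mem_usage": [False, True],         # stand-in for "fast init" on/off
}

combinations = [
    dict(zip(feature_grid, values))
    for values in itertools.product(*feature_grid.values())
]
for target_kwargs in combinations:
    # Each dict could be passed as target_kwargs to a single benchmark run.
    print(target_kwargs)
```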
Hi @amyeroberts
As I explained to @ArthurZucker and @LysandreJik, this PR is not meant to make the concrete (benchmark) classes feature complete. They are here only to demonstrate the overall structure of how we are going to do benchmarking.
[Update] So far, I just added the scripts to `utils/not_doctested.txt` to pass CI:
https://github.com/huggingface/transformers/pull/28943/commits/ec2a34a5afa7b0bd7b83bb28770944e0cc8858f0
~~Question~~
Should we (I):
- move all these benchmark scripts to `utils/benchmark`
- keep the definitions in `src/transformers/benchmark/cache_benchmark.py` but move the `__main__` block to `utils/benchmark`
- anything else better?
Details
With absolute import
`from benchmark_utils_generic import BenchMark, SpeedBenchMark`
this works when we run the script like
`python src/transformers/benchmark/cache_benchmark.py`
However, the tests_pr_documentation CI gives
`from benchmark_utils_generic import BenchMark, SpeedBenchMark`
`E ModuleNotFoundError: No module named 'benchmark_utils_generic'`
With relative import
`from .benchmark_utils_generic import BenchMark, SpeedBenchMark`
it fails with
`ImportError: attempted relative import with no known parent package`
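For context, one common workaround for this absolute-vs-relative import issue, shown only as an illustration and not necessarily what this PR ends up doing, is to make the script's own directory importable before the absolute import:

```python
import os
import sys

# Allow both `python src/transformers/benchmark/cache_benchmark.py` and CI
# collection to find the sibling module.
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))

from benchmark_utils_generic import BenchMark, SpeedBenchMark  # noqa: E402
```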
@amyeroberts @ArthurZucker @LysandreJik Let me know if you still have any comments.
Next steps are: adding workflow files and/or extending the 2 benchmark scripts to cover more cases. (But let's not go too far; the most important thing is to have something simple running end to end, so we can see more clearly how to extend the really necessary parts.)
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
#30615 supersedes this! Feel free to close