
Benchmark

Open ydshieh opened this issue 1 year ago • 10 comments

What does this PR do?

Benchmark go brrrrrr 🔥

Currently, the results are produced in JSON format. This PR doesn't try to implement how the results are displayed.

Users are expected to implement the logic for displaying the results in whatever way they wish.
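
For example, a minimal display could be as simple as loading the saved report and printing each entry. This is only a sketch, assuming a report file like the ones shown in the comments below:

import json

# Load a saved benchmark report (a single dict or a list of dicts, depending on the script).
with open("benchmark_report.json") as f:
    report = json.load(f)

entries = report if isinstance(report, list) else [report]
for entry in entries:
    target = entry["run_kwargs"]["target_kwargs"]
    print(f"{target} -> {entry['result']}")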

ydshieh avatar Feb 09 '24 17:02 ydshieh

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

A report for running from_pretrained_benchmark.py

 {
    "result": [
        {
            "time": 0.12947426349995794
        }
    ],
    "init_kwargs": {},
    "run_kwargs": {
        "measure_kwargs": {
            "number": 2,
            "repeat": 3
        },
        "target_kwargs": {
            "model_class": "AutoModel",
            "repo_id": "bert-base-uncased"
        },
        "inputs_kwargs": [
            {}
        ],
        "report_kwargs": {
            "output_path": "benchmark_report.json"
        }
    }
}

A report for running cache_benchmark.py

[
    {
        "result": {
            "time": 0.5173940999998194
        },
        "init_kwargs": {},
        "run_kwargs": {
            "measure_kwargs": {
                "number": 2,
                "repeat": 3
            },
            "target_kwargs": {
                "batch_size": 1,
                "max_cache_length": 16,
                "seq_length": 4,
                "cache_type": "static",
                "mode": "eager"
            },
            "inputs_kwargs": {},
            "report_kwargs": {
                "output_path": "benchmark_report.json"
            }
        }
    },
    {
        "result": {
            "time": 0.4013058944999557
        },
        "init_kwargs": {},
        "run_kwargs": {
            "measure_kwargs": {
                "number": 2,
                "repeat": 3
            },
            "target_kwargs": {
                "batch_size": 1,
                "max_cache_length": 16,
                "seq_length": 4,
                "cache_type": "static",
                "mode": "compiled"
            },
            "inputs_kwargs": {},
            "report_kwargs": {
                "output_path": "benchmark_report.json"
            }
        }
    },
    {
        "result": {
            "time": 0.5117897099999027
        },
        "init_kwargs": {},
        "run_kwargs": {
            "measure_kwargs": {
                "number": 2,
                "repeat": 3
            },
            "target_kwargs": {
                "batch_size": 2,
                "max_cache_length": 16,
                "seq_length": 4,
                "cache_type": "static",
                "mode": "eager"
            },
            "inputs_kwargs": {},
            "report_kwargs": {
                "output_path": "benchmark_report.json"
            }
        }
    },
    {
        "result": {
            "time": 0.4497902514999623
        },
        "init_kwargs": {},
        "run_kwargs": {
            "measure_kwargs": {
                "number": 2,
                "repeat": 3
            },
            "target_kwargs": {
                "batch_size": 2,
                "max_cache_length": 16,
                "seq_length": 4,
                "cache_type": "static",
                "mode": "compiled"
            },
            "inputs_kwargs": {},
            "report_kwargs": {
                "output_path": "benchmark_report.json"
            }
        }
    }
]
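
For illustration only (not part of the PR), the list above could be post-processed to compare the eager and compiled modes per batch size; the field names are taken from the report shown here:

import json

with open("benchmark_report.json") as f:
    reports = json.load(f)

# Index the measured times by (batch_size, mode).
times = {}
for entry in reports:
    target = entry["run_kwargs"]["target_kwargs"]
    times[(target["batch_size"], target["mode"])] = entry["result"]["time"]

for batch_size in sorted({bs for bs, _ in times}):
    eager = times[(batch_size, "eager")]
    compiled = times[(batch_size, "compiled")]
    print(f"batch_size={batch_size}: eager={eager:.4f}s, compiled={compiled:.4f}s, speedup={eager / compiled:.2f}x")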

ydshieh avatar Mar 06 '24 04:03 ydshieh

@ArthurZucker The PR's main goal is not to make all the requested features available in one go. They could be added progressively, like what you mentioned:

test with device map, without, with fast init, without.

We need a workflow similar to the ci-important-models that will help you also check if the workflow works as expected.

We need to make sure we also test more than 1 model, from pretrained should test our top 10 used models for example

Regarding

some kind of config where we store what was used to run the test before even running it (instead of kwargs stored).

The passed arguments (positional and keyword) already form the configuration that could be used to re-run.
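
As an illustration of that point, a saved report could in principle be fed back into a new run. This is only a sketch: using SpeedBenchMark here and assuming that run accepts the stored run_kwargs unchanged are my assumptions, not the PR's documented API.

import json

from benchmark_utils_generic import SpeedBenchMark  # class name taken from the PR's imports

with open("benchmark_report.json") as f:
    report = json.load(f)

# Re-instantiate and re-run with the exact arguments stored in the report
# (assuming the single-report format produced by from_pretrained_benchmark.py).
benchmark = SpeedBenchMark(**report["init_kwargs"])
new_report = benchmark.run(**report["run_kwargs"])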

ydshieh avatar Mar 07 '24 14:03 ydshieh

Kwargs are nice, but we need explicit configs / explicit arguments; I am not 100% convinced kwargs are the way to go. We could have a PreTrainedConfig for that matter, or a simple json or whatever, I don't know what is best!

ArthurZucker avatar Mar 08 '24 01:03 ArthurZucker

[Update] You are kind of right - so far, only the arguments that are specified explicitly will be saved. Let's discuss it when you are back.

Kwargs are nice, but we need explicit configs / explicit arguments; I am not 100% convinced kwargs are the way to go. We could have a PreTrainedConfig for that matter, or a simple json or whatever, I don't know what is best!

I don't use kwargs in the definitions of the concrete subclasses' methods, only in the parent class (BenchMark), which is abstract and takes kwargs in its methods; this kind of makes sense, as those methods are meant to be implemented in the concrete subclasses. The run method is a bit special in that it is implemented in the abstract class, but its role is just to dispatch the inputs to the different methods end to end.
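
To make that structure concrete, here is a rough sketch of the shape described above; the helper method names (_target, _measure) are assumptions for illustration, not the PR's actual API:

from abc import ABC, abstractmethod


class BenchMark(ABC):
    def __init__(self, **init_kwargs):
        self.init_kwargs = init_kwargs

    @abstractmethod
    def _target(self, **target_kwargs):
        """Return the callable to benchmark (implemented by concrete subclasses)."""

    @abstractmethod
    def _measure(self, func, **measure_kwargs):
        """Time the callable and return a result dict (implemented by concrete subclasses)."""

    def run(self, measure_kwargs, target_kwargs, inputs_kwargs, report_kwargs):
        # Implemented once in the abstract class: dispatch the inputs to the
        # concrete methods end to end and collect everything into a report.
        func = self._target(**target_kwargs)
        result = self._measure(func, **measure_kwargs)
        return {
            "result": result,
            "init_kwargs": self.init_kwargs,
            "run_kwargs": {
                "measure_kwargs": measure_kwargs,
                "target_kwargs": target_kwargs,
                "inputs_kwargs": inputs_kwargs,
                "report_kwargs": report_kwargs,
            },
        }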

Currently, the results and configuration are saved in a json file, as you can see in the 2 examples I provided in a previous comment.

If you still have doubts, let's talk it through in more detail.

ydshieh avatar Mar 08 '24 06:03 ydshieh

With these kwargs - how do we want to test, i.e. turning each individual feature on then off, or should there be some sort of combination? e.g. device_map with fast init

We might need to define some way of combining test features

Hi @amyeroberts

As I explained to @ArthurZucker and @LysandreJik

This PR is not meant to make the concrete (benchmark) classes feature-complete. They are here only to demonstrate the global structure of how we are going to do benchmarking.

ydshieh avatar Mar 13 '24 15:03 ydshieh

[Update]

So far, I have just added the scripts to utils/not_doctested.txt to pass CI

https://github.com/huggingface/transformers/pull/28943/commits/ec2a34a5afa7b0bd7b83bb28770944e0cc8858f0

~~Question~~

Should we (I):

  • move all these benchmark scripts to utils/benchmark
  • keep the definitions in src/transformers/benchmark/cache_benchmark.py but move the __main__ to utils/benchmark
  • anything else better?

Details

With absolute import

from benchmark_utils_generic import BenchMark, SpeedBenchMark

This works when we run the script like

python src/transformers/benchmark/cache_benchmark.py

However, the tests_pr_documentation CI gives

    from benchmark_utils_generic import BenchMark, SpeedBenchMark
E   ModuleNotFoundError: No module named 'benchmark_utils_generic'

With relative import

from .benchmark_utils_generic import BenchMark, SpeedBenchMark

ImportError: attempted relative import with no known parent package
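
One possible workaround (an assumption on my side, not what this PR does) is to support both execution modes by falling back to a sys.path tweak when the relative import has no parent package:

import os
import sys

try:
    # Works when the module is imported as part of the package.
    from .benchmark_utils_generic import BenchMark, SpeedBenchMark
except ImportError:
    # Works when the file is executed directly as a script.
    sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
    from benchmark_utils_generic import BenchMark, SpeedBenchMark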

ydshieh avatar Mar 18 '24 08:03 ydshieh

@amyeroberts @ArthurZucker @LysandreJik Let me know if you still have any comments.

Next steps are: adding workflow files and/or extending the 2 benchmark scripts to cover more cases. (But let's not go too far; the most important thing is to have something simple running end to end, so we can see more clearly how to extend the really necessary stuff.)

ydshieh avatar Mar 18 '24 10:03 ydshieh

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar May 10 '24 08:05 github-actions[bot]

#30615 supersedes this! Feel free to close

ArthurZucker avatar May 10 '24 08:05 ArthurZucker