
Reference implementations of MLPerf™ inference benchmarks

200 inference issues, sorted by most recently updated

When I want to test an online inference service by POST requests, it is necessary to record all responses, because of [this](https://github.com/mlcommons/inference/blob/268bc9dc8a3c0a96bbb7d38482c0ce5016507633/loadgen/logging.h#L398). However, if the response's status code is not...
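One way to capture every response, including error responses, is to do the recording inside the SUT's `issue_queries` callback before handing the result back to LoadGen. A minimal sketch, assuming a hypothetical HTTP endpoint and the Python LoadGen bindings (callback signatures can differ slightly between loadgen versions):

```python
# Sketch only: record every HTTP response (any status code) before completing the query.
import array
import requests
import mlperf_loadgen as lg

ENDPOINT = "http://localhost:8080/predict"  # hypothetical inference service
recorded = []                               # (sample id, status code, body) kept for later inspection

def issue_queries(query_samples):
    for qs in query_samples:
        r = requests.post(ENDPOINT, json={"index": qs.index})
        recorded.append((qs.id, r.status_code, r.content))
        buf = array.array("B", r.content)
        addr, length = buf.buffer_info()
        lg.QuerySamplesComplete([lg.QuerySampleResponse(qs.id, addr, length)])

def flush_queries():
    pass

sut = lg.ConstructSUT(issue_queries, flush_queries)
```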

Running GPTJ even on accelerated systems can be quite demanding, as the Server latency constraint of 20 seconds suggests. For systems close to this threshold, meeting the minimum run duration...
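For context, the relevant knobs are the ones read from `mlperf.conf`/`user.conf`. A hypothetical `user.conf` sketch (illustrative values only; the authoritative defaults live in `mlperf.conf`):

```
# Illustrative only -- check mlperf.conf for the official values
# the 20 s Server latency target mentioned above, in milliseconds
gptj.Server.target_latency = 20000
# minimum run duration the test has to sustain, in milliseconds
gptj.Server.min_duration = 600000
```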

inference v4.0

While running the Nvidia code for DLRMv2 on a 4090 GPU with batch size 1400, we are seeing the accuracy below, which is lower than expected. Can someone help us...

The minimum query count (`min_query_count`) was [removed](https://github.com/mlcommons/inference/commit/995ffee37682a0870f359bb00d5f4672d78d6424) from `mlperf.conf` a while ago. I believe the thinking was that submitters could choose how many samples to process. As long as the...
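For submitters who still want an explicit floor, a `user.conf` override along these lines is one option (hypothetical value, shown only to illustrate the key):

```
# Hypothetical override -- value for illustration only
*.Offline.min_query_count = 24576
```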

According to https://github.com/mlcommons/policies/blob/master/submission_rules.adoc#57-system_desc_idjson-metadata, it is required to provide the field host_processor_vcpu_count. I see that the relevant checks are missing in https://github.com/mlcommons/inference/blob/master/tools/submission/submission_checker.py, so the submitter gets `submission_checker.py:3240 WARNING] , field host_processor_vcpu_count is unknown`. My...
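For context, the field lives in the per-system `system_desc_id.json` metadata. A trimmed, hypothetical example (illustrative values, most other required fields omitted):

```json
{
  "host_processor_model_name": "Example CPU",
  "host_processor_core_count": "64",
  "host_processor_vcpu_count": "128"
}
```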

Many thanks to MLPerf submitters and MLCommons members for their feedback during the past 2 weeks to help us improve the [MLCommons CM automation for MLPerf inference](https://github.com/mlcommons/ck): ## Improvements and...

Hi! I've been delving into the DLRMv2 benchmark, and I want to confirm my understanding of the scenarios. For the Server scenario, my understanding is that it runs this command...

Adding a MIGraphX version of the BERT reference code for use on AMD GPUs that support ROCm. Credit to Zixian Wang from UCSD for helping validate and test the code as...

I modified pytorch_SUT.py, onnxruntime_SUT.py, squad_QSL.py, and run.py for BERT inference. They are able to run using multiple AMD GPUs as long as the right environment is set up. Credit to...
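One common way to spread the work across GPUs is to pin each worker process to a single device; on ROCm builds of PyTorch the HIP devices are exposed through the usual `torch.cuda` API. A minimal sketch, assuming a hypothetical per-worker serving function (not the code from this PR):

```python
# Sketch only: one process per GPU, each worker pinned to its own device.
import torch
import torch.multiprocessing as mp

def run_worker(rank, world_size):
    torch.cuda.set_device(rank)          # HIP device on ROCm builds
    device = torch.device("cuda", rank)
    # ... load the BERT model onto `device` and serve this worker's share of queries ...
    print(f"worker {rank}/{world_size} on {torch.cuda.get_device_name(rank)}")

if __name__ == "__main__":
    n_gpus = torch.cuda.device_count()
    mp.spawn(run_worker, args=(n_gpus,), nprocs=n_gpus)
```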

The Edge category doesn't contain a "Server" scenario, and the metric should be latency, not Queries/s. Could you correct this? ![image](https://github.com/mlcommons/inference/assets/6924448/02c88f64-2073-495a-bcc2-f6d37ba21938)