
Reference implementations of MLPerf™ inference benchmarks

200 inference issues, sorted by most recently updated

When I want to test an online inference service by POST requests, it is necessary to record all responses, because of [this](https://github.com/mlcommons/inference/blob/268bc9dc8a3c0a96bbb7d38482c0ce5016507633/loadgen/logging.h#L398). However, if the response's status code is not...
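One way to capture every response, including error responses, is to do the recording inside the SUT's `issue_queries` callback before handing the result back to LoadGen. A minimal sketch, assuming a hypothetical HTTP endpoint and the Python LoadGen bindings (callback signatures can differ slightly between loadgen versions):

```python
# Sketch only: record every HTTP response (any status code) before completing the query.
import array
import requests
import mlperf_loadgen as lg

ENDPOINT = "http://localhost:8080/predict"  # hypothetical inference service
recorded = []                               # (sample id, status code, body) kept for later inspection

def issue_queries(query_samples):
    for qs in query_samples:
        r = requests.post(ENDPOINT, json={"index": qs.index})
        recorded.append((qs.id, r.status_code, r.content))
        buf = array.array("B", r.content)
        addr, length = buf.buffer_info()
        lg.QuerySamplesComplete([lg.QuerySampleResponse(qs.id, addr, length)])

def flush_queries():
    pass

sut = lg.ConstructSUT(issue_queries, flush_queries)
```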

Running GPTJ even on accelerated systems can be quite demanding, as the Server latency constraint of 20 seconds suggests. For systems close to this threshold, meeting the minimum run duration...
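For context, the relevant knobs are the ones read from `mlperf.conf`/`user.conf`. A hypothetical `user.conf` sketch (illustrative values only; the authoritative defaults live in `mlperf.conf`):

```
# Illustrative only -- check mlperf.conf for the official values
# the 20 s Server latency target mentioned above, in milliseconds
gptj.Server.target_latency = 20000
# minimum run duration the test has to sustain, in milliseconds
gptj.Server.min_duration = 600000
```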

inference v4.0

While running the Nvidia code for DLRMv2 on a 4090 GPU with batch size 1400, we are seeing the accuracy below, which is lower than expected. Can someone help us...

The minimum query count (`min_query_count`) was [removed](https://github.com/mlcommons/inference/commit/995ffee37682a0870f359bb00d5f4672d78d6424) from `mlperf.conf` a while ago. I believe the thinking was that submitters could choose how many samples to process. As long as the...
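For submitters who still want an explicit floor, a `user.conf` override along these lines is one option (hypothetical value, shown only to illustrate the key):

```
# Hypothetical override -- value for illustration only
*.Offline.min_query_count = 24576
```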

According to https://github.com/mlcommons/policies/blob/master/submission_rules.adoc#57-system_desc_idjson-metadata, it is required to provide the field host_processor_vcpu_count. I see that the relevant checks are missing in https://github.com/mlcommons/inference/blob/master/tools/submission/submission_checker.py, so the submitter gets `submission_checker.py:3240 WARNING] , field host_processor_vcpu_count is unknown`. My...
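For context, the field lives in the per-system `system_desc_id.json` metadata. A trimmed, hypothetical example (illustrative values, most other required fields omitted):

```json
{
  "host_processor_model_name": "Example CPU",
  "host_processor_core_count": "64",
  "host_processor_vcpu_count": "128"
}
```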

Many thanks to MLPerf submitters and MLCommons members for their feedback during the past 2 weeks to help us improve the [MLCommons CM automation for MLPerf inference](https://github.com/mlcommons/ck): ## Improvements and...

Hi! I've been delving into the DLRMv2 benchmark, and I want to confirm my understanding of the scenarios. For the Server scenario, my understanding is that it runs this command...

Adding a MIGraphX version of the BERT reference code for use on AMD GPUs that support ROCm. Credit to Zixian Wang from UCSD for helping validate and test the code as...

I modified pytorch_SUT.py, onnxruntime_SUT.py, squad_QSL.py, and run.py for BERT inference. They are able to run using multiple AMD GPUs as long as the right environment is set up. Credit to...
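One common way to spread the work across GPUs is to pin each worker process to a single device; on ROCm builds of PyTorch the HIP devices are exposed through the usual `torch.cuda` API. A minimal sketch, assuming a hypothetical per-worker serving function (not the code from this PR):

```python
# Sketch only: one process per GPU, each worker pinned to its own device.
import torch
import torch.multiprocessing as mp

def run_worker(rank, world_size):
    torch.cuda.set_device(rank)          # HIP device on ROCm builds
    device = torch.device("cuda", rank)
    # ... load the BERT model onto `device` and serve this worker's share of queries ...
    print(f"worker {rank}/{world_size} on {torch.cuda.get_device_name(rank)}")

if __name__ == "__main__":
    n_gpus = torch.cuda.device_count()
    mp.spawn(run_worker, args=(n_gpus,), nprocs=n_gpus)
```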

The Edge category doesn't contain a "Server" scenario, and the metric should be latency, not Queries/s. Could you correct this? ![image](https://github.com/mlcommons/inference/assets/6924448/02c88f64-2073-495a-bcc2-f6d37ba21938)