Pablo Gonzalez

Results 41 issues of Pablo Gonzalez

Reflect changes made in #1254

I tried to run the DLRM reference implementation to test another issue and got the following error. Commands (on a machine without tensorboard installed): ``` cm pull repo octoml@ck cm...

- [x] Add v3.1 config
- [x] Add DLRMv2 checks
- [x] Add GPT-J checks, with additional check for generation length (multiple metric)
- [x] Update to the last power...

Currently TEST04 consists of running performance mode with only one sample as follows: - Offline scenario, the same sample is repeated as many times as necessary to fill the query...

The training WG asked that we output a clearer error for variables whose names have changed. For example:

- [Here](https://github.com/mlcommons/logging/pull/290): `opt_epsilon->opt_lamb_epsilon`
- [Here](https://github.com/mlcommons/logging/issues/276): `train_samples->dataset_train_samples` and `train_samples->total_train_samples`
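One way such a clearer error could look is sketched below. This is only an illustration, not the checker's actual code: the `RENAMED_KEYS` table and the `explain_unknown_key` helper are hypothetical names, and the key names are taken from the examples above (assuming the stray spaces in them are formatting artifacts).

```python
# Hypothetical table of retired log keys and their replacements,
# filled in from the renames mentioned in this issue.
RENAMED_KEYS = {
    "opt_epsilon": ["opt_lamb_epsilon"],
    "train_samples": ["dataset_train_samples", "total_train_samples"],
}


def explain_unknown_key(key, known_keys):
    """Return a descriptive message for an unrecognized key, or None if the key is known."""
    if key in known_keys:
        return None
    replacements = RENAMED_KEYS.get(key)
    if replacements:
        # Point the submitter at the new name(s) instead of a bare "unknown key".
        options = " or ".join(f"'{name}'" for name in replacements)
        return f"Key '{key}' was renamed; use {options} instead."
    return f"Unknown key '{key}'."


if __name__ == "__main__":
    print(explain_unknown_key("opt_epsilon", {"opt_lamb_epsilon"}))
```

Keeping the renames in a single table means a future rename only needs one new entry rather than a new special case in the checking logic.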

This error occurs: ``` Traceback (most recent call last): File "/opt/python3.9/lib/python3.9/runpy.py", line 197, in _run_module_as_main return _run_code(code, main_globals, None, File "/opt/python3.9/lib/python3.9/runpy.py", line 87, in _run_code exec(code, run_globals) File "/layers/google.python.pip/pip/lib/python3.9/site-packages/mlperf_logging/result_summarizer/__main__.py", line...

I am currently writing the reference implementation and am stuck deploying the model across multiple GPUs. Here is a link to the PR: https://github.com/mlcommons/inference/pull/1373 Here is a link to the...

Update the rules to match the postmortem proposal: https://docs.google.com/document/d/1jt7ri2jzah4aFrbP0HdptPwetmIra1vrkHctJtAxmdM/edit?usp=sharing