hpc

Reference implementations of MLPerf™ HPC training benchmarks

hpc: 14 issues

Can we get tables on the landing page showing what is in the benchmark suite? See https://github.com/mlcommons/inference for a great example that covers all the different rounds.

Steve, please can you send me or check in a yaml file for the small example. For some reason when I try to modify parameters to use the small dataset...

The CPU Dockerfile is too old compared to the GPU one: https://github.com/mlcommons/hpc/blob/main/cosmoflow/builds/Dockerfile.cpu_mpich

It's probably not a common use case, but the "dummy" wireup method for deepcam doesn't seem to work. Here's an example script at NERSC:
```
#!/bin/bash
#SBATCH -A nstaff_g
#SBATCH -q...
```

The install proceeds normally until I reach this step:
```
$ conda env create -f env.yml
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:
  - pytorch=1.8.1
  - pymatgen=2020.12.31...
```

The deepcam readme still describes the dependency on an external package for the LR warmup scheduler: https://github.com/mlcommons/hpc/blob/main/deepcam/README.md#before-you-run My understanding is this is no longer needed because the code was updated...
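For context, a linear LR warmup does not strictly require an external package; a few lines of plain Python can express the schedule. This is a hypothetical sketch, not the deepcam implementation: the function name `linear_warmup_lr` and its parameters `base_lr` and `warmup_steps` are assumptions chosen for illustration.

```python
def linear_warmup_lr(step, base_lr, warmup_steps):
    """Hypothetical linear warmup: ramp the LR from base_lr / warmup_steps
    up to base_lr over the first warmup_steps steps, then hold it constant."""
    if warmup_steps <= 0:
        # No warmup requested: use the base LR from the start
        return base_lr
    if step < warmup_steps:
        # step is 0-based, so step 0 uses 1/warmup_steps of the base LR
        return base_lr * (step + 1) / warmup_steps
    return base_lr


if __name__ == "__main__":
    for step in (0, 4, 9, 20):
        print(step, linear_warmup_lr(step, base_lr=0.1, warmup_steps=10))
```

In PyTorch, the same multiplicative factor could be wrapped with the built-in `torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=...)`, e.g. `lr_lambda=lambda s: min(1.0, (s + 1) / warmup_steps)`, which is one way a codebase can drop an external warmup dependency.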

Our benchmark documentation could use some improvement, and I think a good first step would be to harmonize the readme structure across the benchmarks. Then we...

The top-level readme has outdated information. We should update it and follow the template at https://github.com/mlcommons/training

Weight decay and L2 regularization differ by a factor of 2 (refs). I think the weight decay value output on the following line should be "l2 * 2": https://github.com/mlcommons/hpc/blob/b796e7aec0339b8a2d33e7af3c875ebe74f038aa/cosmoflow/models/cosmoflow.py#L52...
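The factor of 2 follows from differentiating the penalty. Assuming the L2 term has the form `l2 * sum(w**2)` (the form used by Keras' `l2` regularizer), its gradient is `2 * l2 * w`, so for plain SGD the update `w -= lr * (grad + 2 * l2 * w)` matches decoupled weight decay with coefficient `2 * l2`. The sketch below checks this numerically; the helper names are hypothetical and not taken from the cosmoflow code.

```python
def l2_penalty(weights, l2):
    """Penalty added to the loss, assumed to be l2 * sum(w_i^2)."""
    return l2 * sum(w * w for w in weights)


def numerical_grad(f, weights, eps=1e-6):
    """Central-difference gradient of f with respect to each weight."""
    grads = []
    for i in range(len(weights)):
        up = list(weights)
        up[i] += eps
        down = list(weights)
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads


def equivalent_weight_decay(l2):
    """d/dw [l2 * w^2] = 2 * l2 * w, so the matching decay coefficient is 2 * l2."""
    return 2 * l2


if __name__ == "__main__":
    l2 = 0.01
    weights = [0.5, -1.0, 2.0]
    grad = numerical_grad(lambda w: l2_penalty(w, l2), weights)
    wd = equivalent_weight_decay(l2)
    # Gradient of the L2 penalty vs. the decay term wd * w: the columns should match
    for g, w in zip(grad, weights):
        print(f"{g:.6f} {wd * w:.6f}")
```

This equivalence holds for plain SGD; with adaptive optimizers such as Adam, an L2 penalty and decoupled weight decay are not equivalent, so the reported value depends on which one the benchmark actually applies.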