hpc
Reference implementations of MLPerf™ HPC training benchmarks
Can we get tables on the landing page showing what is in the benchmark suite? See https://github.com/mlcommons/inference for a great example that covers all the different rounds.
Steve, could you please send me, or check in, a yaml file for the small example? For some reason when I try to modify parameters to use the small dataset...
The CPU Dockerfile is too old compared to the GPU one: https://github.com/mlcommons/hpc/blob/main/cosmoflow/builds/Dockerfile.cpu_mpich
It's probably not a common use case, but the "dummy" wireup method for deepcam doesn't seem to work. Here's an example script at NERSC:
```bash
#!/bin/bash
#SBATCH -A nstaff_g
#SBATCH -q...
```
Details coming soon.
The install proceeds normally until I reach this step:
```
$ conda env create -f env.yml
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:
  - pytorch=1.8.1
  - pymatgen=2020.12.31...
```
The deepcam readme still describes a dependency on an external package for the LR warmup scheduler: https://github.com/mlcommons/hpc/blob/main/deepcam/README.md#before-you-run. My understanding is that this is no longer needed because the code was updated...
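For context, a linear LR warmup needs no external package. The sketch below is a hypothetical helper (`linear_warmup_factor` is not taken from the deepcam code) that returns a multiplier on the base learning rate, ramping linearly up to 1 over `warmup_steps` steps. In PyTorch such a function could be passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR`; this is an illustration under that assumption, not a description of how deepcam actually implements its warmup.

```python
def linear_warmup_factor(step: int, warmup_steps: int) -> float:
    """Hypothetical helper: LR multiplier for a linear warmup.

    Ramps linearly from 1/warmup_steps at step 0 to 1.0 at step
    warmup_steps - 1, then stays at 1.0 for all later steps.
    """
    if warmup_steps <= 0:
        # No warmup requested: always use the full base learning rate.
        return 1.0
    return min(1.0, (step + 1) / warmup_steps)


if __name__ == "__main__":
    # Multipliers for the first six steps with a 4-step warmup.
    print([linear_warmup_factor(s, 4) for s in range(6)])
    # → [0.25, 0.5, 0.75, 1.0, 1.0, 1.0]
```

Keeping the schedule as a plain function makes it easy to unit-test on its own and to wrap in whatever scheduler class the training loop uses.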
Our benchmark documentation could use some improvement, and I think a good first step would be to harmonize the readme structure across the benchmarks. Then we...
The top-level readme has outdated information. We should update it and follow the template at https://github.com/mlcommons/training.
Weight decay and L2 regularization differ by a factor of 2. (Refs) I think the weight decay value output on the following line should be `l2 * 2`. https://github.com/mlcommons/hpc/blob/b796e7aec0339b8a2d33e7af3c875ebe74f038aa/cosmoflow/models/cosmoflow.py#L52...
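To see where the factor of 2 comes from: a penalty term `l2 * sum(w**2)` in the loss (the form used by, e.g., the Keras `L2` regularizer) has gradient `2 * l2 * w`, whereas weight decay with coefficient `wd` adds `wd * w` to the gradient. Under plain SGD the two updates therefore match only when `wd = 2 * l2`. The sketch below checks this numerically; the function names and values are illustrative only and are not taken from `cosmoflow.py`.

```python
def sgd_step_with_l2_penalty(w, grad, lr, l2):
    # Loss term l2 * sum(w_i**2) contributes gradient 2 * l2 * w_i.
    return [wi - lr * (gi + 2.0 * l2 * wi) for wi, gi in zip(w, grad)]


def sgd_step_with_weight_decay(w, grad, lr, wd):
    # Weight decay adds wd * w_i to the gradient in the SGD update.
    return [wi - lr * (gi + wd * wi) for wi, gi in zip(w, grad)]


def same(a, b, tol=1e-12):
    # Element-wise comparison of two weight vectors.
    return all(abs(x - y) < tol for x, y in zip(a, b))


if __name__ == "__main__":
    w = [0.5, -1.2, 3.0]   # illustrative weights
    g = [0.1, 0.0, -0.3]   # illustrative data gradient
    lr, l2 = 0.01, 1e-4

    penalty = sgd_step_with_l2_penalty(w, g, lr, l2)
    print(same(penalty, sgd_step_with_weight_decay(w, g, lr, wd=2 * l2)))  # True
    print(same(penalty, sgd_step_with_weight_decay(w, g, lr, wd=l2)))      # False
```

Note that this equivalence is specific to SGD; with adaptive optimizers such as Adam, an L2 penalty and decoupled weight decay are not interchangeable for any single coefficient.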