algorithmic-efficiency issues

Add code for self-reporting

2

## Description We need to check whether the user experience of self-reporting is clear and easy. This might require additional code or scripts to make self-reporting as easy as possible....

fsschneider

Add workload variants

2

Add workload variants for the base workloads. This is a tracking issue. 7/8 variants along with model-diff tests have been added already. Remaining work is to: - [x] Submit DeepSpeech...

priyakasimbeg

LibriSpeech Conformer Workload OOMs or NCCL Errors When Run With Multiple Trials

9

We consistently observe an OOM error when running one of the NAdamW baselines on LibriSpeech Conformer with multiple trials in PyTorch on 8 V100s with 16GB each. This is...

hjmshi

🔥 PyTorch

Two copies of `criteo_resnet_pytorch` exist in .github/workflows/regression_tests_variants.yml

1

Two copies of `criteo_resnet_pytorch` exist in .github/workflows/regression_tests_variants.yml. Is this intentional? If not, which is the correct version? Thanks!!

```
criteo_resnet_pytorch:
  runs-on: self-hosted
  needs: build_and_push_pytorch_docker_image
  steps:
    - uses: actions/checkout@v2
    - name:...
```

tfaod

Publish md5 hashes of datasets

3

## Description Is it possible to publish file hashes and directory layouts for all datasets after processing? I would like to run some checks to ensure that there are no...

adefazio
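
As a rough illustration of the request above, the following is a minimal sketch of how a manifest of MD5 hashes and relative paths could be generated for a processed dataset directory. The function names (`md5_of_file`, `build_manifest`, `write_manifest`) and the `<md5>  <relative path>` line format are assumptions made for this example; they are not part of the algorithmic-efficiency codebase.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to root, POSIX style) to its MD5 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): md5_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def write_manifest(root, out_path):
    """Write one `<md5>  <relative path>` line per file, sorted by path.

    Hypothetical format; out_path should live outside root so the manifest
    does not end up hashing itself on a later run.
    """
    manifest = build_manifest(root)
    with open(out_path, "w", encoding="utf-8") as out:
        for rel_path, md5 in sorted(manifest.items()):
            out.write(f"{md5}  {rel_path}\n")
    return manifest
```

Because the manifest keys are relative paths, the same output captures both the directory layout and the per-file hashes, so two manifests produced on different machines can be compared with a plain text diff.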

[do not merge] Random utils fixes

1

Do not merge this before changing base to dev. Running integration tests with these fixes.

priyakasimbeg

[do not merge] PR to trigger shampoo tests

1

priyakasimbeg

Shampoo conformer workload hangs

4

The conformer workload hangs when run with the Shampoo training algorithm. ## Description Traceback ``` I0505 23:26:00.158526 139795269302080 submission_runner.py:319] Starting training loop. I0505 23:26:00.373614 139795269302080 input_pipeline.py:20] Loading split = train-clean-100 I0505...

priyakasimbeg

Add tests for scoring code

Add unit and integration tests to test the following requirements: In both strict=False and strict=True, to receive a finite score for a workload a submission must: - Reach the validation...

priyakasimbeg

Add dataset setup tests

## Description Most of the code in data_setup.py is untested. There are a few challenges for these tests: - datasets are very large (just under 2TB in total, I believe)...

priyakasimbeg
