algorithmic-efficiency issues

Issue 452: Added the assertion for consistency check and evaluation frequency check

1

For this PR, I added two assertions to validate the timing and evaluation consistency at the end of the training loop: 1) Duration consistency check: The total duration of training...

harneet862

BN Fixes

3

There are some subtle issues with how BatchNorm is handled in the PyTorch version of the code. Currently, `workload.model_fn` has an `update_batch_norm` parameter, which in theory should allow the submission...

adefazio

Add function that submissions can call that can change the dropout value

Our current API has 2 dropout-related limitations: 1. Currently, in the external tuning ruleset we read the dropout value from the hparam config and pass it to the model...

priyakasimbeg

Document the default perf-workload dropout value

Add technical documentation for default behavior in both rulesets.

priyakasimbeg

Inform submission about `accumulated_submission_time`

## Description Currently, `update_params` has no up-to-date information about the elapsed time since start. My motivation for adding this feature is to simplify the implementation of a _time-based learning rate...

Niccolo-Ajroldi

Added instructions to install Cloud Ops Agent for monitoring

1

Updated the CONTRIBUTING readme to include detailed steps for installing and configuring the Google Cloud Ops Agent. This will help users set up monitoring for their VM logs.

BharatKatyal

fix contrast() transform

1

See https://github.com/google-research/big_vision/issues/109, fix suggested by @yeqingli in https://github.com/tensorflow/models/pull/11219#pullrequestreview-2355525720. In short, the original implementation of the contrast() transform, which is copied 4-5+ times, is broken: What is meant to be the...

EIFY

[WIP] Migrate JAX workloads from pmap to jit

1

## Purpose The goal of this PR is to allow model parameter and optimizer state sharding, and also to migrate the JAX code from using jax.pmap to using jax.jit. ##...

priyakasimbeg

Jit switch for ogbg

2

I added some code to the ogbg workload according to the migration guide helpfully provided by Ahmed. Unfortunately, there seem to be some bugs that I still hope to fix.

davidtweedle

Added test for Evaluating Timing Consistency in MNIST Workload Training for PyTorch and JAX and changed the int32 to uint32

1

A test is added to evaluate timing consistency in the MNIST training workload using PyTorch and JAX. It ensures that the total reported training time matches the sum of submission,...

harneet862
