Niccolò Ajroldi issues

Results: 12 issues of Niccolò Ajroldi

Support for LayerNorm

I was trying to extend a Vision Transformer model using backpack. However, I encountered the following error: > UserWarning: Extension saving to grad_batch does not have an extension for Module...

good first issue

[App]: regex does not match all metrics

### Current Behavior At each iteration of my training routine, I am logging 30 metrics, all starting with the same name ("signal-0", "signal-1", ..., "signal-29"). I would like to create...

app
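
As a rough illustration of the metric names described in the regex issue above, the sketch below checks a pattern against all 30 names using Python's `re` module. It is not the app's own matching logic, which the truncated description does not show. The pattern strings are assumptions chosen for illustration.

```python
import re

# Metric names as described in the issue: "signal-0" ... "signal-29".
metric_names = [f"signal-{i}" for i in range(30)]

# Assumed pattern: the literal prefix followed by one or more digits, anchored
# at both ends so that unrelated names such as "signal-0-extra" are rejected.
pattern = re.compile(r"^signal-\d+$")
matched = [name for name in metric_names if pattern.match(name)]

# For contrast: a single-digit class under full-match semantics only covers
# "signal-0" ... "signal-9", so the two-digit names are left out.
single_digit = re.compile(r"signal-\d")
partially_matched = [name for name in metric_names if single_digit.fullmatch(name)]

print(len(matched), len(partially_matched))
```

Whether a tool applies search, match, or full-match semantics changes which names a given pattern selects, so an explicit quantifier and anchors make the intent unambiguous.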

Requeueing on timeouts when launching jobs with CommandFunction

Is it possible to submit a job to slurm with `submitit.helpers.CommandFunction` and `submitit.AutoExecutor`, in such a way that it is _requeued on timeouts_? As mentioned in the [docs](https://github.com/facebookincubator/submitit/blob/main/docs/checkpointing.md), a Python...

Angle between consecutive gradients computation

I am trying to reproduce the results of this work, and I encountered a methodological issue regarding the computation of angles. The tangent of the **angle between consecutive gradients** is here computed...
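
For reference, one standard way to compute the angle between two gradient vectors is sketched below. It is not the method used in the work under discussion, which the truncated description does not show. The function name `angle_between` and the flat-list representation of gradients are assumptions made for illustration.

```python
import math


def angle_between(g_prev, g_curr):
    """Return the angle in radians, in [0, pi], between two flat gradient vectors."""
    dot = sum(a * b for a, b in zip(g_prev, g_curr))
    norm_prev_sq = sum(a * a for a in g_prev)
    norm_curr_sq = sum(b * b for b in g_curr)
    # Lagrange's identity: |a|^2 |b|^2 - (a.b)^2 = |a|^2 |b|^2 sin^2(theta).
    # Clamp at zero to absorb small negative values from rounding.
    cross_sq = max(norm_prev_sq * norm_curr_sq - dot * dot, 0.0)
    # tan(theta) = sqrt(cross_sq) / dot; atan2 also handles dot <= 0.
    # If either vector is all zeros the angle is undefined; atan2(0, 0) gives 0.0.
    return math.atan2(math.sqrt(cross_sq), dot)


# Hypothetical gradients from two consecutive steps.
theta = angle_between([1.0, 0.0], [1.0, 1.0])
print(theta, math.tan(theta))
```

Using `atan2` rather than dividing and calling `atan` keeps the result well defined for orthogonal gradients and distinguishes acute from obtuse angles.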

Resnet DDP Warning: grad strides do not match bucket view strides

## Description On the `imagenet_resnet` workload, I encounter the following warning when running with DDP and the `pytorch` framework. > /u/najroldi/miniconda3/envs/alpe/lib/python3.8/site-packages/torch/autograd/__init__.py:251: UserWarning: Grad strides do not match bucket view strides. This may...

Inform submission about evaluation step

**tl;dr**: We should let the submission know if an evaluation is going to happen at the current step or not. ## Description Currently, there is no easy way for the...

✨ Feature Request

Future Version

Skip eval on train and test for self-reporting results

## Feature request: allow users to skip eval on train and test Evaluating on the training and test sets is time-consuming and not necessary for self-reporting results. We should add...

✨ Feature Request

Good First Issue

Why baseline cumulative hazard for prediction?

I have noticed that `TimeVaryingCox.predict` and `TimeVaryingCox.predict_at_time` compute the predicted hazard score using the baseline **cumulative** hazard function: ``` c_0 = interpolate_at_times(self.model.baseline_cumulative_hazard_, times_to_evaluate_at).T #

Wrong return type in librispeech model_fn

In `librispeech_conformer` the `model_fn` returns `logits_batch` as a Tuple of tensors, not a tensor. The return type is hence wrong: https://github.com/mlcommons/algorithmic-efficiency/blob/ddf5efc4e13a9a4e620ad719e9bf42303f064fac/algorithmic_efficiency/workloads/librispeech_conformer/librispeech_pytorch/workload.py#L119 It should be: ``` def model_fn(...) -> Tuple[Tuple[spec.Tensor, spec.Tensor],...

Introduce prepare for eval, fix evaluation bug

## Description This pull request introduces a `prepare_for_eval` function and updates the code to support it. The implementation follows the blueprint of @fsschneider in https://github.com/mlcommons/algorithmic-efficiency/issues/719#issuecomment-2328797610 and fixes the bug of...