A model component (e.g. the Swahili encoder) is likely to exist on multiple devices. Because each device samples its own task sequence, it is possible that when a gradient synchronization...
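To make the synchronization concern concrete, here is a minimal sketch of per-component gradient averaging restricted to the devices that hold that component, assuming PyTorch distributed; the component and process-group names below are illustrative, not mammoth's actual implementation:

```python
import torch
import torch.distributed as dist

# Assumes dist.init_process_group(...) has already been called on every process.
# Hypothetical: ranks 0, 2 and 3 hold a replica of the Swahili encoder.
swahili_group_ranks = [0, 2, 3]
swahili_group = dist.new_group(ranks=swahili_group_ranks)

def sync_component_grads(component: torch.nn.Module, group) -> None:
    """All-reduce (average) the gradients of one component within its process group."""
    group_size = dist.get_world_size(group=group)
    for param in component.parameters():
        if param.grad is None:
            # This device did not use the component in the current step;
            # contribute zeros so the collective call does not deadlock.
            param.grad = torch.zeros_like(param)
        dist.all_reduce(param.grad, op=dist.ReduceOp.SUM, group=group)
        param.grad /= group_size
```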
Closes #63. Same idea as v2, didn't bother porting from it. Does not include bucket states, although this could maybe be done by picking the line indices from all...
Currently, the `--train_from` option does not include any means of restoring corpora states, hence training resumes from the beginning of the bitexts. This entails that resumed models train on a subset...
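A rough sketch of what restoring corpora states could look like, assuming each corpus iterator can report and re-seek a line offset; all names here are hypothetical, not the actual checkpoint format:

```python
# Hypothetical sketch: persist per-corpus line offsets in the checkpoint so
# that `--train_from` can resume mid-bitext instead of from the beginning.
def corpora_state(iterators):
    """Collect the current line index of every corpus iterator."""
    return {corpus_id: it.line_index for corpus_id, it in iterators.items()}

def restore_corpora_state(iterators, state):
    """Fast-forward each corpus iterator to its saved line index."""
    for corpus_id, line_index in state.items():
        iterators[corpus_id].seek_line(line_index)  # hypothetical method

# checkpoint["corpora_state"] = corpora_state(iterators)            # at save time
# restore_corpora_state(iterators, checkpoint["corpora_state"])     # under --train_from
```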
closes #60
Going through the existing catalogue of options listed in our docs, a number of them do not seem to be plugged in. The list below is most likely not exhaustive. ###...
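One way to audit this could be a small script that checks, for every documented option, whether the corresponding attribute is ever read in the source tree; a rough sketch (the paths and naming pattern are assumptions about the repo layout):

```python
import pathlib
import re

# Hypothetical audit: for every `--option` mentioned in the docs, check whether
# an `opt.<option>` / `opts.<option>` attribute access appears anywhere in the code.
DOCS = pathlib.Path("docs")
SRC = pathlib.Path("mammoth")

doc_text = "\n".join(p.read_text() for p in DOCS.rglob("*.md"))
documented = set(re.findall(r"--([a-z0-9_]+)", doc_text))

src_text = "\n".join(p.read_text() for p in SRC.rglob("*.py"))
unused = [name for name in sorted(documented)
          if not re.search(rf"\bopts?\.{name}\b", src_text)]

print("Possibly unplugged options:")
print("\n".join(f"  --{name}" for name in unused))
```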
Currently, we only support training encoder-decoder models. We might want to support encoder-only (e.g. BERT) and decoder-only models (e.g. GPTs). This could be inferred automatically from the types of sharing...
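For illustration, the inference could look something like the sketch below (the task fields `enc_layers`/`dec_layers` are assumptions, not the actual config schema):

```python
# Hypothetical sketch: infer the architecture from which layer stacks the tasks define.
def infer_model_type(tasks):
    has_encoder = any(task.get("enc_layers") for task in tasks)
    has_decoder = any(task.get("dec_layers") for task in tasks)
    if has_encoder and has_decoder:
        return "encoder-decoder"
    if has_encoder:
        return "encoder-only"   # e.g. BERT-style masked LM
    if has_decoder:
        return "decoder-only"   # e.g. GPT-style causal LM
    raise ValueError("tasks define neither encoder nor decoder layers")
```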
Add a feature to learn virtual embeddings for prompt/prefix learning on a pretrained model. This would depend on #24 being implemented first.
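A minimal sketch of the idea, assuming a frozen pretrained model whose input embeddings get learned "virtual" vectors prepended to them (names and shapes are illustrative):

```python
import torch
import torch.nn as nn

class PrefixEmbedding(nn.Module):
    """Learn `n_virtual` embeddings that are prepended to the token embeddings."""

    def __init__(self, n_virtual: int, d_model: int):
        super().__init__()
        self.virtual = nn.Parameter(torch.randn(n_virtual, d_model) * 0.02)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, d_model)
        batch = token_embeddings.size(0)
        prefix = self.virtual.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prefix, token_embeddings], dim=1)

# Usage sketch: freeze the pretrained model, train only the prefix parameters.
# for p in pretrained_model.parameters():
#     p.requires_grad_(False)
# prefix = PrefixEmbedding(n_virtual=20, d_model=512)
```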
Currently, we rely on custom-made layer / encoder definitions for our modules. Cf. for instance this class: https://github.com/Helsinki-NLP/mammoth/blob/c6a193b1cc16bf7140520c44712bcf82701ec87d/mammoth/modules/transformer_encoder.py#L13 This entails that any architectural variant we wish to test has to...
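For instance, an encoder stack could in principle be assembled from standard building blocks rather than custom classes; a sketch using stock `torch.nn` modules (not mammoth's actual interface):

```python
import torch.nn as nn

# Hypothetical alternative to the custom TransformerEncoder: reuse the stock
# torch.nn implementation so that architectural variants only change arguments,
# not hand-written layer code.
def build_encoder(d_model=512, n_heads=8, ff_dim=2048, n_layers=6, dropout=0.1):
    layer = nn.TransformerEncoderLayer(
        d_model=d_model,
        nhead=n_heads,
        dim_feedforward=ff_dim,
        dropout=dropout,
        batch_first=True,
    )
    return nn.TransformerEncoder(layer, num_layers=n_layers)
```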
Freezing some of the modules would allow training adapters as actual adapters. Ideally, this would entail introducing some mechanism to mark specific layer stacks/adapters in the config as not requiring gradients....
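A minimal sketch of such a mechanism, assuming a config field (hypothetically called `frozen_modules` here) listing the layer stacks/adapters to freeze by name:

```python
# Hypothetical sketch: disable gradients on the modules named in the config.
# `frozen_modules` and the lookup by attribute name are assumptions, not the
# actual config schema.
def freeze_modules(model, frozen_module_names):
    for name in frozen_module_names:
        module = getattr(model, name)
        for param in module.parameters():
            param.requires_grad_(False)
        module.eval()  # also disable dropout / batch-norm updates while frozen

# freeze_modules(model, config.get("frozen_modules", []))
```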
Load imbalance is a very likely candidate for the scaling issues we faced. This PR introduces a couple of new flags to enforce equal load across nodes, which seems to result...
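As an illustration of what "equal load" could mean in practice, here is a greedy assignment of tasks to nodes by estimated cost; this is a sketch only, and the actual flags and heuristics in the PR may differ:

```python
import heapq

# Hypothetical sketch: greedily assign each task to the currently least-loaded
# node so that per-node load stays roughly equal. Task costs (e.g. corpus
# sizes) are assumed to be known up front.
def balance_tasks(task_costs: dict, n_nodes: int) -> dict:
    heap = [(0.0, node) for node in range(n_nodes)]   # (load, node_id)
    heapq.heapify(heap)
    assignment = {node: [] for node in range(n_nodes)}
    for task, cost in sorted(task_costs.items(), key=lambda kv: -kv[1]):
        load, node = heapq.heappop(heap)
        assignment[node].append(task)
        heapq.heappush(heap, (load + cost, node))
    return assignment

# balance_tasks({"en-sw": 3.1e6, "en-fi": 1.2e6, "en-et": 0.8e6}, n_nodes=2)
```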