Pietro Lesci issues

Results: 17 issues of Pietro Lesci

Example using own corpus for training

Hi there, I am trying to train an NER tagger on my own data. In the docs there do not seem to be worked-out examples of using flair to...

🐛 LocScaleReparam and enumeration with NUTS

## Context Hi there, while trying to reproduce the [annotation.py](https://github.com/pyro-ppl/numpyro/blob/master/examples/annotation.py) example - originally written in numpyro - in pyro, I've come across what might be a bug. Discussion on pyro...

warnings & errors

Return pytorch tensor for mini-imagenet labels?

https://github.com/learnables/learn2learn/blob/06893e847693a0227d5f35a6e065e6161bb08201/learn2learn/vision/datasets/mini_imagenet.py#L111 Currently, when loading mini-imagenet, the inputs are returned as PyTorch tensors while the labels are returned as NumPy arrays. Since the user will likely use both in a training loop, does...

Support sequence tagging evaluation metrics (NLP)

## 🚀 Feature Support for sequence tagging evaluation metrics _à la_ [`seqeval`](https://github.com/chakki-works/seqeval). That is, support the evaluation of the performance of chunking tasks such as named-entity recognition, part-of-speech tagging, semantic...

enhancement

New metric

waiting on author

Convert MultiWoZ 2.1 to same format as 2.2

Hi there, Is there an easy way to convert the MultiWoZ 2.1 dataset to the same format as the 2.2? Best, Pietro

❓Different results between normal batching and `vmap` while using lower precision (e.g., bfloat16)

### 🐛 Describe the bug ## Description Hi there, I would like to get per-sample gradients (application: transformer classifier for text). In a preliminary test, I noticed that when computing...

Add training loss data

Task description: "Collect all loss values into CSV files from WandB and -- if needed -- log files". The most important file is `pythia_runs.tsv` in which I manually collect the...

[Pythia on Pile-Dedup] Training for ~1.5 epochs: how to identify the repeated sequences (i.e., the additional .5 epoch)?

Hi there, The deduplicated dataset has fewer sequences and, to keep a consistent token count with the non-deduplicated version, the models are trained for ~1.5 epochs (as discussed in the...

Support for text representation transforms?

Hi there, First of all, thanks for creating this amazing library - investing time and money to support it, and making it open source. I discovered it at JuliaCon 2021...

❓Get stats (e.g. counts) about the merged pairs

Hi there, I was wondering whether there is an easy way to ask the tokeniser trainer to return the counts (or frequency) of the pair at the moment the merge...