John Giorgi issues

Results: 64 issues of John Giorgi

Convert all corpora to Standoff

Convert all corpora to Standoff and add them to the `standoff` directory.

enhancement

Support generation kwargs within Seq2SeqTasks

## 🚀 Feature

[Seq2Seq tasks](https://github.com/Lightning-AI/lightning-flash/blob/master/flash/text/seq2seq/core/model.py) (and tasks that inherit from them, like [`SummarizationTask`](https://github.com/Lightning-AI/lightning-flash/blob/master/flash/text/seq2seq/summarization/model.py)) only allow a user to specify a couple of arguments to `model.generate`:

https://github.com/Lightning-AI/lightning-flash/blob/651e85851509fd04f723caedfef8d487d77df4e0/flash/text/seq2seq/core/model.py#L139-L144

However, the [`generate`](https://huggingface.co/docs/transformers/v4.26.0/en/main_classes/text_generation#transformers.GenerationMixin.generate)...

enhancement

help wanted
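One way the requested feature could look is sketched below. This is a minimal illustration, not Flash's actual code: `Seq2SeqTaskSketch`, `StubModel`, and the `generation_kwargs` parameter are hypothetical names, and the stub only records what it receives so the forwarding pattern can be checked without loading a real model.

```python
from typing import Any, Dict, List, Optional


class StubModel:
    """Stand-in for a Hugging Face model (assumption); records the kwargs passed to generate."""

    def __init__(self) -> None:
        self.last_kwargs: Dict[str, Any] = {}

    def generate(self, input_ids: List[int], **kwargs: Any) -> List[int]:
        self.last_kwargs = kwargs
        return input_ids  # echo the input; a real model would return generated token ids


class Seq2SeqTaskSketch:
    """Hypothetical task that forwards arbitrary generation kwargs to model.generate."""

    def __init__(
        self,
        model: StubModel,
        num_beams: Optional[int] = None,
        generation_kwargs: Optional[Dict[str, Any]] = None,
    ) -> None:
        self.model = model
        self.num_beams = num_beams
        # Copy so later mutation of the caller's dict does not affect the task.
        self.generation_kwargs = dict(generation_kwargs or {})

    def forward(self, input_ids: List[int]) -> List[int]:
        # User-supplied kwargs come last, so they override the task's defaults.
        kwargs = {"num_beams": self.num_beams, **self.generation_kwargs}
        return self.model.generate(input_ids=input_ids, **kwargs)


task = Seq2SeqTaskSketch(
    StubModel(),
    num_beams=4,
    generation_kwargs={"max_length": 64, "do_sample": False},
)
output = task.forward([1, 2, 3])
```

Letting the user-supplied dictionary override the task defaults is a design choice made for this sketch; a real implementation might instead reject conflicting keys.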

chore: drop python3.6 from build config

Looks like Python 3.6 is no longer supported by GitHub Actions. Drop it from the build config and add Python 3.9 in its place.

Unable to use BLEURT in offline mode

### Describe the bug

Trying to use [BLEURT](https://github.com/huggingface/datasets/tree/main/metrics/bleurt) in offline mode fails. The script and model weights are cached to disk fine (when in _online_ mode). In _offline_ mode, it...

Can unsloth models be compiled by TensorRT-LLM?

Hi! I am wondering if there are any modifications made by `FastLanguageModel` that would cause problems when compiling a model with [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM)? I suppose I can try myself (and...

Adding SOTA badge for CoNLL04

I am currently working on the same dataset, so I went looking for SOTA performance and I am fairly confident your model achieves it. I added the macro-f1 metric to...

Train on BioRED

Try training on BioRED (https://ftp.ncbi.nlm.nih.gov/pub/lu/BioRED/)

Add a unit test to check that we can overfit a single example

Ideally, the [unit test for our model](https://github.com/JohnGiorgi/seq2rel/blob/d91ccaa7df1cc6345ec04f6aaf4e2401de9397e2/tests/models/test_copynet_seq2rel.py#L22) would check that the model can memorize a single training example. This is technically possible by providing the argument `metric_to_check` to [`ensure_model_can_train_save_and_load`](https://docs.allennlp.org/main/api/common/testing/model_test_case/#ensure_model_can_train_save_and_load). However,...

Automatically detect special tokens

enhancement

Formalize schema of deserialized output with Pydantic

Right now, we have a deserialized output format that looks like:

```python
[
    {
        "ADE": [
            (("fenoprofen", "DRUG"), ("pure red cell aplasia", "EFFECT"))
        ]
    }
]
```

it would be...

enhancement
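A possible shape for such a schema is sketched below, assuming only the nested structure shown in the example above. The class names `Entity` and `Relation`, the field names, and the `parse_output` helper are hypothetical and not taken from the project.

```python
from typing import Dict, List, Tuple

from pydantic import BaseModel


class Entity(BaseModel):
    """A single entity mention: its text and its type label."""

    text: str
    label: str


class Relation(BaseModel):
    """An ordered group of entities that together form one relation."""

    entities: Tuple[Entity, ...]


def parse_output(raw: List[dict]) -> List[Dict[str, List[Relation]]]:
    """Convert the raw list-of-dicts format into validated models, one dict per input."""
    parsed = []
    for document in raw:
        parsed.append(
            {
                relation_type: [
                    Relation(
                        entities=tuple(
                            Entity(text=text, label=label) for text, label in relation
                        )
                    )
                    for relation in relations
                ]
                for relation_type, relations in document.items()
            }
        )
    return parsed


raw_output = [
    {
        "ADE": [
            (("fenoprofen", "DRUG"), ("pure red cell aplasia", "EFFECT"))
        ]
    }
]
parsed_output = parse_output(raw_output)
```

With a schema like this, malformed entries (for example, a non-string entity label) would raise a validation error at parse time instead of surfacing later in downstream code.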