Nicolas Patry comments

Results: 978 comments of Nicolas Patry

[`pipeline`] Add conditional text support for `ImageToTextPipeline`

Is there any difference in the code between VQA and captioning? In general, pipelines are defined by I/O (input/output, meaning (image, text) -> (text)). The rest is semantics, naming...

[`pipeline`] Add conditional text support for `ImageToTextPipeline`

> however I wonder if that doesn't make the VQA pipeline obsolete for those models

Why does it? You said above that both were correct? Did I misunderstand...

Allowing adding new token as unk token for gpt2 tokenizer

I think we should add some tests to clarify what behavior is modified and how. It could be for just those 4 tokenizers, but still I think the effect of...

error with protoBug in v4.27.3

@sgugger Didn't we upgrade the protobuf generated file in the end? Also this happens to be a Camembert, which is BPE + spm (so subject to the bug we...

error with protoBug in v4.27.3

Created this: https://github.com/huggingface/transformers/pull/23013. I'll try to run the slow tokenization tests on some machine, in addition to the standard tests there.

Pipeline for inference "You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset"

Hey, there are a few things: First: - I cannot really reproduce your example since your data is missing, meaning I'm not able to see exactly what's going on for...

Pipeline for inference "You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset"

Note:

> for result in classifier(KeyDataset(samples, 'text'), labels, hypothesis_template = template, multi_label = False, batch_size = 32):

This is the line of code I'm concerned about. It's perfectly ok if...

Pipeline for inference "You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset"

The warning is generated after simply 10 different calls of the pipeline on GPU (since with streaming there's only 1 call): https://github.com/huggingface/transformers/blob/main/src/transformers/pipelines/base.py#L1069 I'll look into this more thoroughly tomorrow.
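To make the mechanism described above concrete, here is a minimal, self-contained sketch of a call counter that warns once a callable has been invoked more than a fixed number of times. It is not the code at the linked line: the class name `SequentialCallTracker`, the `on_gpu` flag, and the exact comparison (`> threshold`) are assumptions made for illustration only.

```python
import warnings


class SequentialCallTracker:
    """Hypothetical stand-in for a pipeline object (not the transformers class).

    Counts how many times it is called and warns once the count exceeds
    `threshold`. The threshold of 10 mirrors the figure quoted in the comment;
    whether the real check is `>` or `>=` is an assumption here.
    """

    def __init__(self, threshold: int = 10, on_gpu: bool = True):
        self.threshold = threshold
        self.on_gpu = on_gpu
        self.call_count = 0

    def __call__(self, inputs):
        # One increment per call, no matter how many items `inputs` holds.
        self.call_count += 1
        if self.on_gpu and self.call_count > self.threshold:
            warnings.warn(
                "You seem to be using the pipelines sequentially on GPU. "
                "In order to maximize efficiency please use a dataset",
                UserWarning,
            )
        # Placeholder "inference": echo the inputs back unchanged.
        return inputs
```

In this sketch, passing a whole list (or a dataset) in a single call increments the counter once, which is why a streaming-style call would not trigger the warning, whereas looping over items and calling the object each time eventually would.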

Pipeline for inference "You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset"

Ok, I had to rework your example so that I could understand what was going on. Ultimately I see similar results: ``` Batching 124it [00:24, 5.07it/s] No Batching 124it [00:32,...

fix: Text splitting in the BasicTokenizer

Looking at the output for `ar`, it seems NEW + normalize is the best match, isn't it? I think this proves that `NFC` is indeed a good addition which...
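For readers unfamiliar with `NFC`, the short sketch below shows what that normalization form does, using only Python's standard `unicodedata` module. It does not reproduce the tokenizer change under discussion: the helper name `normalize_nfc` and the sample strings are assumptions chosen for illustration.

```python
import unicodedata


def normalize_nfc(text: str) -> str:
    """Return `text` in Unicode Normalization Form C (canonical composition)."""
    return unicodedata.normalize("NFC", text)


# Latin example: 'e' followed by a combining acute accent (two code points).
decomposed = "e\u0301"
composed = normalize_nfc(decomposed)

# Arabic example: alef followed by a combining madda above (two code points).
arabic_decomposed = "\u0627\u0653"
arabic_composed = normalize_nfc(arabic_decomposed)

# After NFC, each pair is represented by a single precomposed code point.
print(len(decomposed), len(composed))
print(len(arabic_decomposed), len(arabic_composed))
```

Normalizing before splitting means that visually identical strings arrive at the tokenizer as the same code point sequence, which is one plausible reason a normalized variant would match the expected output more closely.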