Michele Dolfi
At the moment `convert()` returns a generator over the documents in the input argument, but it does not stream within each document.
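A dependency-free illustration of these semantics (the `convert()` below is a hypothetical stand-in, not docling's actual implementation): one item is yielded lazily per input document, while the pages inside each document come back as a whole.

```python
# Hypothetical stand-in for the converter: a generator with one item per
# input document; pages within a document are NOT streamed individually.
def convert(sources):
    for src in sources:
        # each yielded item represents one fully-converted document
        yield {"source": src, "pages": [f"{src}:page{i}" for i in range(2)]}

results = convert(["a.pdf", "b.pdf"])
first = next(results)  # only "a.pdf" has been converted at this point
```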
On the [HF page](https://huggingface.co/ByteDance/Dolphin?library=transformers) I found this:

```py
# Load model directly
from transformers import AutoTokenizer, AutoModelForImageTextToText

tokenizer = AutoTokenizer.from_pretrained("ByteDance/Dolphin")
model = AutoModelForImageTextToText.from_pretrained("ByteDance/Dolphin")
```

`AutoModelForImageTextToText` is not yet in the...
You can try to add `AutoModelForImageTextToText`.

Enum definition:
https://github.com/docling-project/docling/blob/0432a31b2f7c9fe944c3a1d4b608ef938b4f2299/docling/datamodel/pipeline_options_vlm_model.py#L26-L29

Usage:
https://github.com/docling-project/docling/blob/0432a31b2f7c9fe944c3a1d4b608ef938b4f2299/docling/models/vlm_models_inline/hf_transformers_model.py#L83-L93

And in case you have to use a different prompt, you can use another `if/else` in
https://github.com/docling-project/docling/blob/0432a31b2f7c9fe944c3a1d4b608ef938b4f2299/docling/models/vlm_models_inline/hf_transformers_model.py#L163
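A minimal sketch of the shape of that change (names and values here are illustrative, not docling's exact code; check the linked permalinks for the real enum and loader):

```python
# Hedged sketch: extend the model-type enum with a new member and branch on
# it where the model class is chosen. Enum member names/values are assumed.
from enum import Enum

class TransformersModelType(str, Enum):
    AUTOMODEL = "automodel"
    AUTOMODEL_VISION2SEQ = "automodel-vision2seq"
    AUTOMODEL_IMAGETEXTTOTEXT = "automodel-imagetexttotext"  # proposed addition

def model_class_name(model_type: TransformersModelType) -> str:
    # In docling this would return the actual transformers Auto* class;
    # returning the class name keeps the sketch dependency-free.
    if model_type == TransformersModelType.AUTOMODEL_VISION2SEQ:
        return "AutoModelForVision2Seq"
    elif model_type == TransformersModelType.AUTOMODEL_IMAGETEXTTOTEXT:
        return "AutoModelForImageTextToText"
    return "AutoModel"
```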
This PR should simplify all of it: https://github.com/DS4SD/docling/pull/876
I'm summarizing here the target of this PR; I will submit code proposals later.

## `VlmPipeline`

Specs of the new pipeline:

- Input: (PDF) Document
- Processing: using a vision...
| source | model_id | framework | num_pages | time |
| --- | --- | --- | --- | --- |
| tests/data/pdf/2305.03393v1-pg9.pdf | ds4sd_SmolDocling-256M-preview | InferenceFramework.TRANSFORMERS_VISION2SEQ | 1 | 102.212 |
...
We could check whether the docx API allows detecting something like the checkboxes (top-left in the figure).
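One possible angle, sketched under assumptions: python-docx doesn't expose checkboxes directly, but Word stores checkbox content controls in the WordprocessingML XML as `w:sdt` elements carrying a `w14:checkbox`, so we could scan the raw `word/document.xml` for them. The element names below are the standard Word 2010 extension namespace; whether they cover every checkbox variant in the wild is unverified.

```python
# Sketch: detect checkbox content controls and their checked state by
# scanning WordprocessingML XML (e.g. word/document.xml from the .docx zip).
import xml.etree.ElementTree as ET

W14 = "http://schemas.microsoft.com/office/word/2010/wordml"

def find_checkboxes(document_xml: str) -> list:
    """Return True/False for each checkbox content control found."""
    root = ET.fromstring(document_xml)
    states = []
    for cb in root.iter(f"{{{W14}}}checkbox"):
        checked = cb.find(f"{{{W14}}}checked")
        val = checked.get(f"{{{W14}}}val") if checked is not None else "0"
        states.append(val == "1")
    return states

sample = """<w:document
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml">
  <w:body>
    <w:sdt><w:sdtPr><w14:checkbox><w14:checked w14:val="1"/></w14:checkbox></w:sdtPr></w:sdt>
    <w:sdt><w:sdtPr><w14:checkbox><w14:checked w14:val="0"/></w14:checkbox></w:sdtPr></w:sdt>
  </w:body>
</w:document>"""

print(find_checkboxes(sample))  # [True, False]
```

In practice the XML would come from `zipfile.ZipFile(path).read("word/document.xml")` rather than an inline string.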
Do you mean our `CONTRIBUTING.md`? We are very happy to have the community building these extensions. Thanks a lot for the contribution.
We are planning to address this with custom serializers for picture items, i.e. some use cases need the description, others the text produced by OCR, others the graph...
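To make the idea concrete, here is a minimal sketch of the pattern (hypothetical classes, not docling's actual serializer API): each use case plugs in its own serializer for picture items, so the same item can render as a caption comment in one pipeline and as raw OCR text in another.

```python
# Hypothetical sketch of pluggable picture-item serializers.
from dataclasses import dataclass

@dataclass
class PictureItem:
    description: str  # e.g. produced by a captioning model
    ocr_text: str     # text recovered from the image by OCR

class DescriptionPictureSerializer:
    """Renders the picture as a markdown comment with its description."""
    def serialize(self, item: PictureItem) -> str:
        return f"<!-- image: {item.description} -->"

class OcrPictureSerializer:
    """Renders the picture as the text OCR recovered from it."""
    def serialize(self, item: PictureItem) -> str:
        return item.ocr_text

pic = PictureItem(description="bar chart of revenue", ocr_text="Revenue 2023: 1.2M")
print(DescriptionPictureSerializer().serialize(pic))  # <!-- image: bar chart of revenue -->
print(OcrPictureSerializer().serialize(pic))          # Revenue 2023: 1.2M
```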
Yes, that is what we would like to allow. It is clear that each use case will need a different output, and instead of trying to overload with content we...