docling
fix: vlm using artifacts path
Fix how the artifacts path is used for the VLM models.
This PR might supersede #1051.
Checklist:
- [ ] Documentation has been updated, if necessary.
- [ ] Examples have been added, if necessary.
- [ ] Tests have been added, if necessary.
Merge Protections
Your pull request matches the following merge protections and will not be merged until they are valid.
🟢 Enforce conventional commit
Wonderful, this rule succeeded.
Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
- [X] title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
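As a rough illustration of what this title rule checks, the pattern can be tried locally with Python's `re` module. This is only a sketch: the helper name `is_conventional_title` is hypothetical, and it assumes the `~=` operator behaves like a regular-expression match anchored at the start of the title.

```python
import re

# Pattern copied from the merge protection rule above.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional_title(title: str) -> bool:
    """Hypothetical helper: True if the title starts with a conventional-commit prefix."""
    return CONVENTIONAL_TITLE.match(title) is not None


if __name__ == "__main__":
    # The title of this pull request.
    print(is_conventional_title("fix: vlm using artifacts path"))
```

The type must be lowercase and followed by a colon; an optional scope in parentheses and an optional `!` may sit between them.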
I don't think this will take a local file as input. Please take a look at my code in #1051; it shows which code I changed and why I ended up there.
This is leveraged by the pipeline options, basically a snippet like:
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import (
    PdfPipelineOptions,
    PictureDescriptionVlmOptions,
)
from docling.document_converter import DocumentConverter, PdfFormatOption

# enable picture description with a VLM
pipeline_options = PdfPipelineOptions()
pipeline_options.do_picture_description = True
pipeline_options.picture_description_options = PictureDescriptionVlmOptions(
    repo_id="",  # <-- add here the Hugging Face repo_id of your favorite VLM
    prompt="Describe the image in three sentences. Be concise and accurate.",
)
pipeline_options.images_scale = 2.0
pipeline_options.generate_picture_images = True

# use local artifacts
pipeline_options.artifacts_path = "~/.cache/docling/models"  # or other locations where models are located

converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_options=pipeline_options,
        )
    }
)
# DOC_SOURCE is the path or URL of the document to convert
doc = converter.convert(DOC_SOURCE).document
The same can also be activated via the environment variable DOCLING_ARTIFACTS_PATH=~/.cache/docling/models
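To make the two options concrete, here is a minimal sketch of how an explicit setting and the environment variable could be reconciled into one path. The helper `resolve_artifacts_path` is hypothetical and is not docling's implementation; the precedence (explicit value first, then `DOCLING_ARTIFACTS_PATH`) is an assumption made for illustration.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_artifacts_path(explicit: Optional[str] = None) -> Optional[Path]:
    """Hypothetical helper, not part of docling.

    Returns the explicit path if given, otherwise the value of
    DOCLING_ARTIFACTS_PATH, with a leading "~" expanded. Returns None
    when neither is set.
    """
    raw = explicit or os.environ.get("DOCLING_ARTIFACTS_PATH")
    if not raw:
        return None
    return Path(raw).expanduser()


if __name__ == "__main__":
    # Prints the resolved location, or None if nothing is configured.
    print(resolve_artifacts_path())
```

Expanding `~` up front avoids passing a literal tilde to code that expects a real directory, which matters when models are kept outside the default cache.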
But tell me one thing: what happens if I pass a local folder as the repo_id, a folder where I have downloaded and stored an LLM? I am away from my PC at the moment, but could you download a small VLM to a local path and pass that location as the repo_id to see what it does?
I added the feature because some security systems don't allow storing models in the cache, so we have to store them in a local path. So what next?
See if this artifact can solve it. Also, my code change was far too simple to create anything new.