Creating a Docker image or Dockerfile from this repo.
It would be awesome to build a Docker image for this repo.
I've got one working here: https://hub.docker.com/r/dibz15/marker_docker
~~It works, but right now it re-downloads the necessary resources on each run. If someone figures out how to get those to cache, that'd be great!~~ Never mind, I got the HF models cached in the image now!
@Dibz15 would it be possible to share the Dockerfile so it can be built locally? It seems the multi-file conversion script, `convert.py`, doesn't work, probably because of a missing dependency.
@agarwalshashank95 You can build off of their image, e.g.
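
    FROM dibz15/marker_docker:latest
    RUN pip install ray
    # Reinstall torch (presumably to replace the image's build with a CUDA-capable one)
    RUN pip uninstall -y torch torchvision torchaudio
    RUN pip3 install torch torchvision
    # local.env sets TORCH_DEVICE
    COPY local.env /usr/src/app/marker/marker/local.env
    # World-writable cache directory so the container can run as a non-root --user
    RUN mkdir /.cache && chmod -R 777 /.cache
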
with a `local.env` file in the same directory as the Dockerfile, containing

    TORCH_DEVICE="cuda"

and then run it with

    USER_ID=$(id -u)
    GROUP_ID=$(id -g)
    docker run --shm-size=10.24gb --gpus all -v "$PDF_DIR_SANITIZED":/pdfs --user "$USER_ID:$GROUP_ID" marker:latest python convert.py /pdfs/ /pdfs/

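The run command above leaves `PDF_DIR_SANITIZED` undefined and assumes an image tagged `marker:latest` already exists. Here is a minimal sketch of a wrapper, assuming the variable is meant to hold an absolute host path to the PDF folder and that the image was built from the Dockerfile above with `docker build -t marker:latest .`. The function name `run_marker` and the `DRY_RUN` switch are hypothetical helpers for illustration, not part of marker or the Docker image.

```shell
# Hypothetical helper: run convert.py in the container against a host PDF folder.
# Assumes the image was built beforehand with: docker build -t marker:latest .
run_marker() {
  local pdf_dir
  # Resolve the argument to an absolute path for the -v bind mount;
  # fail early if the directory does not exist.
  pdf_dir="$(cd -- "$1" >/dev/null && pwd)" || return 1
  # When DRY_RUN is set and non-empty, the command is printed instead of executed.
  ${DRY_RUN:+echo} docker run --shm-size=10.24gb --gpus all \
    -v "${pdf_dir}:/pdfs" \
    --user "$(id -u):$(id -g)" \
    marker:latest python convert.py /pdfs/ /pdfs/
}
```

For example, `DRY_RUN=1 run_marker ./pdfs` prints the full `docker run` line for inspection, and `run_marker ./pdfs` executes it. Passing `--user` with the host UID and GID keeps the generated files owned by the calling user rather than root.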
That said, it would be great if there were a repo-managed Dockerfile that we could all reference ...
I started a repo here that uses @Dibz15's Docker image to generate markdown.
@robinsonkwame Thanks a ton! I didn't realize I could have used the existing Docker image and built on top of it. This would work perfectly for my use case. But yes, I agree, there should be an official Docker image that we can all refer to.
Hey, sorry I lost track of this. I didn't plan to run mine on a system with CUDA support, so I didn't even think about that, sorry. Looks like it's been taken care of, though.
Here's the repo where I host the Dockerfile. I forgot to set it public.
How do I add FastAPI to this app?