
feat(docker): adding native GHCR container

Open goulvenb opened this issue 8 months ago • 4 comments

Until now, a user who wanted to use Docker with this project had to:

  • Clone the repo
  • Build the Docker image with `docker build -t markitdown:latest .`
  • Run `docker run --rm -i markitdown:latest < ~/your-file.pdf > output.md`

With this PR, I am hoping to simplify this procedure, so that testing the repo (for example, to check whether it meets our needs) comes down to a single step:

  • Run `docker run --rm -i ghcr.io/microsoft/markitdown:latest < ~/your-file.pdf > output.md`

goulvenb · Apr 12 '25 13:04

@microsoft-github-policy-service agree

goulvenb · Apr 12 '25 13:04

I forgot to mention: you need to create a repository variable called `REGISTRY` that contains, for example, `ghcr.io` (or `docker.io`, but then my commits would need some adjustments).

goulvenb · Apr 12 '25 13:04

This looks very promising. Thank you.

I need to check some procedural stuff on my end before I can merge this (re: hosting images rather than just Dockerfiles). Let me spend a day or two on this, and I'll get back to you.

afourney · Apr 13 '25 16:04

Hey @afourney, as I just said in #1186:

It was the first workflow I've written, so I don't have much experience with them.

That said, I would love to discuss it with @seuros.

From what I can see, here are the differences between the two PRs:

| | #1184 | #1186 |
|---|---|---|
| Trigger | Builds the container when a tag is created | Builds the container on every push to `main`, whether or not it comes from a PR |
| Token permissions | Not set; GitHub applies the defaults implicitly | Explicitly restricted to the minimum |
| `docker/build-push-action` version | v6 | v5 |

Additionally, @seuros added a `docker/setup-qemu-action@v3` step, whose purpose I don't know, and a `docker/setup-buildx-action@v3` step that, if I understand correctly, makes it possible to build the image for both the `linux/amd64` and `linux/arm64` architectures.

Based on this, I believe we should combine the following:

  • #1184: Build the container on tag creation
    • This way, we get `ghcr.io/microsoft/markitdown:v1.0.0`, `ghcr.io/microsoft/markitdown:v2.0.0`, ... instead of only `ghcr.io/microsoft/markitdown:main`
  • #1186: Restrict the workflow token's permissions to the minimum
    • Declaring the permissions explicitly is better security practice than letting GitHub set them implicitly
  • #1184: Use v6 of `docker/build-push-action`
    • I don't see a reason to use v5 instead of v6.
  • #1186: Use `docker/setup-buildx-action@v3`
    • I didn't know about this action, but building images for several CPU architectures is better.
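The combination proposed above could look roughly like the workflow below. This is a minimal sketch, not the actual contents of #1184 or #1186: the file name `.github/workflows/docker.yml`, the `v*` tag pattern, and the extra `latest` tag are assumptions. It reads the registry host from the `REGISTRY` repository variable mentioned earlier (e.g. `ghcr.io`).

```yaml
# Hypothetical .github/workflows/docker.yml: a sketch combining the proposals above
name: Publish Docker image

on:
  push:
    tags:
      - "v*"   # assumed tag pattern, e.g. v1.0.0 (tag-triggered, from #1184)

# Explicit, minimal permissions for the GITHUB_TOKEN (from #1186)
permissions:
  contents: read   # needed to check out the repository
  packages: write  # needed to push the image to GHCR

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Registers QEMU emulators so the linux/arm64 variant can be built on an amd64 runner
      - uses: docker/setup-qemu-action@v3

      # Creates a Buildx builder that can produce multi-platform images
      - uses: docker/setup-buildx-action@v3

      - uses: docker/login-action@v3
        with:
          registry: ${{ vars.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      # v6 of build-push-action (from #1184)
      - uses: docker/build-push-action@v6
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          push: true
          # github.ref_name is the pushed tag, e.g. v1.0.0; image names must be lowercase
          tags: |
            ${{ vars.REGISTRY }}/${{ github.repository }}:${{ github.ref_name }}
            ${{ vars.REGISTRY }}/${{ github.repository }}:latest
```

Logging in with `GITHUB_TOKEN` only works for `ghcr.io`; pushing to `docker.io` would require separate Docker Hub credentials stored as secrets. Also note that this sketch moves `latest` on every matching tag push, including any pre-release tags.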

As for the QEMU action, I can't say until I've talked with @seuros.

goulvenb · Apr 13 '25 17:04