feat(docker): adding native GHCR container
Originally, if a user wanted to use Docker with this project, these were the steps to follow:
- Clone the Repo
- Build the Docker image using
docker build -t markitdown:latest .
- Run the command
docker run --rm -i markitdown:latest < ~/your-file.pdf > output.md
With this PR, I am hoping to simplify this procedure so that testing the project (for example, trying it out to check whether it fits your needs) comes down to:
- Run the command
docker run --rm -i ghcr.io/microsoft/markitdown:latest < ~/your-file.pdf > output.md
@microsoft-github-policy-service agree
Forgot to say: this requires creating a project variable called REGISTRY that contains, for example, "ghcr.io" (or "docker.io", but then my commits will need to be adjusted).
This looks very promising. Thank you.
I need to check some procedural stuff on my end before I can merge this (re: hosting images rather than just Dockerfiles). Let me spend a day or two on this, and I'll get back to you.
Hey @afourney, as I just said in #1186:
> It was the first workflow I made, so I don't have much knowledge about them.

However, I would love to talk about it with @seuros.
From what I see, here is a table of the differences:
| #1184 | #1186 |
|---|---|
| We create a container on tag creation | We create a container on every push to the main branch, whether it comes from a PR or not |
| We do not set the permissions of the PAT; they are set implicitly | We set the permissions of the PAT to the minimum |
| Uses v6 of docker/build-push-action | Uses v5 of docker/build-push-action |
Additionally, @seuros added a docker/setup-qemu-action@v3 step, whose purpose I do not know, and a docker/setup-buildx-action@v3 step that, if I understand correctly, allows building an image for both the "linux/amd64" and "linux/arm64" architectures.
From what I see, I believe we should implement the following:
- #1184: We create a container on tag creation
  - This way, we'll have access to ghcr.io/microsoft/markitdown:v1.0.0, ghcr.io/microsoft/markitdown:v2.0.0, ... instead of just ghcr.io/microsoft/markitdown:main
- #1186: We set the permissions of the PAT to the minimum
  - It is better security practice to set the permissions explicitly instead of letting GitHub set them implicitly
- #1184: Use v6 of docker/build-push-action
  - I don't see why v5 should be used instead of v6?
- #1186: Use docker/setup-buildx-action@v3
  - I did not know about this, but it's better to create an image for different CPU architectures
As for the QEMU action, I can't really say until I've talked with @seuros.
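
To make the combination above more concrete, here is a minimal sketch of what such a workflow could look like. It is not the actual content of #1184 or #1186: the workflow and job names, the "v*" tag pattern, and the use of actions/checkout, docker/login-action and docker/metadata-action are assumptions added for illustration. The QEMU step is included only on the assumption that building the arm64 image on an amd64 runner needs emulation.

```yaml
# Hypothetical sketch, not the workflow from #1184 or #1186.
name: docker-publish            # assumed name

on:
  push:
    tags:
      - "v*"                    # assumed tag pattern, e.g. v1.0.0

# Explicit minimal permissions for the workflow token
# (what this thread refers to as the PAT).
permissions:
  contents: read
  packages: write

jobs:
  build-and-push:               # assumed job name
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Assumption: emulation so that non-native platforms can be built.
      - uses: docker/setup-qemu-action@v3

      # Buildx builder used for the multi-platform build.
      - uses: docker/setup-buildx-action@v3

      # REGISTRY is the project variable mentioned earlier, e.g. "ghcr.io".
      - uses: docker/login-action@v3
        with:
          registry: ${{ vars.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      # Derives image tags and labels from the Git ref that triggered the run.
      - id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ vars.REGISTRY }}/${{ github.repository }}

      - uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          platforms: linux/amd64,linux/arm64
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
```

If the QEMU step turns out to be unnecessary after the discussion with @seuros, it can be dropped without touching the rest of the sketch.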