
Publishing Docker image to Docker Hub

Open kwkbtr opened this issue 5 years ago • 20 comments

Hi, do you have any plans to publish a Docker image built with release/Dockerfile to Docker Hub? There may be a problem, mentioned in https://github.com/brucemiller/LaTeXML/pull/1008#issuecomment-399694452:

> Sadly DockerHub (for automated builds that @bfirsh suggested above) does not seem to support build arguments, so it is non-trivial to automate builds for both variants of the image.

But having at least an image built with the default build arguments (`ARG WITH_TEXLIVE="yes"`) would be a great help for us.
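Outside Docker Hub's automated builds (for example in CI or by hand), both variants can come from the one Dockerfile by overriding the build argument. Below is a minimal Python sketch that only assembles the `docker build` argument lists. It assumes the Dockerfile treats any `WITH_TEXLIVE` value other than `"yes"` as "skip TeX Live". The `latexml:notexlive` tag is a hypothetical name.

```python
from typing import List


def build_command(with_texlive: bool, image: str = "latexml") -> List[str]:
    """Return the argv for one `docker build` of release/Dockerfile.

    Assumption: WITH_TEXLIVE="yes" (the default) installs TeX Live and any
    other value, e.g. "no", skips it. The tag names are hypothetical.
    """
    flag = "yes" if with_texlive else "no"
    tag = f"{image}:latest" if with_texlive else f"{image}:notexlive"
    return [
        "docker", "build",
        "--build-arg", f"WITH_TEXLIVE={flag}",
        "-t", tag,
        "-f", "release/Dockerfile",
        ".",  # build context: run from the repository root
    ]


if __name__ == "__main__":
    # Print both variants; each list could also be passed to subprocess.run(..., check=True).
    for variant in (True, False):
        print(" ".join(build_command(variant)))
```

Keeping the variant choice in a single argument means CI needs only one extra job, not a second Dockerfile.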

kwkbtr, Aug 01 '19

Hi @kwkbtr. Currently @tkw1536 manages LaTeXML's dockerization, so he could provide more context or brainstorm useful new images.

Have you seen the images Tom already created, which we use for Travis? They are at https://hub.docker.com/r/latexml/latexml-test-runtime/

dginev, Aug 01 '19

Thank you for your suggestion. I had noticed those images but did not look into them closely, since I was not sure they were suitable for general use rather than just testing. I will give them a try.

kwkbtr, Aug 01 '19

I had a look at latexml/latexml-test-runtime and noticed that the image tags do not include the LaTeXML version number. It would be great if a specific LaTeXML version could be selected via the image tag.

kwkbtr, Aug 01 '19

I have considered publishing images to Docker Hub. However, Docker Hub auto-builds are difficult to use here because the Dockerfile lives in the release subfolder, whereas the required build context is the repository root. The Dockerfile itself notes this:

```dockerfile
# This Dockerfile expects the root directory of LaTeXML as a build context.
# To achieve this run the following command from the root directory:
#
# > docker build -t latexml -f release/Dockerfile .
```
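Because the build context must be the repository root, a wrapper script that can be invoked from anywhere inside the checkout could locate that root first. Below is a minimal Python sketch. It assumes the root is the nearest ancestor directory containing release/Dockerfile. Both helper names are hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def find_build_context(start: Path) -> Path:
    """Return the nearest ancestor of `start` (inclusive) holding release/Dockerfile.

    Assumption: that directory is the LaTeXML repository root, which the
    Dockerfile's header comment requires as the build context.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / "release" / "Dockerfile").is_file():
            return candidate
    raise FileNotFoundError(f"no release/Dockerfile found above {start}")


def docker_build_invocation(start: Path) -> Tuple[List[str], Path]:
    """Return (argv, cwd) for the build command quoted in the Dockerfile."""
    argv = ["docker", "build", "-t", "latexml", "-f", "release/Dockerfile", "."]
    return argv, find_build_context(start)
```

Usage: `argv, cwd = docker_build_invocation(Path.cwd())`, then `subprocess.run(argv, cwd=cwd, check=True)`. Passing `cwd` keeps the literal `.` context pointing at the repository root.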

I can imagine three solutions to this:

  • We push images manually, which might result in a lot of work
  • We have Travis CI build the images and automatically push them to Docker Hub, but that would require some work to set up properly
  • We move the Dockerfile to the root of the repository (I vaguely remember this was the plan at some point, but there were some objections by @brucemiller)

tkw1536, Aug 01 '19

At a high level, I think we need an approach similar to the small team of maintainers who manage the Debian and Fedora packages for LaTeXML. As far as I know, Bruce only manages the MacPorts route.

Having an up-to-date and functional collection of docker images strikes me as a similar maintenance burden. We would likely need a volunteer to at least prepare images for the named releases.

Also linking the current search results for latexml on Docker Hub below. Maybe we could recruit one of their authors as a volunteer, e.g. @physikerwelt?

https://hub.docker.com/search?q=latexml&type=image

dginev, Aug 01 '19

I'll happily volunteer as maintainer of the Docker Hub images if we can figure out:

  • what images we want (with / without TeX Live)
  • when we want them updated (daily? weekly? monthly? on release?)

tkw1536, Aug 02 '19

Awesome, thanks Tom!

Personally, I see value in having release-based Docker images (e.g. we can make one for each of 0.8.2, 0.8.3 and 0.8.4, and then continue at each release point), as well as a single image that tracks master, which is the part that would have to be automated through Travis. That setup should take care of all reasonable use cases. Curious to hear if that would work for @kwkbtr as well?

dginev, Aug 02 '19

:+1: That's what we do for engrafo. Git tags become image tags for releases, and `latest` tracks master. We also push an image tagged with the commit SHA for every commit, for the hell of it.
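The tagging scheme above can be reduced to a small pure function that a CI deploy step calls before pushing. Below is a minimal Python sketch. It assumes release git tags look like `v0.8.4` and strips the leading `v`. The repository name `latexml/latexml` is hypothetical. On Travis, the three inputs could be read from the `TRAVIS_BRANCH`, `TRAVIS_TAG` and `TRAVIS_COMMIT` environment variables.

```python
from typing import List, Optional


def image_tags(repo: str, branch: Optional[str], git_tag: Optional[str], sha: str) -> List[str]:
    """Image tags to push for one CI build, following the engrafo-style scheme.

    - every commit gets an image tagged with its commit SHA;
    - a release git tag becomes the image tag (assumed form "v0.8.4" -> "0.8.4");
    - builds of master additionally move the "latest" tag.
    """
    tags = [f"{repo}:{sha}"]
    if git_tag:  # Travis sets TRAVIS_TAG to "" for non-tag builds
        version = git_tag[1:] if git_tag.startswith("v") else git_tag
        tags.append(f"{repo}:{version}")
    elif branch == "master":
        tags.append(f"{repo}:latest")
    return tags
```

Keeping the decision separate from the `docker push` calls makes the scheme easy to check without touching a registry.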

It's built on Travis, so we can speed up builds by pulling the previous image and passing it to `--cache-from`. That might be unnecessary for LaTeXML; building on Docker Hub would work fine if you don't care about build speed.
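The pull-then-`--cache-from` pattern is two commands in sequence. Below is a minimal Python sketch that only assembles them. The image name and the reuse of the `latest` tag as the cache source are assumptions for illustration.

```python
from typing import List


def cached_build_commands(image: str, dockerfile: str = "release/Dockerfile") -> List[List[str]]:
    """Argv lists for a CI build that reuses layers of the last pushed image.

    The pull is expected to fail on the very first build, when nothing has
    been pushed yet, so a CI script should tolerate a non-zero exit there.
    """
    previous = f"{image}:latest"
    return [
        ["docker", "pull", previous],
        ["docker", "build", "--cache-from", previous,
         "-t", previous, "-f", dockerfile, "."],
    ]
```

Run the pull without `check=True` and the build with it. With BuildKit, the cached image generally also needs to have been built with `--build-arg BUILDKIT_INLINE_CACHE=1` for its layers to be reused.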

  • https://github.com/arxiv-vanity/engrafo/blob/master/.travis.yml
  • https://github.com/arxiv-vanity/engrafo/blob/master/script/ci-deploy-master
  • https://github.com/arxiv-vanity/engrafo/blob/master/script/ci-deploy-tag

bfirsh, Aug 02 '19

> That setup should take care of all reasonable use cases. Curious to hear if that would work for @kwkbtr as well?

Yes, that should work great for my current use case. Thank you all for your consideration! 👍

kwkbtr, Aug 02 '19

Thanks @bfirsh, that's quite helpful!

dginev, Aug 02 '19

I've made a PR that adds support for DockerHub auto builds: #1181

tkw1536, Aug 04 '19

In the absence of an official resolution on maintaining a Docker image (I think it isn't on anyone's critical path?), I ended up sidelining this issue and creating a new Dockerfile for a multi-threaded harness project that converts large collections of mathematical formulas. That is a typical use of LaTeXML in the Math Information Retrieval community (e.g. ARQMath has been using LaTeXML in 2020-2021).

"Sidelining" in the sense that I couldn't simply write `FROM latexml:latest` to base my image on. Instead, I based it on the latest Rust image (Rust being the harness's programming language) and did the entire LaTeXML installation dance through apt and cpanminus. Linking the Dockerfile here for reference; note that it is still experimental: https://github.com/dginev/latexml-runner/blob/main/Dockerfile

It would be nice to circle back and tidy up the Docker toolchain pieces... so, bump?

dginev, Feb 04 '21

From my end, the Dockerfile in this repository still works. The only thing outstanding is publishing it on some registry (e.g. Docker Hub, GitHub Package Registry).
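Publishing the same build to more than one registry mostly comes down to giving the local image each registry's fully qualified name and pushing each name. Below is a minimal Python sketch. Both repository names (`docker.io/latexml/latexml`, and `ghcr.io/brucemiller/latexml` on GitHub's container registry) are hypothetical, not locations the project has agreed on.

```python
from typing import Dict, List

# Hypothetical target repositories, one per registry.
REGISTRIES: Dict[str, str] = {
    "dockerhub": "docker.io/latexml/latexml",
    "github": "ghcr.io/brucemiller/latexml",
}


def publish_commands(local_image: str, version: str) -> List[List[str]]:
    """Tag one locally built image for every registry, then push each copy."""
    cmds: List[List[str]] = []
    for repository in REGISTRIES.values():
        target = f"{repository}:{version}"
        cmds.append(["docker", "tag", local_image, target])
        cmds.append(["docker", "push", target])
    return cmds
```

Each push assumes a prior `docker login` against that registry.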

tkw1536, Feb 04 '21

I'm bumping the milestone again, since it's hard for us to get into the right mindset to organize and actively maintain these. It's a bit of a paradox: everyone wants an official, regularly updated "dockerized latexml", yet no one has the right motivation to actually execute on it.

The latexml Docker Hub namespace lacks people who actively use a vanilla dockerized LaTeXML, so it's almost as if we're squatting on that namespace handle. Tom has been great at updating the CI images regularly, but he doesn't run LaTeXML conversions at scale, so his focus is different. Meanwhile, I do run LaTeXML conversions at scale, but with my own home-baked Docker image that does a lot more than a vanilla LaTeXML image would. So the whole thing is a bit sideways... We ought to straighten it out.

dginev, Aug 09 '21

Since we still don't have a resolution on how to maintain an official LaTeXML image, I published another unofficial one today. It again installs LaTeXML from scratch, in one of the many possible ways: this time using cpanminus, following the LaTeXML-Plugin-Cortex Dockerfile.

It is available as latexml/ar5ivist on Docker Hub, with the corresponding repository here. As the name suggests, it is a turnkey one-liner for conversions using the exact configuration used for ar5iv.

dginev, Aug 02 '22

I think we should use this to restart the discussion about whether to have an official Docker image.

tkw1536, Aug 05 '22

@dginev FYI, I'm experimenting with an automatically built, publicly available OCI (Docker) image with LaTeXML over on GitLab. I'm planning to put it at https://gitlab.com/perm.pub/dock. Feel free to send me questions and requests for better documentation, e.g. about how I build it and why. Enjoy!

castedo, Mar 20 '24

@castedo thank you for the heads up!

You are most welcome to edit your comment above and describe the full details of your use case, both in executing latexml, and in the way you've decided to package and publish that setup. I think it can be informative for everyone tracking this issue to know of such recent developments.

dginev, Mar 20 '24

Here's the dual Git & OCI container image registry which currently has LaTeXML 0.8.8, with some documentation on how to run it: https://gitlab.com/perm.pub/dock/latexml-deb

For more details and documentation on how the container image is automatically built and deployed, check out: https://gitlab.com/perm.pub/dock/

I'm using it to investigate what kind of JATS XML gets output by latexml+latexmlpost.

castedo, Mar 21 '24

I have been using this Docker image in production for several years: https://hub.docker.com/r/physikerwelt/latexml/tags. It is somewhat memory-hungry unless you restrict its memory.
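Docker's `--memory` flag is one way to apply such a restriction per container. Below is a minimal Python sketch that assembles a memory-capped `docker run` for one conversion. The 2g limit and the /work mount point are illustrative assumptions, and so is the premise that the image accepts a `latexmlc` command line (i.e. `latexmlc` is on its PATH and no ENTRYPOINT intercepts it). None of these details is taken from the image's documentation.

```python
from pathlib import Path
from typing import List


def latexml_run_command(workdir: Path, source: str, memory: str = "2g",
                        image: str = "physikerwelt/latexml") -> List[str]:
    """Assemble a memory-capped `docker run` for converting one .tex file.

    Assumptions: the image has `latexmlc` on its PATH and no ENTRYPOINT that
    would intercept the command; the 2g limit is only an illustration.
    """
    dest = Path(source).with_suffix(".html").name
    return [
        "docker", "run", "--rm",
        "--memory", memory,       # RAM cap; processes exceeding it get OOM-killed
        "--memory-swap", memory,  # same value as --memory: no extra swap allowed
        "-v", f"{workdir.resolve()}:/work",  # mount the directory holding the sources
        "-w", "/work",
        image,
        "latexmlc", source, f"--dest={dest}",
    ]
```

Setting `--memory-swap` equal to `--memory` matters because, when only `--memory` is given, Docker by default lets the container use the same amount again as swap.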

physikerwelt, Mar 21 '24