ctakes-docker

ctakes containers are still too big

Open tmills opened this issue 7 years ago • 5 comments

@MatthewVita asked in his last pull request:

One question for you (I can do this in a separate PR): should we just commit in the cTAKES zip artefacts (can COPY them in via Dockerfile)? The download site takes forever to pull them down. I realize this may not be a best practice, but...

I would like to do something about this, but I'm not crazy about adding even more jars (I'd like to remove the jars currently checked in at some point). It might be possible to pick just the individual jars we need with wget from the Apache servers. The servers may still be slow, but avoiding the dependency parser alone would cut 250 MB from the download size.
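A rough sketch of the pick-individual-jars idea, written in Python rather than wget so it can be tested. It assumes the needed jars are published in a standard Maven repository layout; the base URL, the coordinates in `WANTED`, and the helper names `jar_url` and `fetch` are illustrative assumptions, not a verified list of what the containers need.

```python
"""Sketch: fetch only selected jars instead of the full cTAKES zip."""
import urllib.request
from pathlib import Path

# Assumption: the jars live in a repository that uses the standard Maven layout.
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"

# Hypothetical selection; the real set of required jars is not known here.
WANTED = [
    ("org.apache.ctakes", "ctakes-core", "4.0.0"),
    ("org.apache.ctakes", "ctakes-type-system", "4.0.0"),
]


def jar_url(group, artifact, version, base=MAVEN_CENTRAL):
    """Build the jar URL for Maven coordinates (group dots become slashes)."""
    group_path = group.replace(".", "/")
    return f"{base}/{group_path}/{artifact}/{version}/{artifact}-{version}.jar"


def fetch(coords, dest_dir="lib"):
    """Download each jar into dest_dir, skipping files that already exist."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    paths = []
    for group, artifact, version in coords:
        url = jar_url(group, artifact, version)
        target = dest / url.rsplit("/", 1)[-1]
        if not target.exists():
            urllib.request.urlretrieve(url, str(target))
        paths.append(target)
    return paths


if __name__ == "__main__":
    for path in fetch(WANTED):
        print(path)
```

Anything left out of `WANTED` is never downloaded, so a large dependency such as the dependency parser models can be skipped just by not listing it.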

tmills, Jun 28 '17 19:06

Or mavenize everything and let Maven figure out which jars to grab? IDK if it's standard to include Maven in containers; it certainly has a heavy enough footprint on its own.

tmills, Jun 28 '17 19:06

I've used Maven in a containerized setting. Sounds like a great idea because the Maven Central servers are fast. However, it may not help at all with the container size problem. Hmm.

MatthewVita, Jun 28 '17 21:06

Looked into Maven a bit: it can help us with the jars but probably not with the UIMA and cTAKES downloads. Since it downloads the entire internet to compile one Java class, I doubt it's faster or smaller than the way it's set up now.

tmills, Jul 05 '17 13:07

Agreed

MatthewVita, Jul 05 '17 23:07

Taking a step back, I don't think the container size is actually the issue here. It's more the download times. For instance, the Apache servers that we download from for the pipeline image take forever. Perhaps we can just "pull the pain forward" and commit the files into the repo and COPY them?

MatthewVita, Jul 05 '17 23:07