docs-scraper
Publish a new Docker image containing Chrome binary
In order to solve the issue https://github.com/meilisearch/docs-scraper/issues/139 for the Docker users, we will need to publish a new version of the base Dockerfile containing the Chrome binary.
To resolve this issue, we need to:
- [ ] Create a new Dockerfile with the Chrome binary additions
- [ ] Configure GitHub Actions to release a new image version: getmeili/docs-scraper-with-chrome
- [ ] Update the README sections regarding the usage of this new image, which will be required only by users who need the Chrome binary.
After this addition, we will be able to instruct users to use this new image when they need it, and we will not impact the current users of the getmeili/docs-scraper image with a non-requested size addition.
hint: we could base this new image on Algolia's image https://hub.docker.com/layers/algolia/docsearch-scraper/latest/ or on this comment https://github.com/meilisearch/docs-scraper/issues/139#issuecomment-872752458
@alallema, after #235 is merged, now we have an increase in the size of our Docker image:
This is not ideal, but if we stick strictly to what this issue says, the job is not done yet 😅.
Two choices:
- Reopen this issue and close it only once the new Docker image is done.
- Let this be as it is and wait for future users to ask for improvements.
What are your thoughts about that?
@brunoocasali, I think it's better to keep this one open no?
@alallema @brunoocasali It seems the larger image size was caused by installing chromium-driver. Looking at the dependencies of chromium-driver explains why: chromium-browser alone, which is one of them, has an installed size of 100-200 MB depending on the architecture.
At any rate it's gonna be great to find out the way to decrease the overall size of image. BTW seems like algolia/docsearch-scraper has the same purpose as well as greater image size... Things are not that bad 😀
@brunoocasali @alallema I spent a little time on building different Dockerfiles in order to compare their final uncompressed size. I have three dockerfiles:
- with_chromium.Dockerfile -- the current Dockerfile, where chromium-driver & chromium are installed from the Debian repository.
- with_chrome.Dockerfile -- a modified Dockerfile, where chrome-driver & chrome are installed from the official repository.
- without_pipenv.Dockerfile -- a modified Dockerfile, where chromium-driver & chromium are installed from the Debian repository, but pipenv is removed from the image.
After building the images, this is what I got:
admin@docker-lab:~/meilisearch-demo/docs-scraper-fork$ docker image ls | grep test/
test/scraper-without-pipenv latest 88568ec55e5c 6 seconds ago 1.73GB
test/scraper-with-chrome latest 44c5bb15c055 9 minutes ago 1.95GB
test/scraper-with-chromium latest 8bbeb6236cb9 About an hour ago 1.84GB
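To make the comparison easier to read, here is a small sketch that computes each image's difference from the current (chromium) image. The sizes are copied from the output above; the dictionary keys and the `delta_vs_baseline` helper are just illustrative names, not part of the repository.

```python
# Uncompressed image sizes in GB, copied from the `docker image ls` output above.
sizes_gb = {
    "with-chromium": 1.84,   # current Dockerfile, used as the baseline
    "with-chrome": 1.95,
    "without-pipenv": 1.73,
}

baseline = sizes_gb["with-chromium"]


def delta_vs_baseline(name):
    """Return (difference in GB, difference in percent) relative to the baseline."""
    diff = sizes_gb[name] - baseline
    return round(diff, 2), round(100 * diff / baseline, 1)


if __name__ == "__main__":
    for name in sizes_gb:
        diff_gb, diff_pct = delta_vs_baseline(name)
        print(f"{name}: {diff_gb:+.2f} GB ({diff_pct:+.1f}%)")
```

With these numbers, dropping pipenv saves about 0.11 GB (roughly 6%), and switching to the official Chrome adds about the same amount.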
So, to come to conclusions:
- switching to the official chrome has no benefit from the image size perspective. But it's worth checking whether chromium is still maintained & updated.
- there is a way to decrease the image size by removing pipenv from the final image, as I believe there is no need to have pipenv in the docker image. In order to install the required packages, we're gonna convert Pipfile to requirements.txt and install it by using pip.
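As a rough illustration of the Pipfile to requirements.txt conversion, here is a minimal sketch that builds pinned requirement lines from a Pipfile.lock. It assumes the usual JSON layout, with a `default` section mapping each package name to an entry containing a `version` string such as `==1.2.3`. The `lock_to_requirements` helper and the sample packages and versions are hypothetical, and the sketch ignores hashes, environment markers and VCS dependencies, so pipenv's own export tooling may be the better choice in practice.

```python
import json


def lock_to_requirements(lock):
    """Turn the `default` section of a parsed Pipfile.lock into pinned lines."""
    lines = []
    for name, meta in sorted(lock.get("default", {}).items()):
        # `version` already carries the operator, e.g. "==1.2.3".
        # Entries without a version (e.g. VCS installs) are emitted unpinned.
        lines.append(f"{name}{meta.get('version', '')}")
    return lines


if __name__ == "__main__":
    # Illustrative sample only; real data would come from
    # json.load(open("Pipfile.lock")).
    sample = json.loads("""
    {
        "default": {
            "scrapy": {"version": "==2.4.1"},
            "selenium": {"version": "==3.141.0"}
        },
        "develop": {
            "pytest": {"version": "==6.2.5"}
        }
    }
    """)
    print("\n".join(lock_to_requirements(sample)))
```

The resulting file could then be installed in the Dockerfile with `pip install -r requirements.txt`, so pipenv never has to be present in the final image.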
What do you think about that?
Hello @mdraevich! Thanks for working on this. Your thoughts are really helpful!
My initial thought about reducing the image size was focused on splitting into two different images:
- getmeili/docs-scraper: just what is required to run the scraping
- getmeili/docs-scraper-with-chrome: everything from the previous image + the entire chrome/chromium binary
This brings extra management complexity because we will have to publish two images. Still, in my POV, it is the best alternative, since it helps both the people who don't need the chrome binary and the people who do, without hurting the disk size for the first group.
That said, I'm not sure if it is worth just removing the pipenv from the image.
What do you think @alallema?
Thank you so much @mdraevich for your research! @brunoocasali I agree with you, I'm not sure it's worth it for now. Maybe we could close this issue in the meantime, don't you think?
Yeah, let's do it then. If somebody asks for a size compression, we can reopen this one.