Add reduction server support to tfx-oss-public/tfx
As far as I can tell, the gcr.io/tfx-oss-public/tfx:1.7.1 container does not have google-reduction-server installed. I tried installing the package manually, following the documentation:
```shell
echo "deb https://packages.cloud.google.com/apt google-fast-socket main" | tee /etc/apt/sources.list.d/google-fast-socket.list && \
curl -s -L https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add - && \
apt update && apt install -y google-reduction-server
```
However, the installation failed with the following error:
```text
Unpacking google-reduction-server (2.2.0) ...
dpkg: error processing archive /var/cache/apt/archives/google-reduction-server_2.2.0_amd64.deb (--unpack):
 trying to overwrite '/usr/lib/libnccl-net.so', which is also in package google-fast-socket 0.0.5
dpkg-deb: error: paste subprocess was killed by signal (Broken pipe)
Errors were encountered while processing:
 /var/cache/apt/archives/google-reduction-server_2.2.0_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)
```
Should Reduction Server be supported in the TFX image?
Thank you for the report. This seems to be an issue with the google-reduction-server package rather than with TFX.
Your error message shows that the package collides with google-fast-socket (both ship /usr/lib/libnccl-net.so), and some guides recommend uninstalling google-fast-socket first (for example, https://cloud.google.com/blog/topics/developers-practitioners/optimize-training-performance-reduction-server-vertex-ai). Which document are you referring to?
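A minimal sketch of that workaround, written as a Dockerfile that extends the TFX image. It assumes the container builds as root and that removing google-fast-socket (version 0.0.5, per the dpkg error) is acceptable for your workload; apart from the `apt remove` step, the commands mirror the installation commands from the original report.

```dockerfile
# Sketch only: extend the TFX image with Reduction Server.
FROM gcr.io/tfx-oss-public/tfx:1.7.1

# Assumption: google-fast-socket is preinstalled in the base image and owns
# /usr/lib/libnccl-net.so, which google-reduction-server also ships.
# Removing it first avoids the dpkg "trying to overwrite" conflict.
RUN apt remove -y google-fast-socket && \
    echo "deb https://packages.cloud.google.com/apt google-fast-socket main" | tee /etc/apt/sources.list.d/google-fast-socket.list && \
    curl -s -L https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add - && \
    apt update && apt install -y google-reduction-server
```

Removing the conflicting package is preferable to forcing dpkg to overwrite the shared file, because it leaves a single package owning /usr/lib/libnccl-net.so.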
@jiyongjung0 Yes, I'm able to get it installed, but I think it would be better if TFX supported this natively.
Thank you for the suggestion. TFX images are built on top of the Deep Learning Containers image, which provides TensorFlow. So I think this should be included in that base image rather than in the TFX layer. Let me check the plans for this and get back to you.