
Add reduction server support to tfx-oss-public/tfx

Open feelingsonice opened this issue 3 years ago • 3 comments

From what I can tell, the gcr.io/tfx-oss-public/tfx:1.7.1 container does not have google-reduction-server installed. I tried installing the package manually, following the documentation:

echo "deb https://packages.cloud.google.com/apt google-fast-socket main" | tee /etc/apt/sources.list.d/google-fast-socket.list && \
    curl -s -L https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add - && \
    apt update && apt install -y google-reduction-server

But got the following error:

Unpacking google-reduction-server (2.2.0) ...
dpkg: error processing archive /var/cache/apt/archives/google-reduction-server_2.2.0_amd64.deb (--unpack):
 trying to overwrite '/usr/lib/libnccl-net.so', which is also in package google-fast-socket 0.0.5
dpkg-deb: error: paste subprocess was killed by signal (Broken pipe)
Errors were encountered while processing:
 /var/cache/apt/archives/google-reduction-server_2.2.0_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)

Should reduction servers be supported?

feelingsonice avatar Apr 07 '22 19:04 feelingsonice

Thank you for the report. This seems to be an issue with the google-reduction-server package.

Your error message shows that the package collides with google-fast-socket, and some guides recommend uninstalling google-fast-socket first (for example, https://cloud.google.com/blog/topics/developers-practitioners/optimize-training-performance-reduction-server-vertex-ai ). Which documentation are you referring to?
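
For reference, a minimal sketch of that workaround, assuming the conflict is only the shared /usr/lib/libnccl-net.so file reported by dpkg above. The function name install_reduction_server is hypothetical, and the `|| true` guard is an assumption so the step does not fail on images where google-fast-socket is absent. It has not been verified against the TFX image:

```shell
# Sketch: remove google-fast-socket before installing google-reduction-server,
# because both packages ship /usr/lib/libnccl-net.so (see the dpkg error above).
# Intended to run as root inside the image, e.g. from a Dockerfile RUN step.
# install_reduction_server is a hypothetical helper name.
install_reduction_server() {
    # Remove the conflicting package; tolerate it not being installed (assumption).
    apt-get remove -y google-fast-socket || true

    # Same repository and key setup as in the original report.
    echo "deb https://packages.cloud.google.com/apt google-fast-socket main" \
        | tee /etc/apt/sources.list.d/google-fast-socket.list
    curl -s -L https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -

    # Refresh the package index, then install the reduction server package.
    apt-get update && apt-get install -y google-reduction-server
}
```

The block only defines the function; calling `install_reduction_server` performs the removal and install.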

jiyongjung0 avatar Apr 12 '22 00:04 jiyongjung0

@jiyongjung0 yeah I'm able to get it installed but I think it'd be better to have this be supported natively by tfx

feelingsonice avatar Apr 13 '22 21:04 feelingsonice

Thank you for your suggestion. TFX images are built on top of a Deep Learning Container image, which brings TF into the image, so I think this should be included in that base image rather than in the TFX layer. Let me check the plans for this and get back to you.

jiyongjung0 avatar Apr 14 '22 00:04 jiyongjung0