fairseq2 Unable to run fairseq2 inside nvidia/cuda docker container due to libfairseq2n.so.0: undefined symbol error

Open kyurkchyan opened this issue 4 months ago • 0 comments

Describe the bug:

Our Python application uses a Seamless model to perform speech-to-text transformation. To host the application, we use an Azure virtual machine (Standard NC24ads A100 v4: 24 vCPUs, 220 GiB memory) running Ubuntu 22.04. Docker is installed on the virtual machine, and the application is deployed with Docker.

We have the following Dockerfile:

# Start with a base image that includes CUDA
FROM nvidia/cuda:12.1.0-base-ubuntu20.04

ENV CACHE_ROOT_FOLDER=/cache/verse-extractor-cache

# Set the timezone to UTC
ENV TZ=UTC
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone

# Install some necessary tools
RUN apt-get update && \
    apt-get install -y ffmpeg && \
    apt-get install -y python3 && \
    apt-get install -y python3-pip && \
    # Install Git
    apt-get install -y git && \
    # Clean up to reduce container size
    rm -rf /var/lib/apt/lists/*

# Set up a working directory
WORKDIR /app

# Copy requirements.txt and install Python packages using pip
COPY requirements.txt ./
RUN pip3 install -r requirements.txt

# Install fairseq2
RUN pip install fairseq2 --extra-index-url https://fair.pkg.atmeta.com/fairseq2/whl/pt2.2.0/cu121

# Copy the rest of your app
COPY . /app

# Set the command to run your app
CMD ["python3", "app.py"]

Our requirements.txt looks like this

openai-whisper
flask
sentence-transformers
pandas
yt-dlp
pip-system-certs
tiktoken
triton
torch
torchaudio
ffmpeg
pydub
openpyxl
sentencepiece
git+https://github.com/facebookresearch/seamless_communication.git
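The Dockerfile pulls fairseq2 wheels from an index built for pt2.2.0/cu121, while this requirements.txt lists torch and torchaudio without versions, so pip is free to resolve whichever release it finds. As a rough sketch for spotting such entries, the helper below lists requirement lines that carry no version specifier. The function name `unpinned_requirements` is hypothetical, and whether pinning torch resolves the error is an assumption that this report does not confirm.

```python
import re

# Matches a version specifier (==, >=, <=, ~=, !=, <, >) anywhere in the line.
_SPECIFIER = re.compile(r"(==|>=|<=|~=|!=|<|>)")


def unpinned_requirements(text):
    """Return requirement lines that carry no version specifier.

    Blank lines, comments, pip options (lines starting with "-"),
    and direct URL/VCS references are skipped.
    """
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop inline comments
        if not line or line.startswith("-") or "://" in line:
            continue
        if not _SPECIFIER.search(line):
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    # Assumes requirements.txt sits in the current directory.
    with open("requirements.txt", encoding="utf-8") as handle:
        for name in unpinned_requirements(handle.read()):
            print(f"unpinned: {name}")
```

Run against the file above, this would report every named package, torch and torchaudio included, since none of them has a specifier.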

We use docker-compose to build and run the image

version: '3.8'

services:
  verse-extractor-api:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8000:8000"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [ gpu ]

I have the following simple app.py file

from seamless_communication.inference import Translator
import logging

logging.basicConfig(level=logging.INFO)

if __name__ == '__main__':
    logging.info("Starting the app")

When I build the Docker image with the sudo docker-compose up --build -d command and then view the logs with sudo docker-compose logs, I see the following error message:

Traceback (most recent call last):
  File "app.py", line 1, in <module>
    from seamless_communication.inference import Translator
  File "/usr/local/lib/python3.8/dist-packages/seamless_communication/__init__.py", line 9, in <module>
    from fairseq2.assets import FileAssetMetadataProvider, asset_store
  File "/usr/local/lib/python3.8/dist-packages/fairseq2/assets/__init__.py", line 7, in <module>
    from fairseq2.assets.card import AssetCard as AssetCard
  File "/usr/local/lib/python3.8/dist-packages/fairseq2/assets/card.py", line 28, in <module>
    from fairseq2.data.typing import is_string_like
  File "/usr/local/lib/python3.8/dist-packages/fairseq2/data/__init__.py", line 7, in <module>
    from fairseq2.data.cstring import CString as CString
  File "/usr/local/lib/python3.8/dist-packages/fairseq2/data/cstring.py", line 61, in <module>
    from fairseq2n.bindings.data.string import CString as CString
ImportError: /usr/local/lib/python3.8/dist-packages/fairseq2n/lib/libfairseq2n.so.0: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE

What am I doing wrong?
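The missing symbol demangles to at::_ops::zeros::call(...), which belongs to PyTorch's C++ library rather than to fairseq2 itself. That may indicate that the torch build installed in the image differs from the one the fairseq2n wheel was compiled against, though this is an assumption and not something confirmed here. A minimal sketch for checking it is to print the installed versions from inside the container. The function name `installed_versions` and the default package list are illustrative choices.

```python
from importlib import metadata  # in the standard library since Python 3.8


def installed_versions(packages=("torch", "torchaudio", "fairseq2", "fairseq2n")):
    """Return a dict mapping each distribution name to its version, or None."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

The script reads package metadata only and never imports fairseq2, so it should still run in an environment where the import above fails.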

Describe how to reproduce:

1. Take the app.py, Dockerfile, requirements.txt, and docker-compose.yml files shown above.
2. Copy them into a single directory.
3. Make sure Docker and Docker Compose are installed.
4. Run docker-compose up --build -d

Describe the expected behavior:

The application should be launched successfully without any errors

Environment: Ubuntu 22.04, fairseq2 - 0.2.0, PyTorch - 2.2.0, Python - 3.8, CUDA - 12.1, GPU - NVIDIA A100

Feb 08 '24 10:02 kyurkchyan
