Getting "signal: floating point exception" error on inference call

Open geekyayush opened this issue 2 years ago • 1 comments

I am trying to run an inference call on an A6000 GPU, but I get the following error:

Creating scheduler
Error: signal: floating point exception (core dumped)

I get this error only on an A100 or A6000. On a Quadro RTX 5000 the error does not occur, although there I run into CUDA out of memory instead.

The error is thrown when I try to create the scheduler like this:

from diffusers import LMSDiscreteScheduler

lms = LMSDiscreteScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    num_train_timesteps=1000,
)
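Because the process dies from a signal rather than a Python exception, no traceback is printed. One way to narrow down where the crash happens is Python's built-in `faulthandler` module, which dumps the Python stack when a fatal signal such as SIGFPE arrives. The following is only a sketch under that assumption; the helper name `build_scheduler` is hypothetical and not part of diffusers.

```python
import faulthandler
import sys

# Dump the Python traceback if the process receives a fatal signal
# (SIGSEGV, SIGFPE, SIGABRT, SIGBUS, SIGILL). sys.__stderr__ is the
# original stderr stream, used here in case sys.stderr was redirected.
faulthandler.enable(file=sys.__stderr__, all_threads=True)


def build_scheduler():
    """Hypothetical helper: create the scheduler with the arguments from the post."""
    # Imported inside the function so that a crash during import is traced too.
    from diffusers import LMSDiscreteScheduler

    return LMSDiscreteScheduler(
        beta_start=0.00085,
        beta_end=0.012,
        beta_schedule="scaled_linear",
        num_train_timesteps=1000,
    )
```

Calling `build_scheduler()` at the top of inference.py should then print the failing Python frame before the core dump. The same effect is available without code changes by running `python -X faulthandler inference.py` or setting `PYTHONFAULTHANDLER=1` in the Dockerfile.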

Here's my Dockerfile:

FROM pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime
RUN mkdir /app
WORKDIR /app

COPY requirements.txt .
COPY inference.py .

RUN apt update && apt upgrade -y && \
    pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir -r requirements.txt
RUN apt update && apt install -y git && pip install --upgrade git+https://github.com/huggingface/diffusers.git

CMD ["/usr/bin/python3", "/app/inference.py"]

requirements.txt

transformers>=4.21.0
scipy==1.7.0
ftfy==6.1.1
pyheif
piexif
bitsandbytes
boto3
accelerate==0.12.0

Can anyone please help me with this? I would really appreciate it.

Thanks!

geekyayush, Dec 13 '22 13:12

Hey @geekyayush,

I cannot reproduce the error. Could you try the newest version of accelerate? That is, in requirements.txt, change:

accelerate==0.12.0

to:

accelerate
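To confirm which versions actually end up in the image after changing the pin, a small check like the one below may help. It is only a sketch: the package list is an assumption based on the names mentioned in this thread, and `installed_versions` is a hypothetical helper built on the standard-library `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages mentioned in this thread; adjust the list to your own setup.
PACKAGES = ["torch", "diffusers", "accelerate", "transformers", "scipy"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    report = {}
    for name in names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


# Print one line per package so the output can be pasted into a bug report.
for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'not installed'}")
```

Running this inside the container before and after the rebuild shows whether pip really picked up a newer accelerate.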

patrickvonplaten, Dec 19 '22 11:12

Thanks @patrickvonplaten, that solved the issue.

geekyayush, Dec 21 '22 13:12