
[Bug]: RuntimeError: GET was unable to find an engine to execute this computation

Open amrakm opened this issue 9 months ago • 8 comments

What happened?

OneTrainer was working fine on this same machine before.

I pulled the latest changes and updated the dependencies with update.sh.

After the update, I started getting this error: RuntimeError: GET was unable to find an engine to execute this computation

I also tried starting from scratch by cloning the repo again and doing a fresh install, and got the same error.

Config JSON (using the standard preset for SDXL): config.json

What did you expect would happen?

starts training

Relevant log output

Traceback (most recent call last):
  File "/home/username/dataSSD/repos/OneTrainer/modules/ui/TrainUI.py", line 523, in __training_thread_function
    trainer.train()
  File "/home/username/dataSSD/repos/OneTrainer/modules/trainer/GenericTrainer.py", line 469, in train
    self.data_loader.get_data_set().start_next_epoch()
  File "/home/username/dataSSD/repos/OneTrainer/src/mgds/src/mgds/MGDS.py", line 50, in start_next_epoch
    self.loading_pipeline.start_next_epoch()
  File "/home/username/dataSSD/repos/OneTrainer/src/mgds/src/mgds/LoadingPipeline.py", line 86, in start_next_epoch
    module.start(self.__current_epoch)
  File "/home/username/dataSSD/repos/OneTrainer/src/mgds/src/mgds/pipelineModules/DiskCache.py", line 231, in start
    self.__refresh_cache(out_variation)
  File "/home/username/dataSSD/repos/OneTrainer/src/mgds/src/mgds/pipelineModules/DiskCache.py", line 204, in __refresh_cache
    f.result()
  File "/home/username/anaconda3/envs/ot/lib/python3.10/concurrent/futures/_base.py", line 438, in result
    return self.__get_result()
  File "/home/username/anaconda3/envs/ot/lib/python3.10/concurrent/futures/_base.py", line 390, in __get_result
    raise self._exception
  File "/home/username/anaconda3/envs/ot/lib/python3.10/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/username/dataSSD/repos/OneTrainer/src/mgds/src/mgds/pipelineModules/DiskCache.py", line 192, in fn
    split_item[name] = self._get_previous_item(in_variation, name, in_index)
  File "/home/username/dataSSD/repos/OneTrainer/src/mgds/src/mgds/PipelineModule.py", line 87, in _get_previous_item
    item = module.get_item(variation, index, item_name)
  File "/home/username/dataSSD/repos/OneTrainer/src/mgds/src/mgds/pipelineModules/EncodeVAE.py", line 46, in get_item
    latent_distribution = self.vae.encode(image.unsqueeze(0)).latent_dist
  File "/home/username/dataSSD/repos/OneTrainer/src/diffusers/src/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
  File "/home/username/dataSSD/repos/OneTrainer/src/diffusers/src/diffusers/models/autoencoders/autoencoder_kl.py", line 260, in encode
    h = self.encoder(x)
  File "/home/username/anaconda3/envs/ot/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/username/anaconda3/envs/ot/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/username/dataSSD/repos/OneTrainer/src/diffusers/src/diffusers/models/autoencoders/vae.py", line 143, in forward
    sample = self.conv_in(sample)
  File "/home/username/anaconda3/envs/ot/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/username/anaconda3/envs/ot/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/username/anaconda3/envs/ot/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/username/anaconda3/envs/ot/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: GET was unable to find an engine to execute this computation

Output of pip freeze

absl-py==2.1.0
accelerate==0.25.0
aiohttp==3.9.5
aiosignal==1.3.1
antlr4-python3-runtime==4.9.3
async-timeout==4.0.3
attrs==23.2.0
bitsandbytes==0.43.0
cachetools==5.3.3
certifi==2024.2.2
charset-normalizer==3.3.2
cloudpickle==3.0.0
coloredlogs==15.0.1
customtkinter==5.2.1
dadaptation==3.2
darkdetect==0.8.0
-e git+https://github.com/huggingface/diffusers.git@5d848ec07c2011d600ce5e5c1aa02a03152aea9b#egg=diffusers
filelock==3.14.0
flatbuffers==24.3.25
frozenlist==1.4.1
fsspec==2024.3.1
ftfy==6.2.0
google-auth==2.29.0
google-auth-oauthlib==1.2.0
grpcio==1.62.2
huggingface-hub==0.20.3
humanfriendly==10.0
idna==3.7
importlib_metadata==7.1.0
invisible-watermark==0.2.0
Jinja2==3.1.3
lightning-utilities==0.11.2
lion-pytorch==0.1.2
Markdown==3.6
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
-e git+https://github.com/Nerogar/mgds.git@1dc300967e75b6fa0fb4b72587f3df08a8278efd#egg=mgds
mpmath==1.3.0
multidict==6.0.5
networkx==3.3
numpy==1.26.2
nvidia-cublas-cu11==11.11.3.6
nvidia-cuda-cupti-cu11==11.8.87
nvidia-cuda-nvrtc-cu11==11.8.89
nvidia-cuda-runtime-cu11==11.8.89
nvidia-cudnn-cu11==8.7.0.84
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.3.0.86
nvidia-cusolver-cu11==11.4.1.48
nvidia-cusparse-cu11==11.7.5.86
nvidia-nccl-cu11==2.19.3
nvidia-nvtx-cu11==11.8.86
oauthlib==3.2.2
omegaconf==2.3.0
onnxruntime-gpu==1.16.3
open-clip-torch==2.23.0
opencv-python==4.8.1.78
packaging==24.0
pillow==10.2.0
platformdirs==4.2.1
pooch==1.8.0
prodigyopt==1.0
protobuf==4.23.4
psutil==5.9.8
pyasn1==0.6.0
pyasn1_modules==0.4.0
Pygments==2.17.2
pynvml==11.5.0
pytorch-lightning==2.1.3
PyWavelets==1.6.0
PyYAML==6.0.1
regex==2024.4.28
requests==2.31.0
requests-oauthlib==2.0.0
rich==13.7.1
rsa==4.9
safetensors==0.4.1
scalene==1.5.39
scipy==1.11.4
sentencepiece==0.2.0
six==1.16.0
sympy==1.12
tensorboard==2.15.1
tensorboard-data-server==0.7.2
timm==0.9.16
tokenizers==0.15.2
torch==2.2.0+cu118
torchmetrics==1.3.2
torchvision==0.17.0+cu118
tqdm==4.66.1
transformers==4.36.2
triton==2.2.0
typing_extensions==4.11.0
urllib3==2.2.1
wcwidth==0.2.13
Werkzeug==3.0.2
xformers==0.0.24+cu118
yarl==1.9.4
zipp==3.18.1
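
The traceback ends in F.conv2d, and this error message is commonly associated with the cuDNN convolution path, so it may help to pull out just the CUDA-related pins from the freeze output above when comparing environments. The following is a minimal sketch, not part of OneTrainer: the helper names cuda_related_pins and cuda_tags are hypothetical, and the sample string is a small excerpt of the pins listed above. It only summarizes what is installed; it does not diagnose the cause.

```python
import re


def cuda_related_pins(freeze_text: str) -> dict[str, str]:
    """Return {package: version} for pins that look CUDA-related.

    Hypothetical helper: a pin counts as CUDA-related if the package name
    starts with "nvidia-" or the version ends in a "+cuNNN" local tag.
    """
    pins = {}
    # Splitting on whitespace handles both one-per-line and flattened output.
    for token in freeze_text.split():
        if "==" not in token:
            continue  # skips "-e" and git URLs from editable installs
        name, version = token.split("==", 1)
        if name.startswith("nvidia-") or re.search(r"\+cu\d+$", version):
            pins[name] = version
    return pins


def cuda_tags(pins: dict[str, str]) -> set[str]:
    """Collect the distinct CUDA tags found in versions or package names."""
    tags = set()
    for name, version in pins.items():
        match = re.search(r"\+(cu\d+)$", version) or re.search(r"-(cu\d+)$", name)
        if match:
            tags.add(match.group(1))
    return tags


# Excerpt of the pins from the freeze output above.
sample = (
    "torch==2.2.0+cu118 torchvision==0.17.0+cu118 xformers==0.0.24+cu118 "
    "nvidia-cudnn-cu11==8.7.0.84 numpy==1.26.2"
)
pins = cuda_related_pins(sample)
for name, version in pins.items():
    print(f"{name}=={version}")
print("CUDA tags:", sorted(cuda_tags(pins)))
```

Running the same helper on the freeze output from a working install and from the broken one gives two short lists that are easier to compare than the full output.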

amrakm • Apr 30 '24 13:04