install on databricks

Open Jackie0601zhou opened this issue 1 year ago • 3 comments

How can I install unsloth in a Databricks notebook? I tried

  pip install "unsloth[cu121-ampere-torch220] @ git+https://github.com/unslothai/unsloth.git"

and got:

  Cloning https://github.com/unslothai/unsloth.git to /tmp/pip-install-rst_tpkk/unsloth_e8849fa753954ad5b20ad0a81efbd0be
  Running command git clone --filter=blob:none --quiet https://github.com/unslothai/unsloth.git /tmp/pip-install-rst_tpkk/unsloth_e8849fa753954ad5b20ad0a81efbd0be
  fatal: unable to access 'https://github.com/unslothai/unsloth.git/': gnutls_handshake() failed: The TLS connection was non-properly terminated.
  error: subprocess-exited-with-error

Jackie0601zhou avatar May 16 '24 13:05 Jackie0601zhou

That seems like internet access does not work maybe?

danielhanchen avatar May 17 '24 18:05 danielhanchen
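
One way to test the "no internet access" theory from inside the notebook is to attempt a TLS handshake with github.com directly. The following is only a sketch, not something posted in the thread: the function name tls_handshake_ok is made up for illustration, and it relies solely on the Python standard library. A plain TCP check would not be enough here, because the reported gnutls_handshake() failure happens after the connection is opened.

```python
import socket
import ssl


def tls_handshake_ok(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a verified TLS handshake with host:port completes.

    Hypothetical helper for illustration; not part of unsloth or Databricks.
    """
    context = ssl.create_default_context()
    try:
        # Open the TCP connection first, then upgrade it to TLS.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:
        # Covers DNS failures, refused connections, timeouts and ssl.SSLError.
        return False
```

Calling tls_handshake_ok("github.com") in a notebook cell and getting False would suggest that the cluster cannot reach GitHub over HTTPS (for example because of a firewall or proxy), which would point to a network problem rather than an unsloth one.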

I used the HTTPS URL and chose GitHub as the Git provider when I added the repo to Databricks. I also installed specific versions of various packages. All the steps before trainer_stats = trainer.train() worked. But when I run trainer_stats = trainer.train(), it shows: Screenshot 2024-05-19 230637 Screenshot 2024-05-19 230654

Jackie0601zhou avatar May 19 '24 21:05 Jackie0601zhou

Currently getting the same issue as Jackie. I'm in a more regulated environment for databricks so I have to first download the repo and install through volumes. I have a suspicion that it's a dependency conflict but not too sure where to start looking.

jvhuang1786 avatar May 22 '24 05:05 jvhuang1786
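
For the "download the repo and install through volumes" setup, a rough sketch of an offline install plus a dependency check might look like the code below. It is an illustration under assumptions, not a tested Databricks recipe: both helper names are hypothetical, and the paths in the usage note are placeholders. The pip options used (--no-index, --find-links) and the pip check subcommand are standard pip features.

```python
import subprocess
import sys


def offline_install_cmd(source_dir: str, wheel_dir: str) -> list:
    """Build a pip command that installs a local checkout without using PyPI.

    source_dir: local copy of the unsloth repository (placeholder path).
    wheel_dir:  directory holding pre-downloaded wheels for the dependencies.
    """
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index",               # never contact a package index
        "--find-links", wheel_dir,  # resolve dependencies from local wheels
        source_dir,
    ]


def dependency_report() -> str:
    """Run `pip check` and return its output, which lists conflicting packages."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "check"],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

With placeholder paths such as /Volumes/my_catalog/my_schema/my_volume/unsloth and /Volumes/my_catalog/my_schema/my_volume/wheels, the command could be run with subprocess.run(offline_install_cmd(...), check=True). Printing dependency_report() afterwards is one place to start looking for the suspected dependency conflict.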

I'm also getting the same issue. I've tried installing different versions of the packages, but I end up with the same error.

KwesiD avatar Jun 03 '24 13:06 KwesiD

Hmmm wait is Databricks using MLflow?

danielhanchen avatar Jun 04 '24 15:06 danielhanchen

Yes. By default, Databricks logs runs with MLflow.

KwesiD avatar Jun 05 '24 16:06 KwesiD
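
To rule the logging integrations in or out, one option is to switch them off before building the trainer. The sketch below is an assumption about how that could be done with Hugging Face transformers, not something confirmed in this thread: report_to="none" is a transformers.TrainingArguments setting that disables all reporting integrations, and DISABLE_MLFLOW_INTEGRATION is an environment variable that transformers checks for its MLflow callback. The name no_logging_kwargs and the "outputs" directory are placeholders. A later comment in the thread reports that turning off MLflow did not resolve the crash there, so treat this as a diagnostic step only.

```python
import os

# Assumption: set this before transformers is imported so that its
# MLflow integration check sees the value.
os.environ["DISABLE_MLFLOW_INTEGRATION"] = "TRUE"

# Keyword arguments intended for transformers.TrainingArguments.
# "outputs" is a placeholder directory name.
no_logging_kwargs = {
    "output_dir": "outputs",
    "report_to": "none",  # disable MLflow, TensorBoard, and other reporters
}

# Usage (requires transformers to be installed):
# from transformers import TrainingArguments
# args = TrainingArguments(**no_logging_kwargs)
```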

Hmmm ok - oh also is Databricks multi GPU?

danielhanchen avatar Jun 06 '24 16:06 danielhanchen

In my instance, I'm only using a single GPU. It's possible to set up a multi GPU cluster, though.

KwesiD avatar Jun 09 '24 14:06 KwesiD

Hmmm tbh I haven't tried Databricks so I can't exactly debug it - I'll see what I can do, but can't promise anything, sorry

danielhanchen avatar Jun 11 '24 13:06 danielhanchen

Wondering if any progress has been made on this? We are facing the same issue: installing from source works, and everything is okay until you hit trainer.train(), which fails with a segmentation fault.

julianmukaj avatar Aug 23 '24 09:08 julianmukaj

Oh no a segfault?? :(

danielhanchen avatar Aug 23 '24 23:08 danielhanchen

Think I have it tracked down to tensorboard, which means it's most likely a Databricks runtime fix, not an Unsloth one..

Fatal Python error: Segmentation fault

Thread 0x00007fe7c01f2640 (most recent call first):
  File "/usr/lib/python3.11/threading.py", line 324 in wait
  File "/usr/lib/python3.11/queue.py", line 180 in get
  File "/databricks/python/lib/python3.11/site-packages/tensorboard/summary/writer/event_file_writer.py", line 269 in _run
  File "/databricks/python/lib/python3.11/site-packages/tensorboard/summary/writer/event_file_writer.py", line 244 in run
  File "/usr/lib/python3.11/threading.py", line 1038 in _bootstrap_inner
  File "/usr/lib/python3.11/threading.py", line 995 in _bootstrap

Tried turning off mlflow and installing an older tensorboard.. no luck so far, so leaving this here if anyone else wants to debug.. (Check the cluster driver logs for more info)

julianmukaj avatar Aug 29 '24 16:08 julianmukaj
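
The "Fatal Python error: Segmentation fault" dump above has the format produced by Python's standard faulthandler module. For anyone who wants to capture a similar per-thread traceback in their own run, a minimal sketch follows. The helper name enable_crash_log and the log path are hypothetical. Whether the trace ends up somewhere readable on a given Databricks cluster is an assumption to verify.

```python
import faulthandler


def enable_crash_log(path: str):
    """Write Python tracebacks for all threads to `path` on a fatal signal.

    Hypothetical helper for illustration. The returned file object must
    stay open for as long as faulthandler is enabled.
    """
    log = open(path, "w")
    faulthandler.enable(file=log, all_threads=True)
    return log
```

Calling enable_crash_log("/tmp/segfault_trace.log") before trainer.train() would, under these assumptions, leave a traceback in that file if the process segfaults. This helps identify which thread and library were active at the time of the crash.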

Update: The segmentation fault has been raised internally with Databricks; they have it down as a feature request. No ETA yet, but hopefully those of us in regulated environments will be able to use Unsloth soon.

julianmukaj avatar Sep 10 '24 12:09 julianmukaj

@julianmukaj Thanks for the update!!

danielhanchen avatar Sep 14 '24 08:09 danielhanchen