Kernel check ignores LTS versions

Open alansill opened this issue 1 year ago • 4 comments

We have a number of cluster users trying out unsloth. Because we run LTS kernel versions, the kernel version check built into unsloth causes confusion. LTS kernel versions are numerically much lower than the recommended level but, as you know, they receive back-ported patches to maintain functionality similar to newer kernel versions. Almost no HPC clusters run kernel versions as high as the minimum that unsloth checks for. (For details, see https://access.redhat.com/support/policy/updates/errata and the related lists for other distro releases.) I suggest that the unsloth kernel checks be refactored with this consideration in mind. Unlike personal and hobbyist machines, large clusters almost never follow the frequent update schedule of the unstable branch.

alansill avatar Sep 29 '24 15:09 alansill

@alansill Hey Alan - sorry for the delay! Oh do you mean Unsloth's Python dependencies should be pinned to a version to reduce dependency issues? Or maybe I'm mistaken?

danielhanchen avatar Oct 01 '24 05:10 danielhanchen

No, I mean that when run on a Linux system with an LTS kernel, the code prints a warning that it might hang on kernels with versions lower than 5.5. But the kernel for Enterprise Linux, for example, always has a numerically much lower version, because it is intended to be used for years with only the necessary patches applied, unlike the kernels in distributions such as Fedora or Ubuntu.
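To illustrate the kind of refactoring being asked for, here is a minimal sketch of a version check that skips the warning on Enterprise Linux style kernels. It is not Unsloth's actual code. The function names, the `.elN` release-tag heuristic, and the warning text are all assumptions made for this example; only the 5.5 threshold comes from the message described above.

```python
import platform
import re

# Threshold quoted in the warning discussed in this thread.
MIN_KERNEL = (5, 5)

# Assumed heuristic (not from Unsloth): Enterprise Linux release strings
# such as "4.18.0-553.el8_10.x86_64" carry an ".elN" tag.
_ENTERPRISE_TAG = re.compile(r"\.el\d+")


def parse_kernel_version(release):
    """Return (major, minor) from a kernel release string, or None if unparsable."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def should_warn(release, minimum=MIN_KERNEL):
    """Hypothetical check: warn only for non-enterprise kernels below `minimum`."""
    version = parse_kernel_version(release)
    if version is None:
        return False  # unknown format: stay quiet rather than guess
    if _ENTERPRISE_TAG.search(release):
        return False  # back-ported enterprise kernel: the number is not comparable
    return version < minimum


if __name__ == "__main__":
    release = platform.release()
    if should_warn(release):
        print(f"Kernel {release} is below {MIN_KERNEL[0]}.{MIN_KERNEL[1]}; "
              "the process might hang.")
```

Other distributions tag their long-term kernels differently, so a real check would need a broader list of markers or a way for the user to silence the message.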

alansill avatar Oct 03 '24 03:10 alansill

Ohh ok ok so the actual Linux kernel version - hmmm - unfortunately Unsloth relies on PyTorch and newer CUDA versions - that might be the culprit.

Unsloth does support Torch 2.1 and CUDA 11.8, so these might be more stable for older kernel versions

danielhanchen avatar Oct 03 '24 08:10 danielhanchen

Thanks. The point is that the kernel message is spurious. It only applies to unstable-branch kernels. Most HPC clusters run on the stable branch, in which kernels stay at numerically lower versions for years while still receiving security fixes and other needed patches. Is there any particular reason for the kernel version check to exist at all?

alansill avatar Oct 05 '24 15:10 alansill