AlpinDale

Results: 170 comments by AlpinDale

As previously discussed offline, I believe bad_words is an incomplete solution to a very real problem; banning only the last token in the provided sequence is not useful at all,...

That sounds useful; I will look into it soon. I can try to implement it on our side if possible; otherwise, middleware should work fine for now.

I'm not sure I understand the issue. If you need the engine to limit the max_model_len to the amount your GPU can fit, then we already handle that, as you...

0.9.0 works, but 0.9.1 doesn't due to the new vectorized activation kernels being incompatible with ROCm. I will address this soon.

Perhaps; I'll have to look into it. bnb hasn't been a priority.

FYI, I'm working on new kernels to massively speed up bnb quants and add TP support for them. You might want to hold on for now, or help out with...

Will probably need some restructuring after #925

I believe code compiled against CUDA 12 works across all 12.x minor versions. But we can switch to flashinfer's 12.4 wheels, if they have any.

Thanks for doing this.

Please follow the instructions in the issue template and run the env.py script so I can see what environment you're working with. I have no idea what the default Kaggle...