Llama 3
Would be very useful to add open-source models like Llama 3.
If you're using Ollama, it's already supported through the custom_url approach. If you want, I can post a quick how-to later.
Would be nice to see the tutorial! Thanks!
Sorry for the delay. If you're running Ollama locally and have pulled some models you can use Scikit-LLM to interact with the localhost.
Load the packages:

```python
from skllm.datasets import get_classification_dataset
from skllm.models.gpt.classification.few_shot import FewShotGPTClassifier
from skllm.config import SKLLMConfig
```
Set the URL to your Ollama server. By default it listens on localhost, port 11434; /v1 is the OpenAI-compatible endpoint.

```python
SKLLMConfig.set_gpt_url("http://localhost:11434/v1/")
```
Load the data, create a classifier, fit, and test it:

```python
X, y = get_classification_dataset()
clf = FewShotGPTClassifier(model="custom_url::llama3", key="ollama")
clf.fit(X, y)
labels = clf.predict(X, num_workers=2)  # num_workers is the number of parallel requests sent
```
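Before fitting, it can help to confirm that the server is reachable and the model has actually been pulled. A minimal sketch using Ollama's native `/api/tags` model-listing endpoint (the helper names here are my own):

```python
import json
import urllib.request


def parse_model_names(payload: dict) -> list[str]:
    """Extract model names from an /api/tags JSON payload."""
    return [m["name"] for m in payload.get("models", [])]


def list_ollama_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Return the names of models pulled on a local Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return parse_model_names(json.load(resp))


# e.g. check that some llama3 variant is available before creating the classifier:
# assert any(name.startswith("llama3") for name in list_ollama_models())
```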
Notes
- `key` and `org` are technically not needed but are expected by Scikit-LLM; simply pass `ollama` or any random string. You can always omit `org`.
- `num_workers` is supported for Ollama as well, but you need to configure the server accordingly:

  ```shell
  export OLLAMA_MAX_LOADED_MODELS=2 # sets the max number of loaded models
  export OLLAMA_NUM_PARALLEL=2 # sets the max number of parallel tasks
  ```

- If you downloaded the model with a tag, e.g. `ollama pull llama3:8b`, make sure to also use that name when creating the classifier:

  ```python
  clf = FewShotGPTClassifier(model="custom_url::llama3:8b", key="ollama")
  ```

- This approach works for `FewShotGPTClassifier`, `ZeroShotGPTClassifier`, and their `MultiLabel` counterparts, and should work for `GPTSummarizer`, `GPTTranslator`, and `GPTExplainableNER` (I have not tested these).
- It should also work for `DynamicFewShotGPTClassifier` thanks to a recent fix by Ollama that added embeddings support to the v1 endpoint, see https://github.com/ollama/ollama/issues/2416. Previously you had to point the above config at the api endpoint, which then clashed with the actual classification.
Additional info
- The v1 endpoint does not support passing additional options to the server, such as context size and temperature. This may be a problem, since e.g. the context size defaults to 2048. The Ollama team is actively working on a fix, though.
- Because `DynamicFewShotGPTClassifier` had no native support until recently, and because of the missing options, I adapted Scikit-LLM to work natively with Ollama and published it as a package that depends on Scikit-LLM. You can find it on GitHub (https://github.com/AndreasKarasenko/scikit-ollama) or on PyPI (https://pypi.org/project/scikit-ollama/). Sorry for the self-advertising.
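As a workaround for the missing options, Ollama lets you bake parameters such as the context window into a model variant via a Modelfile. A sketch (the variant name and the parameter values below are illustrative, not recommendations):

```
FROM llama3
PARAMETER num_ctx 8192
PARAMETER temperature 0.2
```

Create the variant with `ollama create llama3-8k -f Modelfile` and then reference it as `model="custom_url::llama3-8k"` when constructing the classifier.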
Thank you for the great explanation!
It would be nice to add the functionality to simply load any open-source LLM directly, either by setting the path to the directory where the model has been downloaded or by using a Hugging Face link, without any key.
The maintainers of Scikit-LLM plan to offer native llama-cpp support, which will include loading models (similar to the current gpt4all implementation). You can also check out the discussion on their Discord.
In Ollama's case managing models is quite easy. E.g. `ollama pull llama3` pulls the default llama3 model and makes it available to the server. You don't need to specify paths, keys, or anything else. Or if you do `ollama pull llama2`, you can use llama2 instead with `clf = FewShotGPTClassifier(model="custom_url::llama2", key="literally_anything")`.
edit: typo
Hi @CoteDave,
As @AndreasKarasenko already outlined, there are multiple ways to use scikit-llm with local models: either by running an OpenAI-compatible web server or by using the gpt4all backend, which automatically handles model downloads.
However, scikit-llm is not compatible with the latest gpt4all versions, and this backend will be replaced with llama_cpp in the coming days. The overall concept will stay the same: the user provides a model name, and it is downloaded automatically if not present.
We might investigate other options in the future, but overall would prefer to keep the model management outside of scikit-llm as much as possible.