Add GenAI packages
Describe the solution you'd like
The GenAI packages will provide TorchSharp implementations of a series of popular GenAI models. The goal is to be able to load the same weights as the corresponding Python models.
- [x] Add design doc (#7170)
- [x] Add Microsoft.ML.GenAI.Core (#7177)

The following models will be added in the first wave:
- [x] Phi-3 (Microsoft.ML.GenAI.Phi) #7184
  - [x] Add README to Microsoft.ML.GenAI.Phi project #7206
- [x] LLaMA (Microsoft.ML.GenAI.LLaMA) #7220
  - [x] Add README to Microsoft.ML.GenAI.LLaMA project
- [ ] Mistral (Microsoft.ML.GenAI.Mistral)
  - [x] Mistral-7b-instruct v3
  - [ ] Mistral-nemo
- [x] Generate Embedding from CausalLMModel #7227
- [ ] Stable Diffusion (Microsoft.ML.GenAI.StableDiffusion)
MEAI integration
- [ ] Add CausalLMPipelineChatClient #7270
Along with the benchmark:
- [ ] Benchmark for Phi-3
- [ ] Flash Attention support #7238
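
To give a sense of the intended API shape, here is a minimal end-to-end sketch based on the LLaMA sample discussed later in this thread. The weight folder path is a placeholder, the config name and the `Generate` call are assumptions about the pipeline surface, so treat it as illustrative rather than definitive:

```csharp
using System;
using Microsoft.ML.GenAI.Core;
using Microsoft.ML.GenAI.LLaMA;
using Microsoft.ML.Tokenizers;

// Placeholder: point this at a local Meta-Llama-3.1-8B-Instruct download.
var weightFolder = @"C:\models\Meta-Llama-3.1-8B-Instruct";

// Load the tokenizer and model, then wrap both in a causal-LM pipeline (CPU here).
var tokenizer = LlamaTokenizerHelper.FromPretrained(weightFolder, "tokenizer.model");
var model = LlamaForCausalLM.FromPretrained(weightFolder, "config.json", // config name assumed
    layersOnTargetDevice: -1, targetDevice: "cpu");
var pipeline = new CausalLMPipeline<TiktokenTokenizer, LlamaForCausalLM>(tokenizer, model, "cpu");

// Assumed generation call; the exact method name and overload may differ.
var reply = pipeline.Generate("What is ML.NET?", maxLen: 256);
Console.WriteLine(reply);
```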
Can you guys publish a preview of the Microsoft.ML.GenAI.LLaMA package?
@lostmsu You should be able to consume it from the daily build below
- https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet-libraries/nuget/v3/index.json
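
For anyone else trying the daily builds, a NuGet.config along these lines (standard NuGet.config schema; the feed URL is the one above) makes the feed available to a project:

```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <packageSources>
    <!-- dotnet-libraries daily feed that hosts the GenAI preview packages -->
    <add key="dotnet-libraries"
         value="https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet-libraries/nuget/v3/index.json" />
  </packageSources>
</configuration>
```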
Oh, I just noticed that the GenAI packages don't have IsPackable set to true, so they're not available on the daily build. I'll publish a PR to enable the package flag.
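
For reference, IsPackable is the standard MSBuild property that controls whether `dotnet pack` produces a package for a project, so the fix presumably amounts to something like this in each GenAI .csproj (exact placement assumed):

```xml
<PropertyGroup>
  <!-- Allow this project to be packed and published to the daily feed -->
  <IsPackable>true</IsPackable>
</PropertyGroup>
```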
Can you please publish a preview of the Microsoft.ML.GenAI.Core package? It is not available on
https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet-libraries/nuget/v3/index.json
The sample Microsoft.ML.GenAI.Samples/Llama/LLaMA3_1.cs is broken without it.
Furthermore, the sample has a hard-coded weight folder:
```csharp
var weightFolder = @"C:\Users\xiaoyuz\source\repos\Meta-Llama-3.1-8B-Instruct";
```
I have downloaded the model and config from the Meta site. Maybe a few comments would be helpful.
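
Something like the following comments would go a long way; the expected folder contents here are inferred from later posts in this thread, so treat them as assumptions:

```csharp
// Point this at your local copy of the Meta-Llama-3.1-8B-Instruct download.
// The folder should contain the sharded .safetensors weights plus
// model.safetensors.index.json; the tokenizer.model file ships in the
// "original" subfolder of the Meta download.
var weightFolder = @"C:\path\to\Meta-Llama-3.1-8B-Instruct";
```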
Oh, sorry, I'll make the fix
I am getting a System.IO.FileNotFoundException (couldn't find model.safetensors.index.json) when calling Microsoft.ML.GenAI.LLaMA.LlamaForCausalLM.FromPretrained(String modelFolder, String configName, String checkPointName, ScalarType torchDtype, String device). I can't get the example working; please explain where/what this file is.
@aforoughi1 Which Llama? I suppose you are running Llama 3.2 1B?
Llama3.1-8B
@aforoughi1
The error basically says it can't find {ModelFolder}/model.safetensors.index.json. Could you share the full code that calls the model, the stack trace, and a screenshot of the Llama 3.1 8B model folder?
```csharp
// issue 7169
// Meta-Llama-3.1-8B-Instruct/original
string weightFolder = @"C:\Users\abbas.llama\checkpoints\Llama3.1-8B";
string configName = "params.json";
string modelFile = "tokenizer.model";

TiktokenTokenizer tokenizer = LlamaTokenizerHelper.FromPretrained(weightFolder, modelFile);
LlamaForCausalLM model = LlamaForCausalLM.FromPretrained(weightFolder, configName, layersOnTargetDevice: -1, targetDevice: "cpu");
Console.WriteLine("Loading Llama from model weight folder");
var pipeline = new CausalLMPipeline<TiktokenTokenizer, LlamaForCausalLM>(tokenizer, model, "cpu");
```
```
System.IO.FileNotFoundException
  HResult=0x80070002
  Message=Could not find file 'C:\Users\abbas.llama\checkpoints\Llama3.1-8B\model.safetensors.index.json'.
  Source=System.Private.CoreLib
  StackTrace:
   at Microsoft.Win32.SafeHandles.SafeFileHandle.CreateFile(String fullPath, FileMode mode, FileAccess access, FileShare share, FileOptions options)
   at Microsoft.Win32.SafeHandles.SafeFileHandle.Open(String fullPath, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize)
   at System.IO.Strategies.OSFileStreamStrategy..ctor(String path, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize)
   at System.IO.Strategies.FileStreamHelpers.ChooseStrategyCore(String path, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize)
   at System.IO.Strategies.FileStreamHelpers.ChooseStrategy(FileStream fileStream, String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options, Int64 preallocationSize)
   at System.IO.StreamReader.ValidateArgsAndOpenPath(String path, Encoding encoding, Int32 bufferSize)
   at System.IO.File.InternalReadAllText(String path, Encoding encoding)
   at System.IO.File.ReadAllText(String path)
   at TorchSharp.PyBridge.PyBridgeModuleExtensions.load_checkpoint(Module module, String path, String checkpointName, Boolean strict, IList`1 skip, Dictionary`2 loadedParameters, Boolean useTqdm)
   at Microsoft.ML.GenAI.LLaMA.LlamaForCausalLM.FromPretrained(String modelFolder, String configName, String checkPointName, ScalarType torchDtype, String device)
   at Test.GenAITest.LLaMATest1() in C:\Users\abbas\OneDrive\Documents\WorkingProgress\MLStcokMarketPrediction\Test\GenAITest.cs:line 35
   at Test.Program.GenAI() in C:\Users\abbas\OneDrive\Documents\WorkingProgress\MLStcokMarketPrediction\Test\Program.cs:line 425
   at Test.Program.Main(String[] args) in C:\Users\abbas\OneDrive\Documents\WorkingProgress\MLStcokMarketPrediction\Test\Program.cs:line 54
```
@aforoughi1
LlamaForCausalLM loads the .safetensors model weights, while your code is targeting the original .pth model weight folder.
The .safetensors model weights should be located in Meta-Llama-3.1-8B-Instruct; maybe update the weight folder to that path when loading LlamaForCausalLM?
```csharp
LlamaForCausalLM model = LlamaForCausalLM.FromPretrained("Meta-Llama-3.1-8B-Instruct", configName, layersOnTargetDevice: -1, targetDevice: "cpu");
```
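
For reference, the safetensors weight folder should end up looking roughly like this (file names taken from the error message and the shard list in this thread):

```
Meta-Llama-3.1-8B-Instruct/
├── model.safetensors.index.json
├── model-00001-of-00004.safetensors
├── model-00002-of-00004.safetensors
├── model-00003-of-00004.safetensors
└── model-00004-of-00004.safetensors
```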
I sorted out the missing files and the directory structure:
- model.safetensors.index.json
- model-00001-of-00004.safetensors
- model-00002-of-00004.safetensors
- model-00003-of-00004.safetensors
- model-00004-of-00004.safetensors

The model loads successfully ONLY if I use the defaults (layersOnTargetDevice: -1, quantizeToInt8: false, quantizeToInt4: false).
Setting layersOnTargetDevice: 26, quantizeToInt8: true causes a memory-corruption exception.
The example is also missing stopWatch.Stop();
I also don't see RegisterPrintMessage() print any messages to the console.
@aforoughi1 Are you using the nightly build or trying the example from the main branch?
Nightly build
@aforoughi1 And your GPU device/platform?
The device is set with:
```csharp
torch.InitializeDeviceType(DeviceType.CPU);
```
Packages:
- microsoft.ml.genai.llama 0.22.0-preview.24477.3
- microsoft.ml.torchsharp 0.21.1
- torchsharp-cpu 0.103.0

System:
- Processor: 12th Gen Intel(R) Core(TM) i5-1235U 2.50 GHz
- Installed RAM: 16.0 GB (15.8 GB usable)
- System type: 64-bit operating system, x64-based processor
- Edition: Windows 11 Home, Version 23H2, OS build 22631.4249, Windows Feature Experience Pack 1000.22700.1041.0
The layersOnTargetDevice option is for GPU offloading, so I haven't tested values other than -1 in the CPU scenario. As for quantizeToInt8 and quantizeToInt4, you probably won't gain any benefit in CPU scenarios either, so maybe just keep them false.
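
To illustrate, here is a hedged sketch of the GPU-oriented call; the parameter names appear earlier in this thread, but the layer count and the "cuda" device string are assumptions:

```csharp
// Illustrative only: keep 26 transformer layers on the GPU and quantize the
// weights to int8 to reduce VRAM usage. Both options are meant for GPU runs;
// on CPU, leave layersOnTargetDevice at -1 and the quantize flags false.
LlamaForCausalLM model = LlamaForCausalLM.FromPretrained(
    weightFolder,
    configName,
    layersOnTargetDevice: 26,   // assumed value: layers kept on the target device
    quantizeToInt8: true,
    targetDevice: "cuda");      // assumed device string for a CUDA GPU
```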