quantize not working
mkdir models
jlama download google/gemma-3-1b-it --auth-token=xxxx
jlama quantize google/gemma-3-1b-it
Output:
java.nio.file.FileAlreadyExistsException: /home/ubuntu/models/google_gemma-3-1b-it-JQ4/config.json
    at java.base/sun.nio.fs.UnixFileSystem.copy(UnixFileSystem.java:983)
    at java.base/sun.nio.fs.UnixFileSystemProvider.copy(UnixFileSystemProvider.java:280)
    at java.base/java.nio.file.Files.copy(Files.java:1189)
    at com.github.tjake.jlama.safetensors.SafeTensorSupport.quantizeModel(SafeTensorSupport.java:276)
    at com.github.tjake.jlama.cli.commands.QuantizeCommand.run(QuantizeCommand.java:68)
    at picocli.CommandLine.executeUserObject(CommandLine.java:2026)
    at picocli.CommandLine.access$1500(CommandLine.java:148)
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2461)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2453)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2415)
    at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2264)
    at picocli.CommandLine.parseWithHandlers(CommandLine.java:2664)
    at picocli.CommandLine.parseWithHandler(CommandLine.java:2599)
    at com.github.tjake.jlama.cli.JlamaCli.main(JlamaCli.java:58)
    at jlama.main(jlama.java:22)
In general, I have tried quantizing many models; even when the quantize step worked, the chat step with the quantized model never did. Is there any working documentation for quantizing open models?
Hello,
So based on the stack trace you posted, it looks like you've already tried to quantize the model. If you go into the models directory you should see something like google/gemma-3-1b-it-JQ4. Try deleting that directory and running the quantize command again.
The other thing I'll mention is that you're trying to run/quantize a Gemma 3 model. Currently that model isn't supported in Jlama. There is a feature request #152 to support that model. Try quantizing a supported Jlama model type and see if that works.
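For example, picking an arbitrary llama-type model (an illustrative choice on my part, any supported type should behave the same way):

jlama download TinyLlama/TinyLlama-1.1B-Chat-v1.0
jlama quantize TinyLlama/TinyLlama-1.1B-Chat-v1.0
jlama chat TinyLlama/TinyLlama-1.1B-Chat-v1.0-JQ4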
Last thing I'll mention: Jake has a Hugging Face namespace with some already-quantized models available for download. It's at https://huggingface.co/tjake
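For example (model name copied from that namespace at the time of writing, so check the page for the current list):

jlama download tjake/Llama-3.2-1B-Instruct-JQ4
jlama chat tjake/Llama-3.2-1B-Instruct-JQ4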
Hope this helps!
I do not see anything like google/gemma-3-1b-it-JQ4 inside the models directory. What I see there is google_gemma-3-1b-it-JQ4.
Can you please provide an example of a model I can customize right now using Jlama that is not already present in the tjake Hugging Face repository?
Sorry, you are correct. The directory would be google_gemma-3-1b-it-JQ4.
The following model types are supported:
llama, gemma, gemma2, mistral, granite, mixtral, gpt2, bert, qwen2
You can see the supported model types here: https://github.com/tjake/Jlama/blob/main/jlama-core/src/main/java/com/github/tjake/jlama/model/ModelSupport.java
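If you'd rather drive this from Java than from the CLI, the project README has a sample along these lines. This is only a sketch: the class and method names below are taken from the README, so double-check them against the Jlama version you're actually running.

import java.io.File;
import java.util.UUID;
import com.github.tjake.jlama.model.AbstractModel;
import com.github.tjake.jlama.model.ModelSupport;
import com.github.tjake.jlama.model.functions.Generator;
import com.github.tjake.jlama.safetensors.DType;
import com.github.tjake.jlama.safetensors.SafeTensorSupport;
import com.github.tjake.jlama.safetensors.prompt.PromptContext;

public class QuantizedChatSample {
    public static void main(String[] args) throws Exception {
        // qwen2 is a supported model type, matching this thread
        String model = "Qwen/Qwen2-0.5B";
        String workingDirectory = "./models";

        // Downloads the model, or just returns the local path if it is already there
        File localModelPath = SafeTensorSupport.maybeDownloadModel(workingDirectory, model);

        // Load the weights and run with quantized (I8) working memory
        AbstractModel m = ModelSupport.loadModel(localModelPath, DType.F32, DType.I8);

        // Use the model's chat template when it ships one, raw prompt otherwise
        PromptContext ctx = m.promptSupport().isPresent()
                ? m.promptSupport().get().builder().addUserMessage("Tell me a joke.").build()
                : PromptContext.of("Tell me a joke.");

        // temperature 0.0, at most 256 new tokens, no streaming callback
        Generator.Response r = m.generate(UUID.randomUUID(), ctx, 0.0f, 256, (s, f) -> {});
        System.out.println(r.responseText);
    }
}

Note that an unsupported model type (like the gemma3 one above) should fail at load time here too, since loadModel resolves the model type through that same ModelSupport class.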
Since qwen2 is a suggested model, I tried the steps below and got the error shown. Can you please share a similar working example with steps?
Java 21
-
jlama download Qwen/Qwen2-0.5B --auth-token=xxxx
NOTE: Picked up JDK_JAVA_OPTIONS: --add-modules jdk.incubator.vector --enable-preview
WARNING: Using incubator modules: jdk.incubator.vector
NOTE: Picked up JDK_JAVA_OPTIONS: --add-modules jdk.incubator.vector --enable-preview
WARNING: Using incubator modules: jdk.incubator.vector
README.md 100% [===================================] 4/4KB (0:00:01 / 0:00:00)
config.json 100% [==============================] 661/661B (0:00:01 / 0:00:00)
model.safetensors 100% [=======================] 988/988MB (0:00:04 / 0:00:00)
tokenizer.json 100% [==============================] 7/7MB (0:00:01 / 0:00:00)
tokenizer_config.json 0% [ ] 0/1KB (0:00:00 / ?)
-
ls models/Qwen_Qwen2-0.5B
Qwen_Qwen2-0.5B
-
jlama quantize Qwen/Qwen2-0.5B
NOTE: Picked up JDK_JAVA_OPTIONS: --add-modules jdk.incubator.vector --enable-preview
WARNING: Using incubator modules: jdk.incubator.vector
Skipping quantization of layer: model.norm.weight
Skipping quantization of layer: model.layers.5.post_attention_layernorm.weight
Skipping quantization of layer: model.layers.9.input_layernorm.weight
Skipping quantization of layer: model.layers.0.input_layernorm.weight
Skipping quantization of layer: model.layers.15.input_layernorm.weight
Skipping quantization of layer: model.layers.22.input_layernorm.weight
Skipping quantization of layer: model.layers.6.post_attention_layernorm.weight
Skipping quantization of layer: model.layers.3.post_attention_layernorm.weight
Skipping quantization of layer: model.layers.7.input_layernorm.weight
Skipping quantization of layer: model.layers.10.post_attention_layernorm.weight
Skipping quantization of layer: model.layers.17.input_layernorm.weight
Skipping quantization of layer: model.layers.4.post_attention_layernorm.weight
Skipping quantization of layer: model.layers.20.input_layernorm.weight
Skipping quantization of layer: model.layers.18.input_layernorm.weight
Skipping quantization of layer: model.layers.8.post_attention_layernorm.weight
Skipping quantization of layer: model.layers.11.input_layernorm.weight
Skipping quantization of layer: model.layers.9.post_attention_layernorm.weight
Skipping quantization of layer: model.layers.11.post_attention_layernorm.weight
Skipping quantization of layer: model.layers.4.input_layernorm.weight
Skipping quantization of layer: model.layers.13.post_attention_layernorm.weight
Skipping quantization of layer: model.layers.13.input_layernorm.weight
Skipping quantization of layer: model.layers.2.input_layernorm.weight
Skipping quantization of layer: model.layers.7.post_attention_layernorm.weight
Skipping quantization of layer: model.layers.12.post_attention_layernorm.weight
Skipping quantization of layer: model.layers.1.input_layernorm.weight
Skipping quantization of layer: model.layers.15.post_attention_layernorm.weight
Skipping quantization of layer: model.layers.14.input_layernorm.weight
Skipping quantization of layer: model.layers.23.input_layernorm.weight
Skipping quantization of layer: model.layers.8.input_layernorm.weight
Skipping quantization of layer: model.layers.14.post_attention_layernorm.weight
Skipping quantization of layer: model.layers.16.post_attention_layernorm.weight
Skipping quantization of layer: model.layers.16.input_layernorm.weight
Skipping quantization of layer: model.layers.21.input_layernorm.weight
Skipping quantization of layer: model.layers.20.post_attention_layernorm.weight
Skipping quantization of layer: model.layers.21.post_attention_layernorm.weight
Skipping quantization of layer: model.layers.6.input_layernorm.weight
Skipping quantization of layer: model.layers.2.post_attention_layernorm.weight
Skipping quantization of layer: model.layers.18.post_attention_layernorm.weight
Skipping quantization of layer: model.layers.5.input_layernorm.weight
Skipping quantization of layer: model.layers.19.input_layernorm.weight
Skipping quantization of layer: model.layers.10.input_layernorm.weight
Skipping quantization of layer: model.layers.22.post_attention_layernorm.weight
Skipping quantization of layer: model.layers.17.post_attention_layernorm.weight
Skipping quantization of layer: model.layers.0.post_attention_layernorm.weight
Skipping quantization of layer: model.layers.23.post_attention_layernorm.weight
Skipping quantization of layer: model.layers.3.input_layernorm.weight
Skipping quantization of layer: model.layers.12.input_layernorm.weight
Skipping quantization of layer: model.layers.1.post_attention_layernorm.weight
Skipping quantization of layer: model.layers.19.post_attention_layernorm.weight
Quantized model written to: /home/ubuntu/models/Qwen_Qwen2-0.5B-JQ4
-
jlama chat Qwen_Qwen2-0.5B-JQ4
The above throws an error:
NOTE: Picked up JDK_JAVA_OPTIONS: --add-modules jdk.incubator.vector --enable-preview
WARNING: Using incubator modules: jdk.incubator.vector
NOTE: Picked up JDK_JAVA_OPTIONS: --add-modules jdk.incubator.vector --enable-preview
WARNING: Using incubator modules: jdk.incubator.vector
Exception in thread "main" picocli.CommandLine$ExecutionException: Error while running command (com.github.tjake.jlama.cli.commands.ChatCommand@971d0d8): java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
at picocli.CommandLine.executeUserObject(CommandLine.java:2035)
at picocli.CommandLine.access$1500(CommandLine.java:148)
at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2461)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2453)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2415)
at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2264)
at picocli.CommandLine.parseWithHandlers(CommandLine.java:2664)
at picocli.CommandLine.parseWithHandler(CommandLine.java:2599)
at com.github.tjake.jlama.cli.JlamaCli.main(JlamaCli.java:58)
at jlama.main(jlama.java:22)
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
at com.github.tjake.jlama.cli.commands.SimpleBaseCommand.getName(SimpleBaseCommand.java:76)
at com.github.tjake.jlama.cli.commands.SimpleBaseCommand.getModel(SimpleBaseCommand.java:134)
at com.github.tjake.jlama.cli.commands.SimpleBaseCommand.getModel(SimpleBaseCommand.java:122)
at com.github.tjake.jlama.cli.commands.ChatCommand.run(ChatCommand.java:46)
at picocli.CommandLine.executeUserObject(CommandLine.java:2026)
... 9 more
You should use jlama chat Qwen/Qwen2-0.5B-JQ4. The chat command wants the owner/model form of the name, not the Qwen_Qwen2-0.5B-JQ4 directory name; that ArrayIndexOutOfBoundsException is the CLI splitting the name on the / and finding only one part. By the way, the "Skipping quantization of layer" lines in your quantize output look expected: only the norm layers are skipped, and those appear to be left in full precision by design.
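And if you ever want to sidestep the name parsing from the library side, you should also be able to point loadModel straight at the quantized directory (same sketch-level caveats as the earlier snippet):

// Hypothetical path, matching the quantize output above
File local = new File("/home/ubuntu/models/Qwen_Qwen2-0.5B-JQ4");
AbstractModel m = ModelSupport.loadModel(local, DType.F32, DType.I8);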
Thanks, this works!