gemma_pytorch

The official PyTorch implementation of Google's Gemma models

Results 50 gemma_pytorch issues

Trying to fix #51. This also increases the speed of loading weights (on my computer, about 1 min vs. 2 min). Tested on the `1.1-7b-it` and `7b-it` models. But: 1. This method...

With `model.generate()`, generation takes too long even when the sequence has already ended with an EOS token, because it currently keeps generating until it reaches `output_len`. Fix the generate method to stop...
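The early-stopping idea described in this issue can be sketched roughly as follows. This is a minimal, framework-free illustration and not the repository's actual `generate` method: `generate_with_early_stop`, `step_fn`, and `eos_id` are hypothetical names, and `step_fn` stands in for a single decoding step of a model.

```python
from typing import Callable, List, Sequence, Tuple


def generate_with_early_stop(
    step_fn: Callable[[List[List[int]]], Sequence[int]],
    prompts: Sequence[Sequence[int]],
    eos_id: int,
    output_len: int,
) -> Tuple[List[List[int]], int]:
    """Decode up to output_len tokens, stopping once every sequence hit EOS.

    step_fn is a hypothetical stand-in for one model decoding step: it takes
    the current token sequences and returns one next token per sequence.
    Returns the sequences and the number of decoding steps actually run.
    """
    sequences = [list(p) for p in prompts]
    finished = [False] * len(sequences)
    steps_run = 0

    for _ in range(output_len):
        # Stop early instead of always running all output_len steps.
        if all(finished):
            break
        next_tokens = step_fn(sequences)
        steps_run += 1
        for i, token in enumerate(next_tokens):
            if finished[i]:
                continue  # Do not extend a sequence that already ended.
            sequences[i].append(token)
            if token == eos_id:
                finished[i] = True

    return sequences, steps_run
```

In a batched tensor implementation the same bookkeeping would likely be a boolean mask over the batch, with the loop exiting when the mask is all true.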

Correct the variable name of `norm_type`

In the recent commit, I have noticed an inconsistency in the configuration of the `query_pre_attn_scalar` parameter between the 9B and 27B models in this repository. Specifically: In the 9B model,...

bug
stat:awaiting response

[The official example](https://ai.google.dev/gemma/docs/core/pytorch_gemma?hl=ja) code uses the 4b-it variant with get_model_config, but this results in a ValueError. It appears that "4b-it" is not listed as a supported variant inside get_model_config, even...

I was trying to understand the code and found it a little confusing because the `i` wasn't used in that for loop, but there was another below where it was...

I get the following error because setup.py sets `python_requires=">=3.11"` while the base image pytorch/pytorch:2.1.2-cuda11.8-cudnn8-runtime uses Python 3.10.13: `ERROR: Package 'gemma' requires a different Python: 3.10.13 not in '>=3.11'`

`huggingface-cli download google/gemma-3-4b-it-pytorch` gives: ``` Traceback (most recent call last): File "/usr/lib/python3.12/site-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status response.raise_for_status() File "/usr/lib/python3.12/site-packages/requests/models.py", line 1024, in raise_for_status raise HTTPError(http_error_msg, response=self) requests.exceptions.HTTPError: 404 Client Error:...

Hello, I have been studying the Gemma-Pytorch implementation and noticed a potential discrepancy in the code related to the application of rotary embeddings in the attention mechanism. Specifically, the rotary...

Kaggle's [Gemma 3 page](https://www.kaggle.com/models/google/gemma-3/pyTorch) does not have a PyTorch checkpoint for the 270M models, but it does have Hugging Face checkpoints for those models. Would it be possible to upload...