gemma_pytorch issues

Memory saving loading weight for non-quant models

5

Trying to fix #51. This also speeds up loading the weights (on my computer, about 1 min vs. 2 min). Tested on the `1.1-7b-it` and `7b-it` models. But: 1. This method...

KaneGreen

early stop when all sequence reach EOS

3

`model.generate()` takes too long even when every sequence has already finished with an EOS token, because it currently keeps generating until it reaches `output_len`. Fix the generate method to stop...

je1lee
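The early-stop idea described in this issue can be sketched in plain Python. This is a minimal illustration, not the repository's `generate` method: `generate_with_early_stop`, `next_tokens`, and `eos_id` are hypothetical names, and the per-step model call is replaced by a callback that returns one token per sequence.

```python
from typing import Callable, List


def generate_with_early_stop(
    prompts: List[List[int]],
    next_tokens: Callable[[List[List[int]]], List[int]],  # hypothetical stand-in for the model step
    eos_id: int,
    output_len: int,
) -> List[List[int]]:
    """Append up to output_len tokens per sequence; stop once all have emitted EOS."""
    sequences = [list(p) for p in prompts]
    finished = [False] * len(sequences)

    for _ in range(output_len):
        step = next_tokens(sequences)  # one candidate token per sequence
        for i, token in enumerate(step):
            if finished[i]:
                continue  # a finished sequence is left untouched
            sequences[i].append(token)
            if token == eos_id:
                finished[i] = True
        if all(finished):
            break  # every sequence hit EOS, so the remaining steps are skipped

    return sequences
```

With a toy callback that emits EOS once a sequence is long enough, the loop exits after a few steps instead of running all `output_len` iterations. In a batched tensor implementation the same check would typically be a boolean mask reduced with an "all" operation.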

Update xla_model_parallel.py

1

Correct the variable name of `norm_type`

ya0guang

Inconsistent 'query_pre_attn_scalar' Setting Between 9B and 27B Models

2

In the recent commit, I have noticed an inconsistency in the configuration of the `query_pre_attn_scalar` parameter between the 9B and 27B models in this repository. Specifically: In the 9B model,...

kiddj

bug

stat:awaiting response

get_model_config raises ValueError when using official variant 4b-it

3

[The official example](https://ai.google.dev/gemma/docs/core/pytorch_gemma?hl=ja) code uses the 4b-it variant with get_model_config, but this results in a ValueError. It appears that "4b-it" is not listed as a supported variant inside get_model_config, even...

h-suzuki-isp

Rename unused for loop variable `_` instead of `i`

3

I was trying to understand the code and found it a little confusing because the `i` wasn't used in that for loop, but there was another below where it was...

paruby
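The convention behind this change can be shown with a small, generic example (the function names are illustrative and not taken from the repository): `_` marks a loop index that is never read, while a named index is kept where the body uses it.

```python
from typing import List


def repeat_greeting(times: int) -> List[str]:
    greetings = []
    for _ in range(times):  # the index is never read, so `_` signals that
        greetings.append("hello")
    return greetings


def numbered_greetings(times: int) -> List[str]:
    greetings = []
    for i in range(times):  # the index is used in the body, so it keeps a name
        greetings.append(f"{i}: hello")
    return greetings
```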

Docker image comes with outdated Python version for Gemma-3 support

I get the following error because setup.py sets `python_requires=">=3.11"` while the base image pytorch/pytorch:2.1.2-cuda11.8-cudnn8-runtime ships Python 3.10.13: `ERROR: Package 'gemma' requires a different Python: 3.10.13 not in '>=3.11'`

MalekWahidi
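The mismatch reported here can be checked before installing anything. The sketch below assumes only the `>=3.11` constraint quoted in the issue; `meets_minimum` and `REQUIRED` are hypothetical names, and the comparison covers only the major and minor version rather than full version-specifier semantics.

```python
import sys
from typing import Sequence, Tuple

REQUIRED: Tuple[int, int] = (3, 11)  # mirrors python_requires=">=3.11" from the issue


def meets_minimum(version: Sequence[int], minimum: Tuple[int, int] = REQUIRED) -> bool:
    """Return True if the (major, minor) part of `version` is at least `minimum`."""
    return tuple(version[:2]) >= minimum


# Report whether the interpreter running this script satisfies the constraint.
current = sys.version_info[:3]
status = "ok" if meets_minimum(current) else "too old"
print(f"Python {'.'.join(map(str, current))}: {status} for >= {REQUIRED[0]}.{REQUIRED[1]}")
```

Running this inside the container makes the failure visible up front: the Python 3.10.13 interpreter named in the error message does not satisfy the constraint, while any 3.11 or later interpreter does.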

No pytorch versions of gemma 3 models on HF

3

`huggingface-cli download google/gemma-3-4b-it-pytorch` gives:

```
Traceback (most recent call last):
  File "/usr/lib/python3.12/site-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
    response.raise_for_status()
  File "/usr/lib/python3.12/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error:...
```

Ocean-Moist

Question about Rotary Embedding Sequence in Model Code vs. Diagrams

1

Hello, I have been studying the Gemma-Pytorch implementation and noticed a potential discrepancy in the code related to the application of rotary embeddings in the attention mechanism. Specifically, the rotary...

littlepsilon

Availability and support for Gemma3 270M models

Kaggle's [Gemma 3 page](https://www.kaggle.com/models/google/gemma-3/pyTorch) does not have a PyTorch checkpoint for the 270M models, but it does have Hugging Face checkpoints for those models. Would it be possible to upload...

nsfinkelstein
