gemma_pytorch
The official PyTorch implementation of Google's Gemma models
Hello! In the PyTorch implementation, the MLP uses the exact GeLU as its gating function. In the JAX version, the approximate GeLU is used. Could...
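For context on the question above, here is a minimal pure-Python sketch of the two GeLU variants (helper names are hypothetical): PyTorch's `F.gelu` defaults to the exact erf-based form, while `jax.nn.gelu` defaults to the tanh approximation.

```python
import math

def gelu_exact(x):
    # Exact GeLU: x * Phi(x), where Phi is the standard normal CDF
    # (what torch.nn.functional.gelu computes with approximate="none").
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # Tanh approximation, the default in jax.nn.gelu (approximate=True).
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))

# The two variants agree closely but not exactly over typical ranges,
# which is why mixing them across implementations can shift outputs.
max_diff = max(abs(gelu_exact(x) - gelu_tanh(x))
               for x in [i / 10 for i in range(-40, 41)])
print(f"max |exact - tanh| on [-4, 4]: {max_diff:.6f}")
```

The difference per activation is tiny, but it compounds across layers, which is presumably why the mismatch between the PyTorch and JAX versions is worth flagging.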
Build the image from the Dockerfile, then run the container. Error: `IsADirectoryError: [Errno 21] Is a directory: '/tmp/ckpt'`. It seems there is no weight file in...
After deploying google/gemma-7b-it, there is always an error response when sending any message. Response: `Of course! Here are some creative ideas for a 10-year-old's birthday party:`
is it possible to convert gemma_pytorch to onnx to tflite?
Hello there 👋 Thanks for the repo. But I have one question: why do we need to scale up (normalize) token embeddings? https://github.com/google/gemma_pytorch/blob/01062c9ef4cf89ac0c985b25a734164ede017d0b/gemma/model.py#L431-L432 Unfortunately, I cannot find an answer anywhere.
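On the embedding-scaling question: the linked lines multiply the token embeddings by `sqrt(hidden_size)`, the same scaling "Attention Is All You Need" applies to its (tied) embedding matrix so embedding magnitudes match the rest of the residual stream. A toy sketch of the effect, with a hypothetical table and sizes rather than the repo's actual code:

```python
import math

hidden_size = 4
# Toy [vocab_size, hidden_size] embedding table (hypothetical values).
table = [[0.1] * hidden_size, [0.2] * hidden_size]

def embed(token_id):
    # After the lookup, scale by sqrt(hidden_size), mirroring the
    # normalizer applied in gemma/model.py after the embedding layer.
    normalizer = math.sqrt(hidden_size)
    return [v * normalizer for v in table[token_id]]

print(embed(0))  # each 0.1 entry becomes 0.1 * sqrt(4) = 0.2
```

One common rationale: with tied input/output embeddings, the same matrix serves as both lookup table and output projection, and the `sqrt(d_model)` factor keeps the embedding scale appropriate for the first layer's input.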
```bash
markusheimerl@t1v-n-a16d1e4e-w-0:~/gimli$ cd ~/gemma_cktp/ && curl -o archive.tar.gz "https://storage.googleapis.com/kaggle-models-data/5305/11357/bundle/archive.tar.gz?X-Goog-Algorithm=GOOG4-RSA-SHA256..." && tar -xf archive.tar.gz && cd ~/gimli
markusheimerl@t1v-n-a16d1e4e-w-0:~/gimli$ cd ../gemma_pytorch/
markusheimerl@t1v-n-a16d1e4e-w-0:~/gemma_pytorch$ VARIANT=2b
markusheimerl@t1v-n-a16d1e4e-w-0:~/gemma_pytorch$ CKPT_PATH=/home/markusheimerl/gemma_ckpt/
markusheimerl@t1v-n-a16d1e4e-w-0:~/gemma_pytorch$ sudo usermod -aG docker $USER...
```
How do I fine-tune Gemma with PyTorch? There seems to be fine-tuning code on Hugging Face, but it cannot be used directly with this repo. Thanks
Just a few more Gemma fixes :) Currently checking for more as well! Related PR: https://github.com/huggingface/transformers/pull/29285, which showed that RoPE must be computed in float32 rather than float16, since float16 causes positional encodings...
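To illustrate the float32-vs-float16 point above, here is a small stdlib-only sketch (the head dim and base are assumed values, and `struct`'s `'e'` format is used to emulate IEEE half precision): RoPE rotates each dimension pair by an angle that grows with position, and at large positions a float16 angle loses enough precision to change the rotation.

```python
import math
import struct

def to_fp16(x):
    # Round-trip a value through IEEE half precision ('e' format),
    # emulating what storing the angle in float16 would do.
    return struct.unpack('e', struct.pack('e', x))[0]

def rope_angle(pos, i, dim=256, base=10000.0):
    # RoPE rotates dimension pair i at position pos by this angle
    # (dim and base are illustrative, not the repo's exact config).
    return pos * base ** (-2 * i / dim)

full = rope_angle(2047, 2)   # ~1772.6 radians in float64
half = to_fp16(full)         # fp16 spacing near 2048 is a full 1.0
print(full, half, abs(full - half))
```

The rounded angle is off by a large fraction of a radian, so the resulting cos/sin rotation no longer matches the float32 one, which is consistent with the positional-encoding drift the PR fixes.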
To ensure that the exception is handled correctly, `raise` should be used instead of `return`.
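A minimal sketch of the difference, with hypothetical function names rather than the repo's actual code: returning the caught exception hands the caller an exception *object* and silently continues, while re-raising propagates it so the caller's own handler runs.

```python
def run_step_bad():
    # Pattern the issue describes: the exception is caught and then
    # returned, so it is swallowed instead of propagated.
    try:
        raise IsADirectoryError(21, "Is a directory", "/tmp/ckpt")
    except IsADirectoryError as e:
        return e  # bug: caller receives a value, no exception is raised

def run_step_good():
    try:
        raise IsADirectoryError(21, "Is a directory", "/tmp/ckpt")
    except IsADirectoryError:
        raise  # correct: re-raise so the caller's except block fires

result = run_step_bad()
print(isinstance(result, IsADirectoryError))  # True, yet nothing was handled

try:
    run_step_good()
except IsADirectoryError as exc:
    print("caller handled:", exc)
```

With `return`, code after the failed call keeps running against a bogus value; with `raise`, the error surfaces at the call site where it can actually be handled.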