ScreenAI runtime error when executing the default example

Describe the bug

After running pip install screenai, the from screenai.main import ScreenAI line in the default example raises a runtime error: RuntimeError: mat1 and mat2 shapes cannot be multiplied (512x4 and 512x512)

To Reproduce

Steps to reproduce the behavior:

Run pip install screenai
Run the default example

Expected behavior

The default example runs without error.

Screenshots

`---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_20976\3292023021.py in <cell line: 2>()
      1 import torch
----> 2 from screenai.main import ScreenAI
      3 
      4 # Create a tensor for the image
      5 image = torch.rand(1, 3, 224, 224)

~\AppData\Local\Programs\Python\Python39\lib\site-packages\screenai\__init__.py in <module>
----> 1 from screenai.main import (
      2     CrossAttention,
      3     MultiModalEncoder,
      4     MultiModalDecoder,
      5     ScreenAI,

~\AppData\Local\Programs\Python\Python39\lib\site-packages\screenai\main.py in <module>
      5 from torch import Tensor, einsum, nn
      6 from torch.autograd import Function
----> 7 from zeta.nn import (
      8     SwiGLU,
      9     FeedForward,

~\AppData\Local\Programs\Python\Python39\lib\site-packages\zeta\__init__.py in <module>
     26 logger.addFilter(f)
     27 
---> 28 from zeta.nn import *
     29 from zeta.models import *
     30 from zeta.utils import *

~\AppData\Local\Programs\Python\Python39\lib\site-packages\zeta\nn\__init__.py in <module>
      1 from zeta.nn.attention import *
      2 from zeta.nn.embeddings import *
----> 3 from zeta.nn.modules import *
      4 from zeta.nn.biases import *

~\AppData\Local\Programs\Python\Python39\lib\site-packages\zeta\nn\modules\__init__.py in <module>
     45 from zeta.nn.modules.s4 import s4d_kernel
     46 from zeta.nn.modules.h3 import H3Layer
---> 47 from zeta.nn.modules.mlp_mixer import MLPMixer
     48 from zeta.nn.modules.leaky_relu import LeakyRELU
     49 from zeta.nn.modules.adaptive_layernorm import AdaptiveLayerNorm

~\AppData\Local\Programs\Python\Python39\lib\site-packages\zeta\nn\modules\mlp_mixer.py in <module>
    143     1, 512, 32, 32
    144 )  # Batch size of 1, 512 channels, 32x32 image
--> 145 output = mlp_mixer(example_input)
    146 print(
    147     output.shape

~\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py in _wrapped_call_impl(self, *args, **kwargs)
   1516             return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517         else:
-> 1518             return self._call_impl(*args, **kwargs)
   1519 
   1520     def _call_impl(self, *args, **kwargs):

~\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *args, **kwargs)
   1525                 or _global_backward_pre_hooks or _global_backward_hooks
   1526                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527             return forward_call(*args, **kwargs)
   1528 
   1529         try:

~\AppData\Local\Programs\Python\Python39\lib\site-packages\zeta\nn\modules\mlp_mixer.py in forward(self, x)
    123         x = rearrange(x, "n c h w -> n (h w) c")
    124         for mixer_block in self.mixer_blocks:
--> 125             x = mixer_block(x)
    126         x = self.pred_head_layernorm(x)
    127         x = x.mean(dim=1)

~\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py in _wrapped_call_impl(self, *args, **kwargs)
   1516             return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517         else:
-> 1518             return self._call_impl(*args, **kwargs)
   1519 
   1520     def _call_impl(self, *args, **kwargs):

~\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *args, **kwargs)
   1525                 or _global_backward_pre_hooks or _global_backward_hooks
   1526                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527             return forward_call(*args, **kwargs)
   1528 
   1529         try:

~\AppData\Local\Programs\Python\Python39\lib\site-packages\zeta\nn\modules\mlp_mixer.py in forward(self, x)
     61         y = self.norm1(x)
     62         y = rearrange(y, "n c t -> n t c")
---> 63         y = self.tokens_mlp(y)
     64         y = rearrange(y, "n t c -> n c t")
     65         x = x + y

~\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py in _wrapped_call_impl(self, *args, **kwargs)
   1516             return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517         else:
-> 1518             return self._call_impl(*args, **kwargs)
   1519 
   1520     def _call_impl(self, *args, **kwargs):

~\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *args, **kwargs)
   1525                 or _global_backward_pre_hooks or _global_backward_hooks
   1526                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527             return forward_call(*args, **kwargs)
   1528 
   1529         try:

~\AppData\Local\Programs\Python\Python39\lib\site-packages\zeta\nn\modules\mlp_mixer.py in forward(self, x)
     28             torch.Tensor: description
     29         """
---> 30         y = self.dense1(x)
     31         y = F.gelu(y)
     32         return self.dense2(y)

~\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py in _wrapped_call_impl(self, *args, **kwargs)
   1516             return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517         else:
-> 1518             return self._call_impl(*args, **kwargs)
   1519 
   1520     def _call_impl(self, *args, **kwargs):

~\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *args, **kwargs)
   1525                 or _global_backward_pre_hooks or _global_backward_hooks
   1526                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527             return forward_call(*args, **kwargs)
   1528 
   1529         try:

~\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\linear.py in forward(self, input)
    112 
    113     def forward(self, input: Tensor) -> Tensor:
--> 114         return F.linear(input, self.weight, self.bias)
    115 
    116     def extra_repr(self) -> str:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (512x4 and 512x512)`
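
For readers unfamiliar with this error message: the last frame is F.linear, which performs a matrix multiplication, and that only works when the number of columns of mat1 equals the number of rows of mat2. Here mat1 is 512x4 and mat2 is 512x512, so 4 is compared against 512 and the call fails. The rule can be sketched in plain Python; matmul_shape is a hypothetical helper written for illustration, not a PyTorch function.

```python
def matmul_shape(mat1, mat2):
    """Return the shape of mat1 @ mat2, or raise if the inner dimensions differ.

    Both arguments are (rows, cols) tuples. Hypothetical helper for illustration only.
    """
    rows1, cols1 = mat1
    rows2, cols2 = mat2
    if cols1 != rows2:
        raise ValueError(
            f"mat1 and mat2 shapes cannot be multiplied "
            f"({rows1}x{cols1} and {rows2}x{cols2})"
        )
    return (rows1, cols2)


# Shapes taken from the error message in the traceback.
try:
    matmul_shape((512, 4), (512, 512))
except ValueError as err:
    print(err)

# A compatible pair for comparison: the inner dimensions (512 and 512) match.
print(matmul_shape((4, 512), (512, 512)))
```

The traceback also shows that the failing call is reached through the import of zeta.nn.modules.mlp_mixer, so the error appears before any ScreenAI code in the example runs.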

Feb 19 '24 10:02 NuiMrme

Got the same issue on an M1 Mac.

Mar 05 '24 10:03 DevChrisRoth

Same here on an Intel Mac.

Mar 05 '24 16:03 emarashliev

Same with an M2 Mac.

Mar 09 '24 10:03 carlitose

Same with Windows 11 and WSL.

Mar 20 '24 10:03 zhaixiaowai

Same here with Windows 10.

Edit: I solved this issue by running pip uninstall zetascale and then reinstalling with pip install zetascale. In my case an ancient version, 0.9.xyz, had been installed; after I installed the newest version, 2.2.7, it worked.
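
A quick way to see whether an environment has the same stale dependency is to read the installed zetascale version before importing screenai. This is a minimal sketch using the standard library's importlib.metadata; the threshold 2.2.7 is simply the version reported to work in the comment above, not a documented minimum, and KNOWN_GOOD, parse_version and zetascale_status are illustrative names that are not part of screenai or zetascale.

```python
import re
from importlib import metadata

# Version reported to work in this thread; an assumption, not an official requirement.
KNOWN_GOOD = (2, 2, 7)


def parse_version(text):
    """Turn '2.2.7' into (2, 2, 7); stop at the first part with no leading digits."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def zetascale_status(package="zetascale"):
    """Describe the installed version of the package relative to KNOWN_GOOD."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package} is not installed"
    if parse_version(installed) < KNOWN_GOOD:
        return (
            f"{package} {installed} is older than the version reported to work; "
            f"try: pip install --upgrade {package}"
        )
    return f"{package} {installed} is at or above the version reported to work"


print(zetascale_status())
```

If the script reports an old version, the uninstall and reinstall described above (or pip install --upgrade zetascale) is the step that resolved the error for this commenter.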

@kyegomez maybe it would be good to update the README example with the actual example from example.py. After solving this issue I ran into more issues because:

there was no num_tokens defined
there was no max_seq_len defined
image and text were not initialized with the right dimensions

Another question I've got is, how did you choose num_tokens and max_seq_len?

Apr 03 '24 21:04 Yingrjimsch

Same with Linux Ubuntu 24 LTS

Jun 08 '24 00:06 MElmardi

After my modifications I got working code; I hope it solves your problem.

import torch
from screenai.main import ScreenAI

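# Create the image tensor
image = torch.rand(1, 3, 224, 224)

# Create an instance of the ScreenAI model
model = ScreenAI(
    num_tokens=2000,
    max_seq_len=1024,
    patch_size=16,
    image_size=224,
    dim=512,
    depth=6,
    heads=8,
    vit_depth=4,
    multi_modal_encoder_depth=4,
    llm_decoder_depth=4,
    mm_encoder_ff_mult=4,
)

# Assume the text has already been converted to token indices; random integers stand in for it here
# num_tokens is the vocabulary size, and max_seq_len is the longest sequence the model can process
text_indices = torch.randint(0, model.num_tokens, (1, model.max_seq_len))

# Convert the text index tensor to a long (int64) tensor
text = text_indices.long()

# Run a forward pass of the model with the given text and image tensors
out = model(text, image)

# Print the output tensor (use out.shape to see only its shape)
print(out)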

Jul 03 '24 07:07 RokiRan

The number of tokens (num_tokens) is the vocabulary size of the tokenizer, and the max sequence length (max_seq_len) is typically 4096, 8192, 16384, 32768, etc.

And
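
To make the two parameters concrete, here is a minimal sketch with a toy whitespace tokenizer. The vocabulary, the encode helper, and the pad/unknown ids are all hypothetical and exist only for illustration; they are not ScreenAI's tokenizer or any real one. The point is that num_tokens bounds the ids the model can receive, and max_seq_len fixes how many ids each input holds.

```python
def encode(text, vocab, max_seq_len, pad_id=0, unk_id=1):
    """Map whitespace-separated words to ids, then truncate or pad to max_seq_len."""
    ids = [vocab.get(word, unk_id) for word in text.lower().split()]
    ids = ids[:max_seq_len]  # drop anything beyond the maximum length
    return ids + [pad_id] * (max_seq_len - len(ids))  # pad short inputs


# Hypothetical toy vocabulary; ids 0 and 1 are reserved for padding and unknown words.
vocab = {"<pad>": 0, "<unk>": 1, "click": 2, "the": 3, "button": 4}

num_tokens = len(vocab)  # vocabulary size: every id must be in range(num_tokens)
max_seq_len = 8          # every encoded input has exactly this many ids

ids = encode("Click the red button", vocab, max_seq_len)
print(ids)

# Both constraints the model relies on hold for the encoded input.
assert len(ids) == max_seq_len
assert all(0 <= token_id < num_tokens for token_id in ids)
```

With a real tokenizer the same relationship applies: num_tokens would be set from the tokenizer's vocabulary size, and max_seq_len from the longest input the model is meant to handle.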

Sep 08 '24 22:09 kyegomez
