mamba issues

A MAC computation problem about mamba

Hello, researchers. I'm very interested in your work and have used mamba to build a model. I want to compute the MACs of the model with torchstat, but then this error...

lth456321

Speech generation using Mamba

I am interested in “How to use mamba to generate audio”. One of the amazing things is the long-sequence attention. I want to know whether mamba can be used in TTS,...

coding-sharks

Mamba-block without convolutional layer

2

Hello, thanks for the nice work. Can we use the Mamba block without this convolutional layer? I would appreciate any hint or suggestion you can give me. Thank You ![image](https://github.com/state-spaces/mamba/assets/56618776/0c060555-01d5-4c4a-badd-2ddbebca12a0)

kabbas570

Using input of size torch.Size([200, 100])

1

My input is a batch of 200 samples with 100 values each (torch.Size([200, 100])), so each feature is just a 1-D vector. In the example code, it says you set batch,...

alqurri77
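
For the shape question above, here is a minimal sketch of two ways to turn a `(200, 100)` tensor into the 3-D `(batch, length, dim)` layout that the project's example code uses for its input. This is an illustration under assumptions, not the authors' answer: `d_model = 16` and the `nn.Linear` embedding are arbitrary choices made for the example, and the Mamba layer itself is not called here.

```python
import torch
import torch.nn as nn

batch, n_values, d_model = 200, 100, 16   # d_model is an arbitrary illustrative width

x = torch.randn(batch, n_values)          # (200, 100): one 1-D feature vector per sample

# Option A: read the 100 values as a length-100 sequence of scalars,
# then lift each scalar to d_model channels with a learned projection.
embed = nn.Linear(1, d_model)             # hypothetical embedding layer
x_seq = embed(x.unsqueeze(-1))            # (200, 100, 16) = (batch, length, dim)

# Option B: treat each sample as a single time step with 100 channels.
x_step = x.unsqueeze(1)                   # (200, 1, 100) = (batch, length, dim)
```

Option A gives the sequence model something to scan over; Option B has a sequence length of 1, so the recurrence has nothing to carry between steps. Which one fits depends on whether the 100 values are ordered.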

Variable input sequence length

11

Dear authors, This is amazing work! I'm working with variable sequence lengths of video data. In one batch, there could be several videos with different frame numbers, and they...

ruoyxue
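
One generic way to batch sequences of different lengths is to right-pad them and keep a mask. The sketch below is a toy illustration, not the library's mechanism: `causal_scan` is a hypothetical scalar recurrence standing in for any causal layer, and `pad_batch` is a helper written for this example. It shows that with right-padding, the outputs at the valid positions of a causal model are unchanged by the padding that follows them.

```python
def causal_scan(xs, a=0.9, b=0.5):
    """Toy scalar recurrence h_t = a*h_{t-1} + b*x_t (stand-in for a causal layer)."""
    h, out = 0.0, []
    for x in xs:
        h = a * h + b * x
        out.append(h)
    return out


def pad_batch(seqs, pad_value=0.0):
    """Right-pad variable-length sequences; return the padded batch and a boolean mask."""
    max_len = max(len(s) for s in seqs)
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in seqs]
    mask = [[i < len(s) for i in range(max_len)] for s in seqs]
    return padded, mask


videos = [[1.0, 2.0, 3.0], [4.0, 5.0]]        # two "videos" with 3 and 2 frames
padded, mask = pad_batch(videos)
outputs = [causal_scan(row) for row in padded]

# Drop the outputs at padded positions before computing a loss or pooling.
valid = [[o for o, m in zip(row, msk) if m] for row, msk in zip(outputs, mask)]
```

Any pooling over time (mean, last step) still has to use the mask, since the padded positions do produce output values.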

Figure 4 -- Mamba vs. Transformer 1.3B and 6.9B Mamba

3

@tridao, @albertfgu 1) Could you please post the exact settings used to create the 1.3B Transformer in Figure 4? 2) Could you share what settings you used (hiddendim, etc.) to create...

llmexperiment

How can I avoid using causal-conv1d?

11

I want to use nn.Conv1d instead of causal-conv1d. How should I modify the code?

CacatuaAlan
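
For the question above, a plain `nn.Conv1d` can be made causal by padding with `kernel_size - 1` and truncating the output to the input length. The sketch below is a standalone illustration with assumed sizes (`d_inner`, `d_conv`, `seqlen` are made up for the example), using a depthwise convolution followed by SiLU; it is not presented as a drop-in patch for the repository.

```python
import torch
import torch.nn as nn

d_inner, d_conv, seqlen = 8, 4, 10   # illustrative sizes, not taken from the issue

# Depthwise conv. padding=d_conv-1 pads both ends, so the raw output has
# seqlen + d_conv - 1 steps; keeping the first seqlen steps makes it causal.
conv1d = nn.Conv1d(d_inner, d_inner, kernel_size=d_conv,
                   groups=d_inner, padding=d_conv - 1)
act = nn.SiLU()


def causal_conv(x):
    """x: (batch, d_inner, seqlen) -> same shape; step t sees only steps <= t."""
    return act(conv1d(x)[..., : x.shape[-1]])


x = torch.randn(2, d_inner, seqlen)
y = causal_conv(x)

# Causality check: perturbing the last time step must leave earlier outputs unchanged.
x2 = x.clone()
x2[..., -1] += 1.0
y2 = causal_conv(x2)
```

This path is typically slower than a fused kernel, but it needs nothing beyond PyTorch.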

Some questions about the shape of A,B,C,D

8

Dear Authors, Thanks for your brilliant work! I am studying how the parameter shapes change in your code and in your paper. I noticed that the A...

Aristo23333
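
As a rough aid for the shape question above, the helper below tabulates the parameter shapes as I understand them from the reference PyTorch module. Treat the names and formulas (`d_inner = expand * d_model`, `dt_rank = ceil(d_model / 16)`) as assumptions to check against the code; `mamba_param_shapes` is a hypothetical function written for this example. In this reading, A and D are learned parameters, while B and C are not stored: they are produced per token by the `x_proj` projection, each with `d_state` entries.

```python
import math


def mamba_param_shapes(d_model, d_state=16, d_conv=4, expand=2):
    """Return assumed parameter shapes for one Mamba block (illustrative only)."""
    d_inner = expand * d_model
    dt_rank = math.ceil(d_model / 16)
    return {
        "A_log": (d_inner, d_state),                       # A, stored as a log
        "D": (d_inner,),                                   # skip connection weight
        "x_proj.weight": (dt_rank + 2 * d_state, d_inner), # emits dt, B, C per token
        "dt_proj.weight": (d_inner, dt_rank),
        "conv1d.weight": (d_inner, 1, d_conv),             # depthwise conv
    }


shapes = mamba_param_shapes(d_model=768)
for name, shape in shapes.items():
    print(f"{name:16s} {shape}")
```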

Vanishing gradient problem with more layers

2

Dear author, I stacked multiple mamba layers to form a model and trained the model from scratch. When I stacked just 4 layers, the performance was very good. So I...

yjdy

Is it okay to put S6(MAMBA) and gated MLP blocks like a transformer? (Also please open the discussion tab)

I am trying to write a simple mamba model by studying the first mamba paper just for testing on a small dataset in JAX. Some tests I came across online...

qnixsynapse
