metavoice-src
MPS Support
Hi, Congrats on the launch! Is MPS (Apple Silicon) or MLX support planned? Thank you!
Hey @fakerybakery, can you share your use-case in more detail? MPS / MLX support is not planned in the short-term. But, we'd love to collaborate if you would like to implement this for our model?
Great work on metavoice, sounds really good! Wish I could use it on my Mac.
Use case: all Apple Silicon devices (Mac, iPhone, iPad, Vision). There is a huge lack of good TTS for Apple platforms, and there is demand. Take note of @fakerybakery, he's everywhere and doing good work.
Is it difficult to implement MPS support? The use case is basically every user in the Apple ecosystem. I was going to write a cross-platform installer for the app (I work on https://pinokio.computer) and ended up here, but the fact that there's no plan for supporting MPS leaves me a bit confused. I believe the true hockey-stick growth for your model will come not from a small number of entrepreneurs who run a server and charge for a service, but from every single user with a laptop.
I think a major issue is that this uses flash-attn, which is currently not supported on either MPS or ROCm.
Yep, @fakerybakery is right. But a chunk of what's required to fix this is already implemented; it just hasn't been hooked up properly. Will try to do it, but happy for someone to beat me to it!
Here's the ref: https://github.com/metavoiceio/metavoice-src/blob/main/fam/llm/layers/attn.py
It contains code to mix and match the following options:
- flash decoding (which does kv-caching and attn calculations within a single optimised kernel, requires newer NVIDIA GPUs),
- fa2 + vanilla kv caching (which uses fa2 - an optimised attn kernel - for attn calculation but does kv caching using standard PyTorch ops. But due to fa2 it still requires newer NVIDIA GPUs),
- torch attn + vanilla kv caching (which does both attn calculation and kv caching using torch standard ops, so should work across the board)
So one needs to change some plumbing in the code, IIRC, to use the third option, and I believe it should work!
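For anyone wanting to attempt this, here's a rough, self-contained sketch of what the "torch attn + vanilla kv caching" path boils down to: attention via torch's scaled_dot_product_attention plus a KV cache built from standard tensor ops, so no flash-attn or CUDA-only kernels. The names (VanillaKVCache, torch_attn_step) are illustrative, not the ones used in fam/llm/layers/attn.py:

```python
# Illustrative sketch only - not the actual structure of fam/llm/layers/attn.py.
# "torch attn + vanilla kv caching": attention with torch's built-in SDPA and a
# KV cache made of plain tensor ops, so it should run on CPU/MPS as well as CUDA.
import torch
import torch.nn.functional as F


class VanillaKVCache:
    def __init__(self, batch, n_heads, max_seq_len, head_dim, dtype, device):
        self.k = torch.zeros(batch, n_heads, max_seq_len, head_dim, dtype=dtype, device=device)
        self.v = torch.zeros_like(self.k)
        self.pos = 0

    def update(self, k_new, v_new):
        # Append the new key/value slices, then return the filled portion of the cache.
        t = k_new.shape[2]
        self.k[:, :, self.pos : self.pos + t] = k_new
        self.v[:, :, self.pos : self.pos + t] = v_new
        self.pos += t
        return self.k[:, :, : self.pos], self.v[:, :, : self.pos]


def torch_attn_step(q, k_new, v_new, cache):
    # q, k_new, v_new: (batch, n_heads, new_tokens, head_dim)
    k, v = cache.update(k_new, v_new)
    # When decoding one token at a time, the query may attend to every cached
    # position, so no extra causal mask is needed for the incremental step.
    return F.scaled_dot_product_attention(q, k, v)
```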
That sounds awesome, and really glad to see that you guys care. Looking forward to the progress!
Thanks for participating. Can't wait to review a PR here :)
+1 for this :) need this so bad on my Mac Ultra M2! Mimic3 TTS is just not great and OpenAI costs a bit much for 24/7 use.
I may try to see if I can figure it out with the "torch attn + vanilla kv caching" option, though I'm not too hopeful about my luck with that :D Any details on what needs to be done would help me at least try; I just haven't done much MPS / GPU work (I hacked at getting bark.cpp working, but it was a kludge, not a full MPS dev experience).
I'm also curious where the resources are to learn more about this. I'm coding a lot and have this Mac that I want to work with everything, so I might as well put some effort into MPS support like this if I can get the hang of it.
Hey folks, I pushed a change that should fix the problems mentioned here, could you give it another go?
I tried to install requirements.txt but now I'm getting ModuleNotFoundError: No module named 'torch'. I am on Mac Apple Silicon. Do I need to install torch separately?
Looking into this now :)
idk if this could help, https://github.com/facebookresearch/xformers/issues/740#issue-1695177874
It seems like xFormers is not supported on Mac (not 100% sure), based on the following comment: https://github.com/facebookresearch/xformers/issues/740#issuecomment-1594080277
Edit: the xFormers GitHub repo says 'RECOMMENDED Linux & Win'.
I tried to get this working yesterday with the recent changes pushed by @pyetras, but unfortunately we still have a set of small issues beyond the ones outlined above... will keep this thread updated!
by the way, if you want to resolve the above installation errors, below works:
pip install torch torchvision torchaudio
pip install -r requirements.txt
pip install --upgrade torch torchvision torchaudio
I tried it but still no luck. No module named 'torch' found.
I seem to have mine up to the point where flash-attn isn't available on my Mac, and it got past the xformers issue. I had to use Python 3.11 for some reason. Outside of Docker (which has a separate failure, exit code 139, that may also point at MPS), my issue is here: https://github.com/metavoiceio/metavoice-src/issues/48, where it says NameError: name 'flash_attn_with_kvcache' is not defined. I followed the same commands as in the Dockerfile on my fork branch, where I got it building in Docker and running up to the point of the crash; natively on the Mac it got further, up to the kv-cache flash-attn issue.
This shows the part of the log where it warns that flash_attn didn't load, which is why flash_attn_with_kvcache is not available.
=> [metavoice-server internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 937B 0.0s
=> [metavoice-server internal] load metadata for docker.io/library/python:3.11-slim 1.2s
=> [metavoice-server internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [metavoice-server 1/7] FROM docker.io/library/python:3.11-slim@sha256:ce81dc539f0aedc9114cae640f8352fad83d37461c24a3615b01f081d0c0583a 0.1s
=> => resolve docker.io/library/python:3.11-slim@sha256:ce81dc539f0aedc9114cae640f8352fad83d37461c24a3615b01f081d0c0583a 0.0s
=> => sha256:ce81dc539f0aedc9114cae640f8352fad83d37461c24a3615b01f081d0c0583a 1.65kB / 1.65kB 0.0s
=> => sha256:238b008604c229d4897d1fa131f9aaecddec61a199edbdb22851622dd65dcebd 1.37kB / 1.37kB 0.0s
=> => sha256:cfa17c2baa64b89de4e828d2f7f219dc88a61599b076ef7ea08c653f6df56b74 6.95kB / 6.95kB 0.0s
=> [metavoice-server internal] load build context 0.2s
=> => transferring context: 29.01MB 0.2s
=> [metavoice-server 2/7] RUN apt-get update && apt-get install -y ffmpeg ninja-build g++ git curl build-essential libomp-dev && rm -rf /var/lib/apt/lists/* 24.7s
=> [metavoice-server 3/7] WORKDIR /app 0.0s
=> [metavoice-server 4/7] COPY . . 0.0s
=> [metavoice-server 5/7] RUN MAX_JOBS=1 pip install --no-cache-dir "torch>=2.1.0" 10.1s
=> [metavoice-server 6/7] RUN MAX_JOBS=1 pip install --no-cache-dir -r requirements.txt 135.1s
=> [metavoice-server 7/7] RUN pip install -e . 3.4s
=> [metavoice-server] exporting to image 2.8s
=> => exporting layers 2.8s
=> => writing image sha256:d20f670219780639211c6c762183bafca3ae7c00f1e55aa597f20e6f43912086 0.0s
=> => naming to docker.io/library/metavoice-server:latest 0.0s
[+] Running 2/0
✔ Network metavoice-src-groovybits_metavoice-net Created 0.0s
✔ Container metavoice-server Created 0.0s
Attaching to metavoice-server
metavoice-server | WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
metavoice-server | PyTorch 2.2.0 with CUDA None (you have 2.1.0)
metavoice-server | Python 3.11.8 (you have 3.11.8)
metavoice-server | Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
metavoice-server | Memory-efficient attention, SwiGLU, sparse and more won't be available.
metavoice-server | Set XFORMERS_MORE_DETAILS=1 for more details
metavoice-server | /usr/local/lib/python3.11/site-packages/df/io.py:9: UserWarning: `torchaudio.backend.common.AudioMetaData` has been moved to `torchaudio.AudioMetaData`. Please update the import path.
metavoice-server | from torchaudio.backend.common import AudioMetaData
metavoice-server | /app/fam/llm/layers/attn.py:14: UserWarning: flash_attn not installed, make sure to replace attention mechanism with torch_attn
metavoice-server | warnings.warn("flash_attn not installed, make sure to replace attention mechanism with torch_attn")
Fetching 6 files: 100% 6/6 [00:00<00:00, 43996.20it/s]
metavoice-server | number of parameters: 1239.00M
metavoice-server | loading configuration file config.json from cache at /.hf-cache/hub/models--facebook--encodec_24khz/snapshots/c1dbe2ae3f1de713481a3b3e7c47f357092ee040/config.json
metavoice-server | Model config EncodecConfig {
metavoice-server | "_name_or_path": "ArthurZ/encodec_24khz",
metavoice-server | "architectures": [
metavoice-server | "EncodecModel"
metavoice-server | ],
You need to change kv_cache type to "vanilla" in https://github.com/metavoiceio/metavoice-src/blob/main/fam/llm/serving.py#L188 to avoid depending on flash_attn
Thank you!
Update:
It hits an issue with the dtype used. Even when fiddling with it, changing the torch_attn choice to "hand" or forcing the dtypes of q, k, v to match, it still fails in another place inside the torch package files. This is what I see when changing to the "vanilla" type...
All the weights of EncodecModel were initialized from the model checkpoint at facebook/encodec_24khz.
If your task is similar to the task the model of the checkpoint was trained on, you can already use EncodecModel for predictions without further training.
2024-02-14 07:13:53 | INFO | DF | Running on torch 2.1.0
2024-02-14 07:13:53 | INFO | DF | Running on host earth.local
2024-02-14 07:13:53 | INFO | DF | Git commit: eb7338abb, branch: stable
2024-02-14 07:13:53 | INFO | DF | Loading model settings of DeepFilterNet3
2024-02-14 07:13:53 | INFO | DF | Using DeepFilterNet3 model at /Users/chris/Library/Caches/DeepFilterNet/DeepFilterNet3
2024-02-14 07:13:53 | INFO | DF | Initializing model `deepfilternet3`
2024-02-14 07:13:53 | INFO | DF | Found checkpoint /Users/chris/Library/Caches/DeepFilterNet/DeepFilterNet3/checkpoints/model_120.ckpt.best with epoch 120
2024-02-14 07:13:53 | INFO | DF | Running on device cpu
2024-02-14 07:13:53 | INFO | DF | Model loaded
INFO: Started server process [84212]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:58003 (Press CTRL+C to quit)
getting cached speaker ref files: 0%| | 0/1 [00:00<?, ?it/s][src/libmpg123/id3.c:INT123_id3_to_utf8():394] warning: Weird tag size 101 for encoding 1 - I will probably trim too early or something but I think the MP3 is broken.
getting cached speaker ref files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.96it/s]
calculating speaker embeddings: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1264.87it/s]
batch: 0%| | 0/1 [00:00<?, ?it/s][hack!!!!] Guidance is on, so we're doubling/tripling batch size! | 0/1728 [00:00<?, ?it/s]
tokens: 0%| | 0/1728 [00:00<?, ?it/s]
batch: 0%| | 0/1 [00:00<?, ?it/s]
Error processing request {'text': 'This is a demo of text to speech by MetaVoice-1B, an open-source foundational audio model by MetaVoice.', 'guidance': [3.0, 1.0], 'top_p': 0.95, 'speaker_ref_path': 'https://cdn.themetavoice.xyz/speakers/bria.mp3'}
Traceback (most recent call last):
File "/Users/chris/code/rsllm/metavoice-src-groovybits/fam/llm/serving.py", line 105, in text_to_speech
wav_out_path = sample_utterance(
^^^^^^^^^^^^^^^^^
File "/Users/chris/code/rsllm/metavoice-src-groovybits/fam/llm/sample.py", line 546, in sample_utterance
return _sample_utterance_batch(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/chris/code/rsllm/metavoice-src-groovybits/fam/llm/sample.py", line 477, in _sample_utterance_batch
b_tokens = first_stage_model(
^^^^^^^^^^^^^^^^^^
File "/Users/chris/code/rsllm/metavoice-src-groovybits/fam/llm/sample.py", line 356, in __call__
return self.causal_sample(
^^^^^^^^^^^^^^^^^^^
File "/Users/chris/code/rsllm/metavoice-src-groovybits/fam/llm/sample.py", line 231, in causal_sample
y = self.model.generate(
^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/chris/code/rsllm/metavoice-src-groovybits/fam/llm/model.py", line 369, in generate
return self._causal_sample(
^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/chris/code/rsllm/metavoice-src-groovybits/fam/llm/mixins/causal.py", line 410, in _causal_sample
batch_idx = self._sample_batch(
^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/chris/code/rsllm/metavoice-src-groovybits/fam/llm/mixins/causal.py", line 264, in _sample_batch
idx_next = self._sample_next_token(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/chris/code/rsllm/metavoice-src-groovybits/fam/llm/mixins/causal.py", line 85, in _sample_next_token
list_logits, _ = self(
^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/chris/code/rsllm/metavoice-src-groovybits/fam/llm/model.py", line 282, in forward
x = block(x)
^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/chris/code/rsllm/metavoice-src-groovybits/fam/llm/layers/combined.py", line 50, in forward
x = x + self.attn(self.ln_1(x))
^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/chris/code/rsllm/metavoice-src-groovybits/fam/llm/layers/attn.py", line 303, in forward
y = self._torch_attn(c_x)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/chris/code/rsllm/metavoice-src-groovybits/fam/llm/layers/attn.py", line 231, in _torch_attn
y = torch.nn.functional.scaled_dot_product_attention(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Expected query, key, and value to have the same dtype, but got query.dtype: float key.dtype: c10::BFloat16 and value.dtype: c10::BFloat16 instead.
INFO: 127.0.0.1:58092 - "POST /tts HTTP/1.1" 500 Internal Server Error
Strange error. Does MPS support bfloat16? You could try setting dtype="float16" in the common_config dict in serving.py. It looks like the bfloat16 is coming from the kv cache, while the model is running in float32.
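If you just want to get past the crash while debugging, one workaround (an assumption, not necessarily the intended fix) is to align the key/value dtype with the query right before the torch attention call:

```python
import torch.nn.functional as F


def torch_attn_dtype_safe(q, k, v):
    # Workaround for "Expected query, key, and value to have the same dtype":
    # here the kv cache is bfloat16 while the activations are float32, so cast
    # k/v to q's dtype before calling scaled_dot_product_attention.
    if k.dtype != q.dtype:
        k = k.to(q.dtype)
    if v.dtype != q.dtype:
        v = v.to(q.dtype)
    return F.scaled_dot_product_attention(q, k, v)
```

The cleaner fix is still to create the kv cache in the same dtype the model actually runs in, as suggested above.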
I'm working on this https://github.com/abeestrada/metavoice-src/commit/9f2bab004fc9a46f19fa5c4ca154b82b4c332b87
python fam/llm/sample.py --huggingface_repo_id="metavoiceio/metavoice-1B-v0.1" --spk_cond_path="assets/bria.mp3" --dtype="float32" --use_kv_cache="vanilla"
Now I'm stuck at this error; maybe flash_attn needs to be replaced with something else:
NameError: name 'flash_attn_qkvpacked_func' is not defined
When I use --dtype="float16", this is the error:
File "/Code/metavoice-sirc/fam/llm/sample.py", line 207, in causal_sample
assert x[i, 0, : seq_lens[i]].tolist() == encoded_texts[i]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
I made torch_attn the default (while debugging), and first I got this:
failed to run MBD.
reason: argument 'tokens': 'float' object cannot be interpreted as an integer
Fixed with:
to_return.append(self.decoder.decode(tokens=int(tokens.item()), causal=False))
And we got some progress:
failed to run MBD.
reason: a Tensor with 8192 elements cannot be converted to Scalar
https://github.com/AbeEstrada/metavoice-src/commits/mps/
@AbeEstrada You shouldn't be updating the dtype when it's torch.long; that's what causes errors down the line.
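In other words, if you do add dtype casts while debugging, guard them so integer tensors (token ids) are left alone; a tiny sketch of what's meant:

```python
import torch


def cast_floats_only(x: torch.Tensor, dtype: torch.dtype) -> torch.Tensor:
    # Only cast floating-point tensors; token-id tensors must stay torch.long,
    # otherwise embedding lookups and sampling break further down the line.
    return x.to(dtype) if torch.is_floating_point(x) else x
```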
Made the changes https://github.com/AbeEstrada/metavoice-src/commit/f4415864c31cab010d6034c8f965de855ead0929
I was able to run it with float32
python fam/llm/sample.py --huggingface_repo_id="metavoiceio/metavoice-1B-v0.1" --spk_cond_path="assets/bria.mp3" --dtype="float32"
Also, torch.amp.autocast doesn't support MPS yet (https://github.com/pytorch/pytorch/issues/88415), so I used the CPU in the meantime: https://github.com/AbeEstrada/metavoice-src/commit/a77fdb9fd4d94e4bb4c16dc86c4a62a580f874f1. Maybe there is a .to(device) missing somewhere.
failed to run MBD.
reason: Placeholder storage has not been allocated on MPS device!
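One possible pattern for the autocast limitation mentioned above (just a sketch of that workaround, not code from the branch): only enter autocast on device types that support it and fall back to a no-op context on MPS:

```python
import contextlib
import torch


def maybe_autocast(device_type: str, dtype: torch.dtype):
    # torch.autocast currently supports "cuda" and "cpu"; on "mps" fall back to
    # a no-op context so the surrounding code runs in its native dtype.
    if device_type in ("cuda", "cpu"):
        return torch.autocast(device_type=device_type, dtype=dtype)
    return contextlib.nullcontext()


# usage: with maybe_autocast(device, torch.float16): ...
```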
And the API should work now with --use_kv_cache="vanilla" (https://github.com/AbeEstrada/metavoice-src/commit/77d84a02121f6dcc4c43d412e8110582e2ac959b). I'm not using the right X-Payload format; can anyone give it a try with my branch (https://github.com/AbeEstrada/metavoice-src/commits/mps)?
python fam/llm/serving.py --huggingface_repo_id "metavoiceio/metavoice-1B-v0.1" --dtype "float32" --use_kv_cache="vanilla"
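For anyone testing: I'm not sure of the exact X-Payload format either, but based on the request dict printed in the earlier error log, a guess at a client call could look like this (the header-vs-body choice and the port are assumptions taken from that log; adjust as needed):

```python
import json
import requests

# Guessed request shape, reconstructed from the payload dict in the earlier log;
# the real /tts endpoint may expect these fields in a JSON body instead of an
# X-Payload header, and the port below is just the one Uvicorn reported above.
payload = {
    "text": "This is a demo of text to speech by MetaVoice-1B.",
    "guidance": [3.0, 1.0],
    "top_p": 0.95,
    "speaker_ref_path": "https://cdn.themetavoice.xyz/speakers/bria.mp3",
}
resp = requests.post(
    "http://localhost:58003/tts",
    headers={"X-Payload": json.dumps(payload)},
)
resp.raise_for_status()
with open("out.wav", "wb") as f:
    f.write(resp.content)
```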
Looks the same to me on an M2, I get the....
failed to run MBD.
reason: Placeholder storage has not been allocated on MPS device!
Good job! Seems closer, very exciting.
While running with vanilla I'm getting this warning:
UserWarning: flash_attn not installed, make sure to replace attention mechanism with torch_attn
  warnings.warn("flash_attn not installed, make sure to replace attention mechanism with torch_attn")
╭─ Unrecognized options ───────────────────────╮
│ Unrecognized options: --use-kv-cache=vanilla │
│ ──────────────────────────────────────────── │
│ For full helptext, run serving.py --help     │
╰──────────────────────────────────────────────╯
[2024-02-18 21:34:39,479] torch._dynamo.utils: [INFO] TorchDynamo compilation metrics:
[2024-02-18 21:34:39,479] torch._dynamo.utils: [INFO] Function, Runtimes (s)
I have removed flash_attn from the requirements.
Remove flash_attn from requirements before installing the deps.
Run: pip install --upgrade pip setuptools wheel
Then install the deps using: pip install -r requirements.txt
It worked for me.
It seems like OSX still doesn't work even for the most basic sample?
(venv) ➜ metavoice-src git:(main) python fam/llm/sample.py --device="cpu" --spk_cond_path="assets/bria.mp3" --text="This is a demo of text to speech by MetaVoice-1B, an open-source foundational audio model." --dtype="bfloat16"
objc[19467]: Class AVFFrameReceiver is implemented in both /Users/username/projects/ai/metavoice-src/venv/lib/python3.10/site-packages/av/.dylibs/libavdevice.60.1.100.dylib (0x102cd4760) and /opt/homebrew/Cellar/ffmpeg/6.1.1_3/lib/libavdevice.60.3.100.dylib (0x144574370). One of the two will be used. Which one is undefined.
objc[19467]: Class AVFAudioReceiver is implemented in both /Users/username/projects/ai/metavoice-src/venv/lib/python3.10/site-packages/av/.dylibs/libavdevice.60.1.100.dylib (0x102cd47b0) and /opt/homebrew/Cellar/ffmpeg/6.1.1_3/lib/libavdevice.60.3.100.dylib (0x1445743c0). One of the two will be used. Which one is undefined.
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.1.0 with CUDA None (you have 2.2.0)
Python 3.10.11 (you have 3.10.11)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details
/Users/username/projects/ai/metavoice-src/venv/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
/Users/username/projects/ai/metavoice-src/venv/lib/python3.10/site-packages/df/io.py:9: UserWarning: `torchaudio.backend.common.AudioMetaData` has been moved to `torchaudio.AudioMetaData`. Please update the import path.
from torchaudio.backend.common import AudioMetaData
/Users/username/projects/ai/metavoice-src/fam/llm/layers/attn.py:10: UserWarning: flash_attn not installed, make sure to replace attention mechanism with torch_attn
warnings.warn("flash_attn not installed, make sure to replace attention mechanism with torch_attn")
Fetching 6 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 119837.26it/s]
number of parameters: 1239.00M
number of parameters: 14.07M
getting cached speaker ref files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2.18it/s]
calculating speaker embeddings: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1766.77it/s]
batch: 0%| | 0/1 [00:00<?, ?it/s[hack!!!!] Guidance is on, so we're doubling/tripling batch size! | 0/1728 [00:00<?, ?it/s]
tokens: 0%| | 0/1728 [00:00<?, ?it/s]
batch: 0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/Users/username/projects/ai/metavoice-src/fam/llm/sample.py", line 700, in <module>
sample_utterance(
File "/Users/username/projects/ai/metavoice-src/fam/llm/sample.py", line 544, in sample_utterance
return _sample_utterance_batch(
File "/Users/username/projects/ai/metavoice-src/fam/llm/sample.py", line 475, in _sample_utterance_batch
b_tokens = first_stage_model(
File "/Users/username/projects/ai/metavoice-src/fam/llm/sample.py", line 354, in __call__
return self.causal_sample(
File "/Users/username/projects/ai/metavoice-src/fam/llm/sample.py", line 229, in causal_sample
y = self.model.generate(
File "/Users/username/projects/ai/metavoice-src/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Users/username/projects/ai/metavoice-src/fam/llm/model.py", line 369, in generate
return self._causal_sample(
File "/Users/username/projects/ai/metavoice-src/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Users/username/projects/ai/metavoice-src/fam/llm/mixins/causal.py", line 410, in _causal_sample
batch_idx = self._sample_batch(
File "/Users/username/projects/ai/metavoice-src/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Users/username/projects/ai/metavoice-src/fam/llm/mixins/causal.py", line 264, in _sample_batch
idx_next = self._sample_next_token(
File "/Users/username/projects/ai/metavoice-src/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Users/username/projects/ai/metavoice-src/fam/llm/mixins/causal.py", line 85, in _sample_next_token
list_logits, _ = self(
File "/Users/username/projects/ai/metavoice-src/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/Users/username/projects/ai/metavoice-src/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/username/projects/ai/metavoice-src/fam/llm/model.py", line 282, in forward
x = block(x)
File "/Users/username/projects/ai/metavoice-src/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/Users/username/projects/ai/metavoice-src/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/username/projects/ai/metavoice-src/fam/llm/layers/combined.py", line 50, in forward
x = x + self.attn(self.ln_1(x))
File "/Users/username/projects/ai/metavoice-src/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/Users/username/projects/ai/metavoice-src/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/username/projects/ai/metavoice-src/fam/llm/layers/attn.py", line 221, in forward
y = self._torch_attn(c_x)
File "/Users/username/projects/ai/metavoice-src/fam/llm/layers/attn.py", line 189, in _torch_attn
y = torch.nn.functional.scaled_dot_product_attention(
RuntimeError: Expected query, key, and value to have the same dtype, but got query.dtype: float key.dtype: c10::BFloat16 and value.dtype: c10::BFloat16 instead.
Yes, something is still wrong. I hit the same error with the previously posted branch that has the work towards it...
failed to run MBD.
reason: Placeholder storage has not been allocated on MPS device!
I'm also watching the Candle (Rust) port that is "in progress" right now, which would allow another path for this. Not sure how long that will take, but Candle and Metal play nicely together, and it would be very clean, efficient, and safe, while still being compiled.