Janus Pro for Mac
Hi everyone, I've created a fork for running Janus on Mac. I hope you find it useful! Please note that only Janus-Pro is supported. Here's the link: https://github.com/takahirosir/Janus
Which Mac are you running, and how is the performance?
Hello author. I have forked your code, but there still seem to be some problems. My Mac is an M4 Max. Here's the error log from text-to-image:

backends/_asyncio.py", line 962, in run
    result = context.run(func, *args)
  File "/Users/doffe/miniforge3/envs/janus/lib/python3.10/site-packages/gradio/utils.py", line 661, in wrapper
    response = f(*args, **kwargs)
  File "/Users/doffe/miniforge3/envs/janus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/doffe/PycharmProjects/Janus/demo/app_januspro.py", line 162, in generate_image
    output, patches = generate(input_ids,
  File "/Users/doffe/PycharmProjects/Janus/demo/app_januspro.py", line 120, in generate
    patches = vl_gpt.gen_vision_model.decode_code(generated_tokens.to(dtype=torch.int),
  File "/Users/doffe/PycharmProjects/Janus/janus/models/vq_model.py", line 507, in decode_code
    dec = self.decode(quant_b)
  File "/Users/doffe/PycharmProjects/Janus/janus/models/vq_model.py", line 502, in decode
    dec = self.decoder(quant)
  File "/Users/doffe/miniforge3/envs/janus/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/doffe/PycharmProjects/Janus/janus/models/vq_model.py", line 208, in forward
    h = block.upsample(h)
  File "/Users/doffe/miniforge3/envs/janus/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/doffe/PycharmProjects/Janus/janus/models/vq_model.py", line 419, in forward
    x = F.interpolate(x.to(torch.float), scale_factor=2.0, mode="nearest").to(
TypeError: Trying to convert BFloat16 to the MPS backend but it does not have support for that dtype.
(mpsFileLoc): /AppleInternal/Library/BuildRoots/d187755d-b9a3-11ef-83e5-aabfac210453/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm:233:0: error: 'mps.add' op requires the same element type for all operands and results
(mpsFileLoc): /AppleInternal/Library/BuildRoots/d187755d-b9a3-11ef-83e5-aabfac210453/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm:233:0: note: see current operation: %10 = "mps.add"(%8, %9) : (tensor<1x576x1xf16>, tensor<1xf32>) -> tensor<*xf32>
(mpsFileLoc): /AppleInternal/Library/BuildRoots/d187755d-b9a3-11ef-83e5-aabfac210453/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm:233:0: error: 'mps.add' op requires the same element type for all operands and results
(mpsFileLoc): /AppleInternal/Library/BuildRoots/d187755d-b9a3-11ef-83e5-aabfac210453/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm:233:0: note: see current operation: %10 = "mps.add"(%8, %9) : (tensor<1x576x1xf16>, tensor<1xf32>) -> tensor<*xf32>
/AppleInternal/Library/BuildRoots/d187755d-b9a3-11ef-83e5-aabfac210453/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphExecutable.mm:975: failed assertion `original module failed verification'
And the above is the error log from image-to-text.
It works on Apple silicon with the changes in this fork. However, in app_januspro.py you need to make sure the dtype for "mps" is also bfloat16 in order to avoid problems with text-to-image generation. (Error message: RuntimeError: Input type (c10::BFloat16) and bias type (c10::Half) should be the same)
if torch.cuda.is_available():
    device = "cuda"
    dtype = torch.bfloat16
else:
    device = "mps"
    # dtype = torch.float16
    dtype = torch.bfloat16
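For context, the device/dtype chosen above then needs to be applied to the model itself so that weights and activations share one dtype on MPS. A minimal sketch, assuming the model is loaded with AutoModelForCausalLM as in the demo script (the checkpoint path here is just an example):

import torch
from transformers import AutoModelForCausalLM
from janus.models import VLChatProcessor

model_path = "deepseek-ai/Janus-Pro-7B"  # example; use whichever checkpoint you run

device = "cuda" if torch.cuda.is_available() else "mps"
dtype = torch.bfloat16

# Cast the whole model once, then move it to the chosen device.
vl_gpt = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
vl_gpt = vl_gpt.to(dtype).to(device).eval()
vl_chat_processor = VLChatProcessor.from_pretrained(model_path)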
Hello, I have modified the code as in yours, but here's my new error log; it seems the Apple chip doesn't support bfloat16?

  File "/Users/doffe/miniforge3/envs/janus/lib/python3.10/site-packages/gradio/utils.py", line 661, in wrapper
    response = f(*args, **kwargs)
  File "/Users/doffe/miniforge3/envs/janus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/doffe/PycharmProjects/Janus/demo/app_januspro_fk.py", line 162, in generate_image
    output, patches = generate(input_ids,
  File "/Users/doffe/PycharmProjects/Janus/demo/app_januspro_fk.py", line 97, in generate
    inputs_embeds = vl_gpt.language_model.get_input_embeddings()(tokens)
  File "/Users/doffe/miniforge3/envs/janus/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/doffe/miniforge3/envs/janus/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 162, in forward
    return F.embedding(
  File "/Users/doffe/miniforge3/envs/janus/lib/python3.10/site-packages/torch/nn/functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
TypeError: Trying to convert BFloat16 to the MPS backend but it does not have support for that dtype.
Running the januspro demo script with either 7B or 1B seems to have saturated the VRAM of my 192 GB Mac M2 Ultra, and I had to hit Ctrl+C to stop it.
Is this expected? I'm guessing not, wondering if it is related to the Mac aspect or something else?
M4 Max: the GUI launches successfully, but running it never produces a result; the response just stays stuck.
@holmbuar Mac mini M4 with 16 GB: 7B doesn't work; 1B works but takes around 10 seconds for text generation, and image generation takes around 300 seconds. (I also fixed a bug with the hardcoded bfloat16 in upsample, please use the latest commit.)
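For reference, the upsample bug is roughly this shape: the decoder's upsample forward casts the interpolate result back to a hardcoded bfloat16, which fails on torch builds that can't place bf16 tensors on MPS. A simplified sketch of the kind of fix (the actual commit may differ):

import torch
import torch.nn.functional as F

def upsample_forward(x):
    # Nearest-neighbor interpolation runs in float32, then the result is cast
    # back to the input's own dtype instead of a hardcoded bfloat16.
    out = F.interpolate(x.to(torch.float), scale_factor=2.0, mode="nearest")
    return out.to(x.dtype)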
I'm on an M4 Max with 64 GB. After launching the GUI and submitting a chat, the queue 1/1 sat for 1000+ seconds without the response finishing. I don't know where the problem is, and there is no traceback at all. I will try the latest commit later, thank you.
@DoffeBupt it's not only this place that needs changing; you can try the latest update in my repo.
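For anyone hitting the "Trying to convert BFloat16 to the MPS backend" error above: one defensive option is to probe at startup whether the installed torch build actually supports bf16 on MPS and fall back to float16 otherwise. This is just an illustrative sketch (the helper name is made up, and it is not necessarily what the repo does):

import torch

def pick_mps_dtype():
    # Hypothetical helper: try to place a bf16 tensor on MPS; older torch
    # builds raise the TypeError shown in the log above.
    try:
        torch.zeros(1, dtype=torch.bfloat16, device="mps")
        return torch.bfloat16
    except (TypeError, RuntimeError):
        return torch.float16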
@gamo69497 Try the latest commit I pushed; I changed a few places. Is the hanging response in image-to-text or text-to-image? Image-to-text in the first version was already quite fast, but you can't use the 7B model; I can't get 7B to run either.
@davidthamwf Yeah, there were some problems here. I fixed some bugs; please try the latest commit.
I was using 7B, and it couldn't generate a result. 1B does work. Why do the problems show up with 7B? 🤔
Device: Mac Studio M1 Ultra
diff --git a/demo/app_januspro.py b/demo/app_januspro.py
index 702e58e..2e0ec99 100644
--- a/demo/app_januspro.py
+++ b/demo/app_januspro.py
@@ -21,12 +21,14 @@ vl_gpt = AutoModelForCausalLM.from_pretrained(model_path,
                                              trust_remote_code=True)
 if torch.cuda.is_available():
     vl_gpt = vl_gpt.to(torch.bfloat16).cuda()
+    cuda_device = "cuda"
 else:
-    vl_gpt = vl_gpt.to(torch.float16)
+    cuda_device = "mps"
+    vl_gpt = vl_gpt.to(torch.bfloat16).to(cuda_device).eval()
 vl_chat_processor = VLChatProcessor.from_pretrained(model_path)
 tokenizer = vl_chat_processor.tokenizer
-cuda_device = 'cuda' if torch.cuda.is_available() else 'cpu'
+# cuda_device = 'cuda' if torch.cuda.is_available() else 'cpu'
 @torch.inference_mode()
 # @spaces.GPU(duration=120)
@@ -146,7 +148,7 @@ def generate_image(prompt,
     np.random.seed(seed)
     width = 384
     height = 384
-    parallel_size = 5
+    parallel_size = 1
     with torch.no_grad():
         messages = [{'role': '<|User|>', 'content': prompt},
diff --git a/pyproject.toml b/pyproject.toml
index 382f829..1701179 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -32,7 +32,7 @@ gradio = [
     "colorama==0.4.5",
     "Pygments==2.12.0",
     "markdown==3.4.1",
-    "SentencePiece==0.1.96"
+    "SentencePiece==0.2.0"
 ]
 lint = [
     "isort",
I think it's because the 7B model is simply too big to handle 😂; it hasn't been optimized.
Your version can run 7B. It worked: it uses about 80 GB of VRAM in total, and on a 128 GB MacBook it gets roughly 20 s per query for chat with images and 90 s per query for image generation.