Janus Pro for Mac
Hi everyone, I've created a fork for running Janus on Mac. I hope you find it useful! Please note that only Janus-Pro is supported. Here's the link: https://github.com/takahirosir/Janus
Which Mac are you running, and how is the performance?
Hello author. I have forked your code, but there still seem to be some problems. My Mac is an M4 Max. Here's the error log from text-to-image:

backends/_asyncio.py", line 962, in run
    result = context.run(func, *args)
  File "/Users/doffe/miniforge3/envs/janus/lib/python3.10/site-packages/gradio/utils.py", line 661, in wrapper
    response = f(*args, **kwargs)
  File "/Users/doffe/miniforge3/envs/janus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/doffe/PycharmProjects/Janus/demo/app_januspro.py", line 162, in generate_image
    output, patches = generate(input_ids,
  File "/Users/doffe/PycharmProjects/Janus/demo/app_januspro.py", line 120, in generate
    patches = vl_gpt.gen_vision_model.decode_code(generated_tokens.to(dtype=torch.int),
  File "/Users/doffe/PycharmProjects/Janus/janus/models/vq_model.py", line 507, in decode_code
    dec = self.decode(quant_b)
  File "/Users/doffe/PycharmProjects/Janus/janus/models/vq_model.py", line 502, in decode
    dec = self.decoder(quant)
  File "/Users/doffe/miniforge3/envs/janus/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/doffe/PycharmProjects/Janus/janus/models/vq_model.py", line 208, in forward
    h = block.upsample(h)
  File "/Users/doffe/miniforge3/envs/janus/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/doffe/PycharmProjects/Janus/janus/models/vq_model.py", line 419, in forward
    x = F.interpolate(x.to(torch.float), scale_factor=2.0, mode="nearest").to(
TypeError: Trying to convert BFloat16 to the MPS backend but it does not have support for that dtype.
(mpsFileLoc): /AppleInternal/Library/BuildRoots/d187755d-b9a3-11ef-83e5-aabfac210453/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm:233:0: error: 'mps.add' op requires the same element type for all operands and results
(mpsFileLoc): /AppleInternal/Library/BuildRoots/d187755d-b9a3-11ef-83e5-aabfac210453/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm:233:0: note: see current operation: %10 = "mps.add"(%8, %9) : (tensor<1x576x1xf16>, tensor<1xf32>) -> tensor<*xf32>
(mpsFileLoc): /AppleInternal/Library/BuildRoots/d187755d-b9a3-11ef-83e5-aabfac210453/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm:233:0: error: 'mps.add' op requires the same element type for all operands and results
(mpsFileLoc): /AppleInternal/Library/BuildRoots/d187755d-b9a3-11ef-83e5-aabfac210453/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm:233:0: note: see current operation: %10 = "mps.add"(%8, %9) : (tensor<1x576x1xf16>, tensor<1xf32>) -> tensor<*xf32>
/AppleInternal/Library/BuildRoots/d187755d-b9a3-11ef-83e5-aabfac210453/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphExecutable.mm:975: failed assertion `original module failed verification'
And the above is the error log from image-to-text.
It works on Apple silicon with the changes in this fork. However, in app_januspro.py you need to make sure the dtype for "mps" is also bfloat16 in order to avoid problems with text-to-image generation. (Error message: RuntimeError: Input type (c10::BFloat16) and bias type (c10::Half) should be the same)
if torch.cuda.is_available():
    device = "cuda"
    dtype = torch.bfloat16
else:
    device = "mps"
    # dtype = torch.float16
    dtype = torch.bfloat16
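For context, the device/dtype chosen above then needs to be applied to the model itself so that weights and activations share one dtype on MPS. A minimal sketch, assuming the model is loaded with AutoModelForCausalLM as in the demo script (the checkpoint path here is just an example):

import torch
from transformers import AutoModelForCausalLM
from janus.models import VLChatProcessor

model_path = "deepseek-ai/Janus-Pro-7B"  # example; use whichever checkpoint you run

device = "cuda" if torch.cuda.is_available() else "mps"
dtype = torch.bfloat16

# Cast the whole model once, then move it to the chosen device.
vl_gpt = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
vl_gpt = vl_gpt.to(dtype).to(device).eval()
vl_chat_processor = VLChatProcessor.from_pretrained(model_path)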
Hello, I have modified the code as in yours, but here's my new error log; it seems the Apple chip doesn't support bfloat16?

  File "/Users/doffe/miniforge3/envs/janus/lib/python3.10/site-packages/gradio/utils.py", line 661, in wrapper
    response = f(*args, **kwargs)
  File "/Users/doffe/miniforge3/envs/janus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/doffe/PycharmProjects/Janus/demo/app_januspro_fk.py", line 162, in generate_image
    output, patches = generate(input_ids,
  File "/Users/doffe/PycharmProjects/Janus/demo/app_januspro_fk.py", line 97, in generate
    inputs_embeds = vl_gpt.language_model.get_input_embeddings()(tokens)
  File "/Users/doffe/miniforge3/envs/janus/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/doffe/miniforge3/envs/janus/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 162, in forward
    return F.embedding(
  File "/Users/doffe/miniforge3/envs/janus/lib/python3.10/site-packages/torch/nn/functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
TypeError: Trying to convert BFloat16 to the MPS backend but it does not have support for that dtype.
Running the januspro demo script with either 7B or 1B seems to have saturated the VRAM of my 192 GB Mac M2 Ultra, and I had to hit Ctrl+C to stop it.
Is this expected? I'm guessing not, wondering if it is related to the Mac aspect or something else?
M4 Max: the GUI launches successfully, but running it never produces a result; the response just stays stuck.
@holmbuar Mac mini M4 with 16 GB: 7B doesn't work; 1B works but takes around 10 seconds for text generation, and image generation takes around 300 seconds. (I also fixed a bug with the hardcoded bfloat16 in upsample, please use the latest commit.)
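For reference, the upsample bug is roughly this shape: the decoder's upsample forward casts the interpolate result back to a hardcoded bfloat16, which fails on torch builds that can't place bf16 tensors on MPS. A simplified sketch of the kind of fix (the actual commit may differ):

import torch
import torch.nn.functional as F

def upsample_forward(x):
    # Nearest-neighbor interpolation runs in float32, then the result is cast
    # back to the input's own dtype instead of a hardcoded bfloat16.
    out = F.interpolate(x.to(torch.float), scale_factor=2.0, mode="nearest")
    return out.to(x.dtype)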
I'm on an M4 Max with 64 GB. After launching the GUI and submitting a chat, the queue 1/1 sat for 1000+ seconds without the response finishing. I don't know where the problem is, and there is no traceback at all. I will try the latest commit later, thank you.
@DoffeBupt it's not only this place that needs changing; you can try the latest update in my repo.
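For anyone hitting the "Trying to convert BFloat16 to the MPS backend" error above: one defensive option is to probe at startup whether the installed torch build actually supports bf16 on MPS and fall back to float16 otherwise. This is just an illustrative sketch (the helper name is made up, and it is not necessarily what the repo does):

import torch

def pick_mps_dtype():
    # Hypothetical helper: try to place a bf16 tensor on MPS; older torch
    # builds raise the TypeError shown in the log above.
    try:
        torch.zeros(1, dtype=torch.bfloat16, device="mps")
        return torch.bfloat16
    except (TypeError, RuntimeError):
        return torch.float16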
@gamo69497 Try the latest commit I pushed; I changed a few places. Is the hanging response in image-to-text or text-to-image? Image-to-text in the first version was already quite fast, but you can't use the 7B model; I can't get 7B to run either.
@davidthamwf Yeah, there were some problems here. I fixed some bugs; please try the latest commit.
I was using 7B, and it couldn't generate a result. 1B does work. Why do the problems show up with 7B? 🤔
Device: Mac Studio M1 Ultra
diff --git a/demo/app_januspro.py b/demo/app_januspro.py
index 702e58e..2e0ec99 100644
--- a/demo/app_januspro.py
+++ b/demo/app_januspro.py
@@ -21,12 +21,14 @@ vl_gpt = AutoModelForCausalLM.from_pretrained(model_path,
                                              trust_remote_code=True)
 if torch.cuda.is_available():
     vl_gpt = vl_gpt.to(torch.bfloat16).cuda()
+    cuda_device = "cuda"
 else:
-    vl_gpt = vl_gpt.to(torch.float16)
+    cuda_device = "mps"
+    vl_gpt = vl_gpt.to(torch.bfloat16).to(cuda_device).eval()
 vl_chat_processor = VLChatProcessor.from_pretrained(model_path)
 tokenizer = vl_chat_processor.tokenizer
-cuda_device = 'cuda' if torch.cuda.is_available() else 'cpu'
+# cuda_device = 'cuda' if torch.cuda.is_available() else 'cpu'
 @torch.inference_mode()
 # @spaces.GPU(duration=120)
@@ -146,7 +148,7 @@ def generate_image(prompt,
     np.random.seed(seed)
     width = 384
     height = 384
-    parallel_size = 5
+    parallel_size = 1
     with torch.no_grad():
         messages = [{'role': '<|User|>', 'content': prompt},
diff --git a/pyproject.toml b/pyproject.toml
index 382f829..1701179 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -32,7 +32,7 @@ gradio = [
     "colorama==0.4.5",
     "Pygments==2.12.0",
     "markdown==3.4.1",
-    "SentencePiece==0.1.96"
+    "SentencePiece==0.2.0"
 ]
 lint = [
     "isort",
I think it's because the 7B model is simply too big to handle 😂; it hasn't been optimized.
Your version can run 7B. It worked: it uses about 80 GB of VRAM in total, and on a 128 GB MacBook it gets roughly 20 s per query for chat with images and 90 s per query for image generation.