

- Enable custom allreduce for AMD GPUs
- Fix the custom allreduce test
- Remove `working_dir` from `ray.init` in the test suite, as it causes import errors

SSSliangfeng avatar Mar 03 '25 04:03 SSSliangfeng

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

github-actions[bot] avatar Mar 03 '25 08:03 github-actions[bot]

Thank you for the review, @SageMoore. I will do the ifdef cleaning.

Assuming the algorithms are the same, I think we can run some Nvidia benchmarks and convince ourselves that we haven't regressed performance at all.

The barrier logic on Nvidia has not been changed. Only the names of the barrier functions were changed, following the ROCm fork; barrier execution on Nvidia is left untouched.

ilmarkov avatar Mar 12 '25 11:03 ilmarkov

@ilmarkov Hello, I am on ROCm 5.7. The `hipIpcMemLazyEnablePeerAccess` flag in the following function needs to be `0`, but accuracy is incorrect after changing it to `0`. Does this custom allreduce require a newer ROCm version?

kahakuka avatar Mar 14 '25 04:03 kahakuka

@kahakuka Thank you for the note! Are you using MI200 GPU?

ilmarkov avatar Mar 14 '25 08:03 ilmarkov

> @kahakuka Thank you for the note! Are you using MI200 GPU?

Yes, I also tried it on MI250, it's the same.

kahakuka avatar Mar 17 '25 01:03 kahakuka

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @ilmarkov.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify[bot] avatar Mar 17 '25 12:03 mergify[bot]

@ilmarkov Hello, thank you very much; it now runs with the latest code. May I ask another question? Did you test in eager mode or CUDAGraph mode? I tested eager mode and found it to be normal, but there may be accuracy issues in CUDAGraph mode.

kahakuka avatar Mar 18 '25 01:03 kahakuka

@kahakuka Hi, the distributed/test_custom_all_reduce.py test verifies both eager and cudaGraph modes. Can you share the script so that we could reproduce the accuracy issue?

ilmarkov avatar Mar 18 '25 13:03 ilmarkov
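For readers unfamiliar with that test, the correctness property it checks is that every rank ends up with an identical reduced result. A minimal pure-Python sketch of that property (stand-in lists instead of GPU tensors, not vLLM's actual test code; `naive_allreduce` is a hypothetical helper):

```python
def naive_allreduce(rank_tensors):
    """Sum the i-th element across all ranks; every rank gets the same result.

    rank_tensors: list of equal-length lists, one per simulated rank.
    """
    reduced = [sum(vals) for vals in zip(*rank_tensors)]
    # After allreduce, all ranks hold identical copies of the reduced tensor.
    return [list(reduced) for _ in rank_tensors]

# Two simulated ranks contribute their local tensors.
outputs = naive_allreduce([[1.0, 2.0], [3.0, 4.0]])
# Every rank receives [4.0, 6.0], and all ranks agree.
```

A custom allreduce kernel is correct when, for each rank and in both eager and graph-capture modes, its output matches this reference reduction.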

> @kahakuka Hi, the distributed/test_custom_all_reduce.py test verifies both eager and cudaGraph modes. Can you share the script so that we could reproduce the accuracy issue?

@ilmarkov Thank you very much for your reply. I validated the accuracy by testing the model. The testing method is as follows:

Server side, eager:

```shell
HIP_VISIBLE_DEVICES=6,7 vllm serve /data/models/Qwen2-7B-Instruct --enforce-eager --dtype float16 --trust-remote-code -tp 2
```

Server side, cudagraph:

```shell
HIP_VISIBLE_DEVICES=6,7 vllm serve /data/models/Qwen2-7B-Instruct --dtype float16 --trust-remote-code -tp 2
```

Client:

```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{ "model": "/data/models/Qwen2-7B-Instruct", "messages": [ {"role": "system", "content": "hello"} ] }'
```

The answer from eager is normal, while the response from CUDAGraph is garbled.

kahakuka avatar Mar 19 '25 06:03 kahakuka
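As a side note, the JSON body of the curl command above can also be composed programmatically, which helps when scripting repeated eager-vs-CUDAGraph comparisons. A small sketch (`build_chat_request` is a hypothetical helper mirroring the `-d` payload):

```python
import json

def build_chat_request(model, content, role="system"):
    # Mirrors the -d payload of the curl command above.
    return json.dumps({
        "model": model,
        "messages": [{"role": role, "content": content}],
    })

payload = build_chat_request("/data/models/Qwen2-7B-Instruct", "hello")
```

The resulting string can be POSTed to `/v1/chat/completions` with any HTTP client.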

@kahakuka I didn't manage to reproduce the issue on MI300X. So we decided to disable custom_allreduce on older AMD GPUs so that we could land this PR.

ilmarkov avatar Mar 19 '25 21:03 ilmarkov
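A device gate of the kind described could be sketched roughly as follows (hypothetical function and device-name strings; the actual vLLM gating logic and names may differ):

```python
def custom_allreduce_supported(device_name: str) -> bool:
    # Hypothetical sketch: fall back to another allreduce path on older
    # AMD GPUs (e.g. the MI200 series), where CUDAGraph-mode accuracy
    # issues were reported, while keeping custom allreduce on MI300X.
    older_amd = ("AMD Instinct MI200", "AMD Instinct MI210", "AMD Instinct MI250")
    return not device_name.startswith(older_amd)
```

With such a check, older devices silently take the existing (non-custom) allreduce path rather than producing garbled output.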

@ilmarkov Hello, the interface functions of the test code do not match the actual application; I'm not sure how you compiled them on your end. [image attachment]

kahakuka avatar Mar 26 '25 07:03 kahakuka

@ilmarkov Hello, I have a question for you: I have looked at the Nvidia source code. In the PCIe case, custom allreduce can still be used even without checking NVLink. Is this also possible on the ROCm side?

kahakuka avatar Mar 27 '25 07:03 kahakuka

@kahakuka Custom allreduce requires P2P access between each pair of GPUs. On Nvidia, NVLink is not required for that. On AMD, xGMI guarantees P2P access, so we only check for that. It would be possible to add support for cases without xGMI (but with P2P access) by adding checks similar to gpu_p2p_access_check in custom_all_reduce_utils.py, but not as part of this PR.

ilmarkov avatar Mar 28 '25 10:03 ilmarkov
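The all-pairs requirement described in the last comment can be sketched in pure Python (`p2p_ok` here is a stand-in predicate for a real probe such as `torch.cuda.can_device_access_peer` or vLLM's `gpu_p2p_access_check`; not actual vLLM code):

```python
from itertools import combinations

def can_use_custom_allreduce(num_gpus, p2p_ok):
    # Custom allreduce needs P2P access between every pair of GPUs,
    # in both directions; a single missing link disqualifies it.
    return all(p2p_ok(i, j) and p2p_ok(j, i)
               for i, j in combinations(range(num_gpus), 2))

# Example: a fully connected 4-GPU topology passes the check.
ok = can_use_custom_allreduce(4, lambda i, j: True)
```

On Nvidia this predicate can hold over plain PCIe; on AMD, checking for xGMI is the current shortcut that implies it.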