
Once this model is running, how do you actually use it, and how do you integrate it with other systems? For example, how can it be used with dify, or with the Chinese-made maxkb?

Open AK760 opened this issue 10 months ago • 31 comments

AK760 avatar Feb 19 '25 15:02 AK760

mark

htl258 avatar Feb 20 '25 01:02 htl258

mark

0xlilwok avatar Feb 20 '25 13:02 0xlilwok

We've integrated with Dify and it runs fairly smoothly. With long prompts the API can occasionally core dump (see #505); everything else works normally.

Image

😋😋😋

maaaxinfinity avatar Feb 20 '25 17:02 maaaxinfinity

We've integrated with Dify and it runs fairly smoothly. With long prompts the API can occasionally core dump (see #505); everything else works normally.

Image

😋😋😋

Hi, how did you fill in the API settings in Dify? Could you give me some pointers? Did you use Dify's OpenAI-API-compatible provider?

yuweimian-shy avatar Feb 21 '25 02:02 yuweimian-shy

We've integrated with Dify and it runs fairly smoothly. With long prompts the API can occasionally core dump (see #505); everything else works normally. Image 😋😋😋

Hi, how did you fill in the API settings in Dify? Could you give me some pointers? Did you use Dify's OpenAI-API-compatible provider?

Image

Just connect it as OpenAI: put anything in the apikey and set the endpoint to the API's address.

maaaxinfinity avatar Feb 21 '25 09:02 maaaxinfinity
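A minimal sketch of what that configuration amounts to, outside Dify (the base_url, port, and model name below are assumptions for illustration, not values confirmed in this thread): since ktransformers exposes an OpenAI-compatible endpoint, the stock openai Python client can talk to it with any placeholder key.

```python
from openai import OpenAI

# Hypothetical values: point base_url at wherever the ktransformers
# server is listening; the API key is not validated, so any string works.
client = OpenAI(
    base_url="http://127.0.0.1:10080/v1",
    api_key="sk-anything",
)

resp = client.chat.completions.create(
    model="DeepSeek-R1",  # placeholder; the server serves whatever model it loaded
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```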

We've integrated with Dify and it runs fairly smoothly. With long prompts the API can occasionally core dump (see #505); everything else works normally. Image 😋😋😋

Hi, how did you fill in the API settings in Dify? Could you give me some pointers? Did you use Dify's OpenAI-API-compatible provider?

Image

Just connect it as OpenAI: put anything in the apikey and set the endpoint to the API's address.

OK, it works now, thanks! What hardware did you deploy on, and how is the speed, roughly how many tokens/s? On our side the output looks quite slow, even though the reported rate is around 17 tokens/s.

yuweimian-shy avatar Feb 21 '25 09:02 yuweimian-shy

We've integrated with Dify and it runs fairly smoothly. With long prompts the API can occasionally core dump (see #505); everything else works normally. Image 😋😋😋

Hi, how did you fill in the API settings in Dify? Could you give me some pointers? Did you use Dify's OpenAI-API-compatible provider?

Image

Just connect it as OpenAI: put anything in the apikey and set the endpoint to the API's address.

OK, it works now, thanks! What hardware did you deploy on, and how is the speed, roughly how many tokens/s? On our side the output looks quite slow, even though the reported rate is around 17 tokens/s.

We actually only get 13-15 tok/s: dual 9334 QS CPUs + 1152 GB DDR5-5600 (actually running at 4800) and a single 4090.

What is your configuration?

maaaxinfinity avatar Feb 21 '25 09:02 maaaxinfinity

We've integrated with Dify and it runs fairly smoothly. With long prompts the API can occasionally core dump (see #505); everything else works normally. Image 😋😋😋

Hi, how did you fill in the API settings in Dify? Could you give me some pointers? Did you use Dify's OpenAI-API-compatible provider?

Image Just connect it as OpenAI: put anything in the apikey and set the endpoint to the API's address.

OK, it works now, thanks! What hardware did you deploy on, and how is the speed, roughly how many tokens/s? On our side the output looks quite slow, even though the reported rate is around 17 tokens/s.

We actually only get 13-15 tok/s: dual 9334 QS CPUs + 1152 GB DDR5-5600 (actually running at 4800) and a single 4090.

What is your configuration?

Hi, which version of ktransformers are you using? Could you share a detailed walkthrough, e.g. the command for starting the server? Thanks! I ran the following command:

PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python3 ./ktransformers/server/main.py --model_path deepseek-ai/DeepSeek-R1 --gguf_path /data1/DeepSeekModels/unsloth/DeepSeek-R1-GGUF/DeepSeek-R1-Q4_K_M/ --cpu_infer 64 --max_new_tokens 8192 --cache_lens 32768 --total_context 32768 --cache_q4 true --temperature 0.6 --top_p 0.95 --optimize_config_path ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat.yaml --force_think True --host 0.0.0.0 --port 10080

The output was:

INFO: Started server process [196535]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:10080 (Press CTRL+C to quit)

My settings in Dify are as follows

Image But I get a network error.

Binyun-Z avatar Feb 26 '25 07:02 Binyun-Z

We've integrated with Dify and it runs fairly smoothly. With long prompts the API can occasionally core dump (see #505); everything else works normally. Image 😋😋😋

Hi, how did you fill in the API settings in Dify? Could you give me some pointers? Did you use Dify's OpenAI-API-compatible provider?

Image Just connect it as OpenAI: put anything in the apikey and set the endpoint to the API's address.

OK, it works now, thanks! What hardware did you deploy on, and how is the speed, roughly how many tokens/s? On our side the output looks quite slow, even though the reported rate is around 17 tokens/s.

We actually only get 13-15 tok/s: dual 9334 QS CPUs + 1152 GB DDR5-5600 (actually running at 4800) and a single 4090.

What is your configuration?

Hi, which version of ktransformers are you using? Could you share a detailed walkthrough, e.g. the command for starting the server? Thanks! (launch command and server output quoted above)

Image But I get a network error.

Is this mapped onto the public internet?

maaaxinfinity avatar Feb 27 '25 00:02 maaaxinfinity
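One likely culprit (an assumption on my part, not confirmed in the thread): when Dify runs in Docker, localhost inside its container is not the host machine, so the endpoint must use the host's LAN IP or host.docker.internal. A hedged sketch for checking whether the server is reachable from a given address:

```python
import requests

# HOST is a placeholder: use the server machine's LAN IP, or
# "host.docker.internal" when probing from inside a Docker container.
HOST = "192.168.1.100"
url = f"http://{HOST}:10080/v1/chat/completions"

payload = {
    "model": "DeepSeek-R1",  # placeholder model name
    "messages": [{"role": "user", "content": "ping"}],
    "max_tokens": 1,
}

# A 200 status means the endpoint is reachable and answering;
# a connection error points at networking, not at ktransformers itself.
print(requests.post(url, json=payload, timeout=600).status_code)
```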

Guys, have you run into ERROR: Exception in ASGI application? Every time Dify connects to the API it throws this error. Version 0.2.1, deployed with Docker.

We've integrated with Dify and it runs fairly smoothly. With long prompts the API can occasionally core dump (see #505); everything else works normally.

Image

😋😋😋

Guys, have you run into ERROR: Exception in ASGI application? Every time Dify connects to the API it throws this error. Version 0.2.1, deployed with Docker.

tianwaifeidie avatar Feb 28 '25 00:02 tianwaifeidie

Guys, have you run into ERROR: Exception in ASGI application? Every time Dify connects to the API it throws this error. Version 0.2.1, deployed with Docker.

We've integrated with Dify and it runs fairly smoothly. With long prompts the API can occasionally core dump (see #505); everything else works normally.

Image

😋😋😋

Guys, have you run into ERROR: Exception in ASGI application? Every time Dify connects to the API it throws this error. Version 0.2.1, deployed with Docker.

Could you post a few screenshots?

maaaxinfinity avatar Feb 28 '25 01:02 maaaxinfinity

Guys, have you run into ERROR: Exception in ASGI application? Every time Dify connects to the API it throws this error. Version 0.2.1, deployed with Docker.

We've integrated with Dify and it runs fairly smoothly. With long prompts the API can occasionally core dump (see #505); everything else works normally. Image 😋😋😋

Guys, have you run into ERROR: Exception in ASGI application? Every time Dify connects to the API it throws this error. Version 0.2.1, deployed with Docker.

Could you post a few screenshots?

ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 406, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/opt/conda/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
    return await self.app(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/opt/conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 754, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 774, in app
    await route.handle(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 295, in handle
    await self.app(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/opt/conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 74, in app
    response = await f(request)
  File "/opt/conda/lib/python3.10/site-packages/fastapi/routing.py", line 297, in app
    raw_response = await run_endpoint_function(
  File "/opt/conda/lib/python3.10/site-packages/fastapi/routing.py", line 210, in run_endpoint_function
    return await dependant.call(**values)
  File "/opt/conda/lib/python3.10/site-packages/ktransformers/server/api/openai/endpoints/chat.py", line 40, in chat_completion
    async for token in interface.inference(input_message,id):
  File "/opt/conda/lib/python3.10/site-packages/ktransformers/server/backend/interfaces/ktransformers.py", line 181, in inference
    async for v in super().inference(local_messages, thread_id):
  File "/opt/conda/lib/python3.10/site-packages/ktransformers/server/backend/interfaces/transformers.py", line 340, in inference
    for t in self.prefill(input_ids, self.check_is_new(thread_id)):
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/opt/conda/lib/python3.10/site-packages/ktransformers/server/backend/interfaces/ktransformers.py", line 171, in prefill
    next_token = self.logits_to_token(logits[0, -1, :])
  File "/opt/conda/lib/python3.10/site-packages/ktransformers/server/backend/interfaces/transformers.py", line 214, in logits_to_token
    last = torch.multinomial(probs, num_samples=1)
RuntimeError: CUDA generator expects graph capture to be underway, but the current stream is not capturing.

tianwaifeidie avatar Feb 28 '25 01:02 tianwaifeidie

Guys, have you run into ERROR: Exception in ASGI application? Every time Dify connects to the API it throws this error. Version 0.2.1, deployed with Docker.

We've integrated with Dify and it runs fairly smoothly. With long prompts the API can occasionally core dump (see #505); everything else works normally. Image 😋😋😋

Guys, have you run into ERROR: Exception in ASGI application? Every time Dify connects to the API it throws this error. Version 0.2.1, deployed with Docker.

Could you post a few screenshots?

ERROR: Exception in ASGI application ... RuntimeError: CUDA generator expects graph capture to be underway, but the current stream is not capturing. (full traceback quoted above)

Did you turn off CUDA graph?

maaaxinfinity avatar Feb 28 '25 01:02 maaaxinfinity

Guys, have you run into ERROR: Exception in ASGI application? Every time Dify connects to the API it throws this error. Version 0.2.1, deployed with Docker.

We've integrated with Dify and it runs fairly smoothly. With long prompts the API can occasionally core dump (see #505); everything else works normally. Image 😋😋😋

Guys, have you run into ERROR: Exception in ASGI application? Every time Dify connects to the API it throws this error. Version 0.2.1, deployed with Docker.

Could you post a few screenshots?

ERROR: Exception in ASGI application ... RuntimeError: CUDA generator expects graph capture to be underway, but the current stream is not capturing. (full traceback quoted above)

Did you turn off CUDA graph?

No, I didn't. The launch command is: ktransformers --model_path XXXX --gguf_path XXXXX --optimize_config_path XXXX --cpu_infer 75 --max_new_tokens 1000 --port 10002 --cache_lens 12288

tianwaifeidie avatar Feb 28 '25 01:02 tianwaifeidie

We've integrated with Dify and it runs fairly smoothly. With long prompts the API can occasionally core dump (see #505); everything else works normally.

Image

😋😋😋

Image Image Why can't I connect after starting it up?

EI-Dios avatar Feb 28 '25 04:02 EI-Dios

We've integrated with Dify and it runs fairly smoothly. With long prompts the API can occasionally core dump (see #505); everything else works normally.

Image

😋😋😋

Image Image Why can't I connect after starting it up?

If it's running in a container, I'd check whether the host is set correctly.

maaaxinfinity avatar Feb 28 '25 04:02 maaaxinfinity

Guys, have you run into ERROR: Exception in ASGI application? Every time Dify connects to the API it throws this error. Version 0.2.1, deployed with Docker.

We've integrated with Dify and it runs fairly smoothly. With long prompts the API can occasionally core dump (see #505); everything else works normally. Image 😋😋😋

Guys, have you run into ERROR: Exception in ASGI application? Every time Dify connects to the API it throws this error. Version 0.2.1, deployed with Docker.

Could you post a few screenshots?

ERROR: Exception in ASGI application ... RuntimeError: CUDA generator expects graph capture to be underway, but the current stream is not capturing. (full traceback quoted above)

Did you turn off CUDA graph?

No, I didn't. The launch command is: ktransformers --model_path XXXX --gguf_path XXXXX --optimize_config_path XXXX --cpu_infer 75 --max_new_tokens 1000 --port 10002 --cache_lens 12288

Maybe try a smaller cache_lens? Mainly, I've never run into this problem myself (

maaaxinfinity avatar Feb 28 '25 04:02 maaaxinfinity

We've integrated with Dify and it runs fairly smoothly. With long prompts the API can occasionally core dump (see #505); everything else works normally. Image 😋😋😋

Image Image Why can't I connect after starting it up?

If it's running in a container, I'd check whether the host is set correctly.

I start Dify with docker-compose; it could connect to the Ollama on this local server before. DeepSeek is started directly on the local server, so in theory the connection shouldn't fail; both are on the same machine.

EI-Dios avatar Feb 28 '25 04:02 EI-Dios

We've integrated with Dify and it runs fairly smoothly. With long prompts the API can occasionally core dump (see #505); everything else works normally. Image 😋😋😋

Image Image Why can't I connect after starting it up?

If it's running in a container, I'd check whether the host is set correctly.

I start Dify with docker-compose; it could connect to the Ollama on this local server before. DeepSeek is started directly on the local server, so in theory the connection shouldn't fail; both are on the same machine.

Did you set up any port forwarding? Also, you could check the Docker logs for the connection attempts.

maaaxinfinity avatar Feb 28 '25 04:02 maaaxinfinity

We've integrated with Dify and it runs fairly smoothly. With long prompts the API can occasionally core dump (see #505); everything else works normally. Image 😋😋😋

Image Image Why can't I connect after starting it up?

If it's running in a container, I'd check whether the host is set correctly.

I start Dify with docker-compose; it could connect to the Ollama on this local server before. DeepSeek is started directly on the local server, so in theory the connection shouldn't fail; both are on the same machine.

Did you set up any port forwarding? Also, you could check the Docker logs for the connection attempts.

No forwarding. I checked the dify-nginx Docker logs and there was nothing unusual. I tried hitting the DeepSeek endpoint with Postman and that didn't seem to work either, so the problem may be with how my DeepSeek endpoint was started.

EI-Dios avatar Feb 28 '25 05:02 EI-Dios

mark

tenyee avatar Mar 03 '25 07:03 tenyee

We've integrated with Dify and it runs fairly smoothly. With long prompts the API can occasionally core dump (see #505); everything else works normally. Image 😋😋😋

Hi, how did you fill in the API settings in Dify? Could you give me some pointers? Did you use Dify's OpenAI-API-compatible provider?

Hi, how do you connect your own model here? The DeepSeek I deployed locally with ktransformers can be reached by Open WebUI, but here neither the model list below nor "add model" can find my model. This is a screenshot of my connection settings.

Image

hahali-li avatar Mar 16 '25 13:03 hahali-li

Hello, I have received your email and will take a look as soon as possible~~~ Thanks~~

tenyee avatar Mar 16 '25 13:03 tenyee

We've integrated with Dify and it runs fairly smoothly. With long prompts the API can occasionally core dump (see #505); everything else works normally. Image 😋😋😋

Hi, how did you fill in the API settings in Dify? Could you give me some pointers? Did you use Dify's OpenAI-API-compatible provider?

Hi, how do you connect your own model here? The DeepSeek I deployed locally with ktransformers can be reached by Open WebUI, but here neither the model list below nor "add model" can find my model. This is a screenshot of my connection settings.

Image Check the response in your terminal.

maaaxinfinity avatar Mar 16 '25 13:03 maaaxinfinity

Image What pops up in the terminal looks like a response from ktransformers. What's going on here? My ktransformers is deployed locally with conda, and Dify is deployed with Docker.

hahali-li avatar Mar 16 '25 13:03 hahali-li

Image What pops up in the terminal looks like a response from ktransformers. What's going on here? My ktransformers is deployed locally with conda, and Dify is deployed with Docker.

Nothing's wrong. To verify the API is usable, Dify sends a test request directly; once that response finishes it should be fine.

maaaxinfinity avatar Mar 16 '25 13:03 maaaxinfinity

Image What pops up in the terminal looks like a response from ktransformers. What's going on here? My ktransformers is deployed locally with conda, and Dify is deployed with Docker.

Nothing's wrong. To verify the API is usable, Dify sends a test request directly; once that response finishes it should be fine.

Image A question came up when I called it afterwards: my model isn't in the list, and "add model" says the model can't be found. My launch command specifies the name with --model_name Deepseek-V3:671B. How is this supposed to connect?

hahali-li avatar Mar 16 '25 13:03 hahali-li

Image What pops up in the terminal looks like a response from ktransformers. What's going on here? My ktransformers is deployed locally with conda, and Dify is deployed with Docker.

Nothing's wrong. To verify the API is usable, Dify sends a test request directly; once that response finishes it should be fine.

Image A question came up when I called it afterwards: my model isn't in the list, and "add model" says the model can't be found. My launch command specifies the name with --model_name Deepseek-V3:671B. How is this supposed to connect?

It follows the OpenAI standard throughout, the model field included: whichever OpenAI model you call, you get this one. There's no need to add your model manually.

maaaxinfinity avatar Mar 16 '25 13:03 maaaxinfinity
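To illustrate the point (a hedged sketch, not taken from the thread): because the server behaves like the OpenAI API but serves a single loaded model, the model field in the request is effectively just a label, so any of Dify's predefined OpenAI model names should reach the same backend. The URL and names below are placeholders.

```python
import requests

# Both requests should be answered by the same locally loaded model,
# whatever name is passed in the "model" field.
url = "http://127.0.0.1:10080/v1/chat/completions"
for name in ["gpt-3.5-turbo", "Deepseek-V3:671B"]:
    r = requests.post(url, json={
        "model": name,
        "messages": [{"role": "user", "content": "Who are you?"}],
    })
    print(name, "->", r.status_code)
```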

Image What pops up in the terminal looks like a response from ktransformers. What's going on here? My ktransformers is deployed locally with conda, and Dify is deployed with Docker.

Nothing's wrong. To verify the API is usable, Dify sends a test request directly; once that response finishes it should be fine.

Image A question came up when I called it afterwards: my model isn't in the list, and "add model" says the model can't be found. My launch command specifies the name with --model_name Deepseek-V3:671B. How is this supposed to connect?

It follows the OpenAI standard throughout, the model field included: whichever OpenAI model you call, you get this one. There's no need to add your model manually.

That was indeed an oversight on my part, I hadn't tried it. Thanks a lot for your help!

hahali-li avatar Mar 16 '25 14:03 hahali-li

After hooking the ktransformers model up to Dify, the answer shown in the Dify frontend is not a streaming output. Is there any way to fix this?

lixinwangniu avatar Jul 25 '25 09:07 lixinwangniu
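A hedged way to narrow this down (the endpoint and model name below are assumptions): request stream=True from the server directly; if chunks arrive token by token here, the buffering is happening on the Dify side rather than in ktransformers.

```python
from openai import OpenAI

# Placeholder endpoint; the key is not validated by the server.
client = OpenAI(base_url="http://127.0.0.1:10080/v1", api_key="sk-anything")

stream = client.chat.completions.create(
    model="DeepSeek-R1",  # placeholder model name
    messages=[{"role": "user", "content": "Count to ten."}],
    stream=True,          # ask for server-sent events, chunk by chunk
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```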