RoadToNowhereX
The point is that ctd will usually mix sentences from different bubbles together, while [ogkalu/comic-text-and-bubble-detector](https://huggingface.co/ogkalu/comic-text-and-bubble-detector) hardly ever does.
Sample code from [PekingU/rtdetr_r50vd](https://huggingface.co/PekingU/rtdetr_r50vd):

```python
import torch
import requests
from PIL import Image
from transformers import RTDetrForObjectDetection, RTDetrImageProcessor

url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

# pointed at the comic detector checkpoint instead of the base rtdetr_r50vd weights
image_processor = RTDetrImageProcessor.from_pretrained("ogkalu/comic-text-and-bubble-detector")
model = RTDetrForObjectDetection.from_pretrained("ogkalu/comic-text-and-bubble-detector")

# run detection (completed following the rtdetr_r50vd model card example)
inputs = image_processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
```
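To turn the raw outputs into labelled boxes, the model card's post-processing step can be appended after the block above; the `0.3` threshold here is an illustrative value, not one the thread settled on:

```python
# convert logits to (score, label, box) triples above a chosen confidence threshold
results = image_processor.post_process_object_detection(
    outputs, target_sizes=torch.tensor([image.size[::-1]]), threshold=0.3
)

for result in results:
    for score, label_id, box in zip(result["scores"], result["labels"], result["boxes"]):
        # id2label maps class indices to the checkpoint's class names
        label = model.config.id2label[label_id.item()]
        print(f"{label}: {score:.2f} {[round(x, 2) for x in box.tolist()]}")
```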
> SHANA.bandicam.2025-04-22.21-22-38-222.mp4
>
> "I found the original comic by using image search. I think the margin of error is within an acceptable range."

I tried it with the link you provided. It is indeed quite a bit stronger than the original ctd, but missed text and mixed-up sentences from different text boxes still show up from time to time.
> SHANA.bandicam.2025-04-22.21-22-38-222.mp4
>
> "I found the original comic by using image search. I think the margin of error is within an acceptable range."

Bro, if you still have the energy, you could try training [PekingU/rtdetr_v2_r50vd](https://huggingface.co/PekingU/rtdetr_v2_r50vd), or Paddle's PP-OCRv4_server_det. I haven't tried the rtdetr pretrained model; Paddle's pretrained model is very strong on its own and can be used as-is, but its training and inference framework is more of a hassle.
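For reference, a minimal sketch of loading the RT-DETRv2 checkpoint as a fine-tuning starting point, assuming a recent transformers build that ships `RTDetrV2ForObjectDetection`; the three comic classes are an assumed example label set, not something the thread specifies:

```python
from transformers import RTDetrV2ForObjectDetection, RTDetrImageProcessor

# hypothetical label set for a comic detector; replace with your dataset's classes
id2label = {0: "bubble", 1: "text_bubble", 2: "text_free"}

processor = RTDetrImageProcessor.from_pretrained("PekingU/rtdetr_v2_r50vd")
model = RTDetrV2ForObjectDetection.from_pretrained(
    "PekingU/rtdetr_v2_r50vd",
    id2label=id2label,
    label2id={v: k for k, v in id2label.items()},
    ignore_mismatched_sizes=True,  # reinitialize the class head for the new label count
)
```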
> Well, after the tests.
>
> I can use it, but I probably won't add it. Why? Because this detector detects blobs and the text in the blobs, and...
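Since the detector returns both bubble ("blob") boxes and text boxes, grouping comes down to assigning each text box to the bubble that contains it. A minimal sketch, with `(x1, y1, x2, y2)` box tuples and a pixel tolerance that are illustrative assumptions:

```python
def contains(bubble, text, tol=2.0):
    """True if the text box lies inside the bubble box, with a small pixel tolerance."""
    bx1, by1, bx2, by2 = bubble
    tx1, ty1, tx2, ty2 = text
    return bx1 - tol <= tx1 and by1 - tol <= ty1 and tx2 <= bx2 + tol and ty2 <= by2 + tol

def group_text_by_bubble(bubbles, texts):
    """Map each bubble index to the text boxes inside it; free text goes under -1."""
    groups = {i: [] for i in range(len(bubbles))}
    groups[-1] = []  # text not contained in any bubble
    for t in texts:
        owner = next((i for i, b in enumerate(bubbles) if contains(b, t)), -1)
        groups[owner].append(t)
    return groups
```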
> > This project's pipeline looks a lot like RAG; you could also add GraphRAG-style examples to the prompts: https://github.com/microsoft/graphrag/blob/main/graphrag/prompts/index/entity_extraction.py
> >
> > Also, are there any good local LLMs you can recommend? Open-source models that support both Chinese and Japanese and are still "smart" enough are hard to find.
>
> Splitting the work into finer-grained tasks and steps does improve the final result, but it also greatly increases time and token consumption and demands more from the model, so it is a trade-off. The project's recent improvements go in exactly the opposite direction: merging tasks as much as possible while keeping quality, to cut consumption and to make sure small local models still perform acceptably. Development uses the Qwen2.5-7B bundled in the one-click package. The team behind LightRAG has MiniRAG, aimed specifically at small models; the prompt structure is much the same, but the results leave a lot to be desired.
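For what "GraphRAG-style examples" means in practice, here is a minimal few-shot extraction prompt sketch modeled on the structure of the linked entity_extraction.py; the delimiters, the sample record, and the placeholder names are illustrative assumptions, not the file's actual contents:

```python
# few-shot prompt skeleton in the spirit of GraphRAG's entity extraction prompt
PROMPT = """-Goal-
Given a text document, identify all entities of the given types and the relationships among them.

-Steps-
1. For each entity, output ("entity"<|><name><|><type><|><description>)
2. For each related pair, output ("relationship"<|><source><|><target><|><description><|><strength>)
3. Return one list, using ## as the record delimiter, and end with <|COMPLETE|>.

-Example-
Text: Alice handed the report to Bob.
Output:
("entity"<|>ALICE<|>PERSON<|>Person who wrote and handed over the report)##
("entity"<|>BOB<|>PERSON<|>Person who received the report)##
("relationship"<|>ALICE<|>BOB<|>Alice gave the report to Bob<|>8)
<|COMPLETE|>

-Real Data-
Entity types: {entity_types}
Text: {input_text}
Output:"""

print(PROMPT.format(entity_types="PERSON, ORGANIZATION", input_text="..."))
```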
When offloading model weights to the GPU, VS Code on Win11 reports a warning: `Some weights of the model checkpoint at E:/AI/LLM/Models/Qwen2.5-VL-32B-Instruct-AWQ were not used when initializing Qwen2_5_VLForConditionalGeneration: ['lm_head.weight'] - This IS...`
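One common reason for `lm_head.weight` being reported as unused is that the config ties the output head to the input embeddings, so the separately stored tensor is skipped at load time. A minimal sketch for checking this on the local checkpoint (path reused from above; it assumes the AWQ backend is installed, and the behavior is not verified for this particular export):

```python
from transformers import AutoConfig, Qwen2_5_VLForConditionalGeneration

path = "E:/AI/LLM/Models/Qwen2.5-VL-32B-Instruct-AWQ"

# if tie_word_embeddings is True, lm_head shares weights with embed_tokens,
# and the checkpoint's stored lm_head.weight is expected to be skipped
config = AutoConfig.from_pretrained(path)
print("tie_word_embeddings:", getattr(config, "tie_word_embeddings", None))

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(path, device_map="auto")
tied = model.lm_head.weight.data_ptr() == model.get_input_embeddings().weight.data_ptr()
print("lm_head tied to input embeddings:", tied)
```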