RoadToNowhereX
The point is that ctd will usually mix sentences from different bubbles together, while [ogkalu/comic-text-and-bubble-detector](https://huggingface.co/ogkalu/comic-text-and-bubble-detector) hardly ever does.
Sample code from [PekingU/rtdetr_r50vd](https://huggingface.co/PekingU/rtdetr_r50vd):

```python
import torch
import requests
from PIL import Image
from transformers import RTDetrForObjectDetection, RTDetrImageProcessor

url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

# pointed at the comic detector checkpoint instead of the base rtdetr_r50vd weights
image_processor = RTDetrImageProcessor.from_pretrained("ogkalu/comic-text-and-bubble-detector")
model = RTDetrForObjectDetection.from_pretrained("ogkalu/comic-text-and-bubble-detector")

# run detection (completed following the rtdetr_r50vd model card example)
inputs = image_processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
```
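To turn the raw outputs into labelled boxes, the model card's post-processing step can be appended after the block above; the `0.3` threshold here is an illustrative value, not one the thread settled on:

```python
# convert logits to (score, label, box) triples above a chosen confidence threshold
results = image_processor.post_process_object_detection(
    outputs, target_sizes=torch.tensor([image.size[::-1]]), threshold=0.3
)

for result in results:
    for score, label_id, box in zip(result["scores"], result["labels"], result["boxes"]):
        # id2label maps class indices to the checkpoint's class names
        label = model.config.id2label[label_id.item()]
        print(f"{label}: {score:.2f} {[round(x, 2) for x in box.tolist()]}")
```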
> SHANA.bandicam.2025-04-22.21-22-38-222.mp4
>
> "I found the original comic by using image search. I think the margin of error is within an acceptable range."

I tried it with the link you provided. It is indeed quite a bit stronger than the original ctd, but missed text and mixed-up sentences from different text boxes still show up from time to time.
> SHANA.bandicam.2025-04-22.21-22-38-222.mp4
>
> "I found the original comic by using image search. I think the margin of error is within an acceptable range."

Bro, if you still have the energy, you could try training [PekingU/rtdetr_v2_r50vd](https://huggingface.co/PekingU/rtdetr_v2_r50vd), or Paddle's PP-OCRv4_server_det. I haven't tried the rtdetr pretrained model; Paddle's pretrained model is very strong on its own and can be used as-is, but its training and inference framework is more of a hassle.
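For reference, a minimal sketch of loading the RT-DETRv2 checkpoint as a fine-tuning starting point, assuming a recent transformers build that ships `RTDetrV2ForObjectDetection`; the three comic classes are an assumed example label set, not something the thread specifies:

```python
from transformers import RTDetrV2ForObjectDetection, RTDetrImageProcessor

# hypothetical label set for a comic detector; replace with your dataset's classes
id2label = {0: "bubble", 1: "text_bubble", 2: "text_free"}

processor = RTDetrImageProcessor.from_pretrained("PekingU/rtdetr_v2_r50vd")
model = RTDetrV2ForObjectDetection.from_pretrained(
    "PekingU/rtdetr_v2_r50vd",
    id2label=id2label,
    label2id={v: k for k, v in id2label.items()},
    ignore_mismatched_sizes=True,  # reinitialize the class head for the new label count
)
```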
> Well, after the tests.
>
> I can use it, but I probably won't add it. Why? Because this detector detects blobs and the text in the blobs, and...
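Since the detector returns both bubble ("blob") boxes and text boxes, grouping comes down to assigning each text box to the bubble that contains it. A minimal sketch, with `(x1, y1, x2, y2)` box tuples and a pixel tolerance that are illustrative assumptions:

```python
def contains(bubble, text, tol=2.0):
    """True if the text box lies inside the bubble box, with a small pixel tolerance."""
    bx1, by1, bx2, by2 = bubble
    tx1, ty1, tx2, ty2 = text
    return bx1 - tol <= tx1 and by1 - tol <= ty1 and tx2 <= bx2 + tol and ty2 <= by2 + tol

def group_text_by_bubble(bubbles, texts):
    """Map each bubble index to the text boxes inside it; free text goes under -1."""
    groups = {i: [] for i in range(len(bubbles))}
    groups[-1] = []  # text not contained in any bubble
    for t in texts:
        owner = next((i for i, b in enumerate(bubbles) if contains(b, t)), -1)
        groups[owner].append(t)
    return groups
```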
> > This project's pipeline looks a lot like RAG; you could also add GraphRAG-style examples to the prompts: https://github.com/microsoft/graphrag/blob/main/graphrag/prompts/index/entity_extraction.py
> >
> > Also, are there any good local LLMs you can recommend? Open-source models that support both Chinese and Japanese and are still "smart" enough are hard to find.
>
> Splitting the work into finer-grained tasks and steps does improve the final result, but it also greatly increases time and token consumption and demands more from the model, so it is a trade-off. The project's recent improvements go in exactly the opposite direction: merging tasks as much as possible while keeping quality, to cut consumption and to make sure small local models still perform acceptably. Development uses the Qwen2.5-7B bundled in the one-click package. The team behind LightRAG has MiniRAG, aimed specifically at small models; the prompt structure is much the same, but the results leave a lot to be desired.
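For what "GraphRAG-style examples" means in practice, here is a minimal few-shot extraction prompt sketch modeled on the structure of the linked entity_extraction.py; the delimiters, the sample record, and the placeholder names are illustrative assumptions, not the file's actual contents:

```python
# few-shot prompt skeleton in the spirit of GraphRAG's entity extraction prompt
PROMPT = """-Goal-
Given a text document, identify all entities of the given types and the relationships among them.

-Steps-
1. For each entity, output ("entity"<|><name><|><type><|><description>)
2. For each related pair, output ("relationship"<|><source><|><target><|><description><|><strength>)
3. Return one list, using ## as the record delimiter, and end with <|COMPLETE|>.

-Example-
Text: Alice handed the report to Bob.
Output:
("entity"<|>ALICE<|>PERSON<|>Person who wrote and handed over the report)##
("entity"<|>BOB<|>PERSON<|>Person who received the report)##
("relationship"<|>ALICE<|>BOB<|>Alice gave the report to Bob<|>8)
<|COMPLETE|>

-Real Data-
Entity types: {entity_types}
Text: {input_text}
Output:"""

print(PROMPT.format(entity_types="PERSON, ORGANIZATION", input_text="..."))
```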
When offloading model weights to the GPU, VS Code on Win11 reports a warning: `Some weights of the model checkpoint at E:/AI/LLM/Models/Qwen2.5-VL-32B-Instruct-AWQ were not used when initializing Qwen2_5_VLForConditionalGeneration: ['lm_head.weight'] - This IS...`
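One common reason for `lm_head.weight` being reported as unused is that the config ties the output head to the input embeddings, so the separately stored tensor is skipped at load time. A minimal sketch for checking this on the local checkpoint (path reused from above; it assumes the AWQ backend is installed, and the behavior is not verified for this particular export):

```python
from transformers import AutoConfig, Qwen2_5_VLForConditionalGeneration

path = "E:/AI/LLM/Models/Qwen2.5-VL-32B-Instruct-AWQ"

# if tie_word_embeddings is True, lm_head shares weights with embed_tokens,
# and the checkpoint's stored lm_head.weight is expected to be skipped
config = AutoConfig.from_pretrained(path)
print("tie_word_embeddings:", getattr(config, "tie_word_embeddings", None))

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(path, device_map="auto")
tied = model.lm_head.weight.data_ptr() == model.get_input_embeddings().weight.data_ptr()
print("lm_head tied to input embeddings:", tied)
```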