
VRAM usage in 2.8.0~2.9.0 far exceeds that of 2.7.3

Open myhloli opened this issue 4 months ago • 0 comments

🔎 Search before asking

  • [X] I have searched the PaddleOCR Docs and found no similar bug report.
  • [X] I have searched the PaddleOCR Issues and found no similar bug report.
  • [X] I have searched the PaddleOCR Discussions and found no similar bug report.

🐛 Bug (Problem description)

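I updated to 2.9.0 today for testing and found that the rec (recognition) stage uses several times more VRAM than in 2.7.3, enough to exhaust the 8 GB on my 3060 Ti. Retesting earlier versions showed that 2.8.0 and 2.8.1 have the same problem.

The default value of the relevant parameter is rec_batch_num=6.

In 2.7.3, changing this value does not affect VRAM usage (perhaps batching does not take effect?). Peak usage is about 1.7 GB; it stays the same when the value is set to 1, and also when it is set to 10.

2.8.0, 2.8.1, and 2.9.0 all behave the same way with the default rec_batch_num=6: usage is about 7 GB.

Setting rec_batch_num to 1 brings usage on 2.8.0/2.8.1/2.9.0 down to 2.6 GB, still well above the 1.7 GB used by 2.7.3.

The rec model is the same in every test, ch_PP-OCRv4_rec_infer; changing only the paddleocr version is enough to produce a clear difference in VRAM usage. The framework is the same paddlepaddle-gpu 2.6.1 throughout. I hope you can look into why 2.8+ uses about 900 MB more VRAM for a single rec batch.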
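
As a rough reading of the figures reported above, the arithmetic can be written out as follows. The per-item growth figure assumes VRAM scales linearly with rec_batch_num between 1 and 6, which is an assumption for illustration and was not measured in this report.

```python
# Peak VRAM figures taken from the report above, in GB.
peak_273 = 1.7          # paddleocr 2.7.3, unchanged for rec_batch_num = 1, 6, 10
peak_28x_batch1 = 2.6   # paddleocr 2.8.0/2.8.1/2.9.0 with rec_batch_num=1
peak_28x_batch6 = 7.0   # paddleocr 2.8.0/2.8.1/2.9.0 with rec_batch_num=6 (approximate)

# Extra VRAM used by 2.8+ for a single rec batch, compared with 2.7.3.
overhead_batch1 = peak_28x_batch1 - peak_273

# ASSUMPTION: linear growth with batch size between rec_batch_num=1 and 6.
per_extra_item = (peak_28x_batch6 - peak_28x_batch1) / (6 - 1)

print(f"Single-batch overhead vs 2.7.3: {overhead_batch1:.2f} GB")
print(f"Approximate growth per extra batch item: {per_extra_item:.2f} GB")
```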

🏃‍♂️ Environment

  • OS: Windows 11
  • paddlepaddle-gpu: 2.6.1
  • CUDA: 11.8
  • GPU: 3060 Ti, 8 GB VRAM
  • paddleocr: 2.7.3 / 2.8.0 / 2.8.1 / 2.9.0

🌰 Minimal Reproducible Example

import os
import time
from paddleocr import PaddleOCR

ocr = PaddleOCR(rec_batch_num=1)

def extract_text_from_images(directory):
    results = {}
    # Iterate over all files in the directory
    for filename in os.listdir(directory):
        if filename.endswith(".jpg"):
            filepath = os.path.join(directory, filename)
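            # Read the image and run OCR on it
            result = ocr.ocr(filepath, cls=True)
            # Extract the text from the recognition result
            # (result[0] may be None when no text is detected)
            text = [line[1][0] for line in (result[0] or [])]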
            results[filename] = ' '.join(text)
    return results

# Use the function
# A few screenshots of book pages with a modest amount of text are enough to reproduce the issue
directory_path = r"E:\pdf_meta\demo\ocr_test"
start_time = time.time()
texts = extract_text_from_images(directory_path)
for img_name, text in texts.items():
    print(f"Image: {img_name}, Text: {text}")

end_time = time.time()

print(f"Total time taken: {end_time - start_time} seconds")
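
To record the peak while the script above runs, one option is to poll `nvidia-smi` from a background thread. This is a minimal sketch, not the method used for the figures in this report: `parse_memory_used`, `query_nvidia_smi`, and `PeakMemorySampler` are hypothetical helper names, and `nvidia-smi` reports memory used on the whole device (including other processes), so the readings are only comparable across paddleocr versions if nothing else is using the GPU.

```python
import subprocess
import threading


def parse_memory_used(output):
    """Parse nvidia-smi CSV output (one MiB value per GPU) into a list of ints."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def query_nvidia_smi():
    """Return the memory currently in use on each visible GPU, in MiB."""
    output = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_memory_used(output)


class PeakMemorySampler:
    """Poll GPU memory in a background thread and keep the highest reading."""

    def __init__(self, query=query_nvidia_smi, gpu_index=0, interval=0.1):
        self.query = query            # callable returning a list of MiB values
        self.gpu_index = gpu_index    # which GPU to track
        self.interval = interval      # seconds between samples
        self.peak_mib = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop.is_set():
            self.peak_mib = max(self.peak_mib, self.query()[self.gpu_index])
            self._stop.wait(self.interval)

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc_info):
        self._stop.set()
        self._thread.join()
        return False


# Usage with the repro script (requires an NVIDIA GPU and nvidia-smi on PATH):
# with PeakMemorySampler() as sampler:
#     texts = extract_text_from_images(directory_path)
# print(f"Peak GPU memory: {sampler.peak_mib} MiB")
```

Sampling misses spikes shorter than the polling interval, so the reported peak is a lower bound.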

myhloli, Oct 21 '24 06:10