
PDF conversion fails on Windows! paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true

Open xlnn opened this issue 2 years ago • 3 comments

(pytorch) D:\pythoncx\OCR-pdf>paddleocr --image_dir=c.pdf --type=structure --recovery=true --use_pdf2docx_api=true

usage: paddleocr [-h] [--use_gpu USE_GPU] [--use_xpu USE_XPU] [--ir_optim IR_OPTIM] [--use_tensorrt USE_TENSORRT] [--min_subgraph_size MIN_SUBGRAPH_SIZE] [--shape_info_filename SHAPE_INFO_FILENAME] [--precision PRECISION] [--gpu_mem GPU_MEM] [--image_dir IMAGE_DIR] [--det_algorithm DET_ALGORITHM] [--det_model_dir DET_MODEL_DIR] [--det_limit_side_len DET_LIMIT_SIDE_LEN] [--det_limit_type DET_LIMIT_TYPE] [--det_db_thresh DET_DB_THRESH] [--det_db_box_thresh DET_DB_BOX_THRESH] [--det_db_unclip_ratio DET_DB_UNCLIP_RATIO] [--max_batch_size MAX_BATCH_SIZE] [--use_dilation USE_DILATION] [--det_db_score_mode DET_DB_SCORE_MODE] [--det_east_score_thresh DET_EAST_SCORE_THRESH] [--det_east_cover_thresh DET_EAST_COVER_THRESH] [--det_east_nms_thresh DET_EAST_NMS_THRESH] [--det_sast_score_thresh DET_SAST_SCORE_THRESH] [--det_sast_nms_thresh DET_SAST_NMS_THRESH] [--det_sast_polygon DET_SAST_POLYGON] [--det_pse_thresh DET_PSE_THRESH] [--det_pse_box_thresh DET_PSE_BOX_THRESH] [--det_pse_min_area DET_PSE_MIN_AREA] [--det_pse_box_type DET_PSE_BOX_TYPE] [--det_pse_scale DET_PSE_SCALE] [--scales SCALES] [--alpha ALPHA] [--beta BETA] [--fourier_degree FOURIER_DEGREE] [--det_fce_box_type DET_FCE_BOX_TYPE] [--rec_algorithm REC_ALGORITHM] [--rec_model_dir REC_MODEL_DIR] [--rec_image_shape REC_IMAGE_SHAPE] [--rec_batch_num REC_BATCH_NUM] [--max_text_length MAX_TEXT_LENGTH] [--rec_char_dict_path REC_CHAR_DICT_PATH] [--use_space_char USE_SPACE_CHAR] [--vis_font_path VIS_FONT_PATH] [--drop_score DROP_SCORE] [--e2e_algorithm E2E_ALGORITHM] [--e2e_model_dir E2E_MODEL_DIR] [--e2e_limit_side_len E2E_LIMIT_SIDE_LEN] [--e2e_limit_type E2E_LIMIT_TYPE] [--e2e_pgnet_score_thresh E2E_PGNET_SCORE_THRESH] [--e2e_char_dict_path E2E_CHAR_DICT_PATH] [--e2e_pgnet_valid_set E2E_PGNET_VALID_SET] [--e2e_pgnet_mode E2E_PGNET_MODE] [--use_angle_cls USE_ANGLE_CLS] [--cls_model_dir CLS_MODEL_DIR] [--cls_image_shape CLS_IMAGE_SHAPE] [--label_list LABEL_LIST] [--cls_batch_num CLS_BATCH_NUM] [--cls_thresh CLS_THRESH] [--enable_mkldnn ENABLE_MKLDNN] [--cpu_threads CPU_THREADS] [--use_pdserving USE_PDSERVING] [--warmup WARMUP] [--sr_model_dir SR_MODEL_DIR] [--sr_image_shape SR_IMAGE_SHAPE] [--sr_batch_num SR_BATCH_NUM] [--draw_img_save_dir DRAW_IMG_SAVE_DIR] [--save_crop_res SAVE_CROP_RES] [--crop_res_save_dir CROP_RES_SAVE_DIR] [--use_mp USE_MP] [--total_process_num TOTAL_PROCESS_NUM] [--process_id PROCESS_ID] [--benchmark BENCHMARK] [--save_log_path SAVE_LOG_PATH] [--show_log SHOW_LOG] [--use_onnx USE_ONNX] [--output OUTPUT] [--table_max_len TABLE_MAX_LEN] [--table_algorithm TABLE_ALGORITHM] [--table_model_dir TABLE_MODEL_DIR] [--merge_no_span_structure MERGE_NO_SPAN_STRUCTURE] [--table_char_dict_path TABLE_CHAR_DICT_PATH] [--layout_model_dir LAYOUT_MODEL_DIR] [--layout_dict_path LAYOUT_DICT_PATH] [--layout_score_threshold LAYOUT_SCORE_THRESHOLD] [--layout_nms_threshold LAYOUT_NMS_THRESHOLD] [--kie_algorithm KIE_ALGORITHM] [--ser_model_dir SER_MODEL_DIR] [--ser_dict_path SER_DICT_PATH] [--ocr_order_method OCR_ORDER_METHOD] [--mode MODE] [--image_orientation IMAGE_ORIENTATION] [--layout LAYOUT] [--table TABLE] [--ocr OCR] [--recovery RECOVERY] [--save_pdf SAVE_PDF] [--lang LANG] [--det DET] [--rec REC] [--type TYPE] [--ocr_version {PP-OCR,PP-OCRv2,PP-OCRv3}] [--structure_version {PP-Structure,PP-Structurev2}]

paddleocr: error: unrecognized arguments: --use_pdf2docx_api=true
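The `usage:` dump followed by `error: unrecognized arguments` is the standard failure output of Python's `argparse`: the installed CLI only accepts the options it defines, and `--use_pdf2docx_api` is not among the ones listed. A minimal sketch reproducing that behavior with a hypothetical, heavily trimmed stand-in parser (not PaddleOCR's real parser definition), using `parse_known_args()` so the rejected flag can be inspected instead of exiting:

```python
import argparse


def build_old_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for an older paddleocr CLI: it knows --image_dir,
    # --type and --recovery, but has no --use_pdf2docx_api option.
    parser = argparse.ArgumentParser(prog="paddleocr")
    parser.add_argument("--image_dir", type=str)
    parser.add_argument("--type", type=str, default="ocr")
    parser.add_argument("--recovery", type=str, default="false")
    return parser


argv = ["--image_dir=c.pdf", "--type=structure", "--recovery=true",
        "--use_pdf2docx_api=true"]

parser = build_old_parser()

# parse_known_args() returns the recognized options plus the leftovers;
# parse_args() would instead print "unrecognized arguments: ..." and exit(2).
known, unknown = parser.parse_known_args(argv)
print(known.type, known.recovery)  # structure true
print(unknown)                     # ['--use_pdf2docx_api=true']
```

In other words, the error says nothing about the PDF itself; it only means the installed release's parser has no such option.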

xlnn, Nov 15 '22 11:11

Is your current paddleocr an older version? Try installing paddleocr==2.6.0.3; if that still errors, try 2.6.0.2.

an1018, Nov 15 '22 12:11
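Before reinstalling, it can help to confirm which `paddleocr` release the active environment actually has. A minimal sketch using the standard library's `importlib.metadata` (Python 3.9+ for the type hints); the 2.6.0.3 threshold is just the version suggested in the reply above, not a documented minimum for `--use_pdf2docx_api`, and the naive comparison assumes purely numeric dotted versions (no `rc`/`post` suffixes):

```python
from importlib.metadata import PackageNotFoundError, version

SUGGESTED = "2.6.0.3"  # version suggested in the thread, not an official minimum


def as_tuple(v: str) -> tuple[int, ...]:
    # Naive parse: assumes purely numeric dotted versions such as "2.6.0.1".
    return tuple(int(part) for part in v.split("."))


def is_at_least(installed: str, required: str) -> bool:
    # Tuple comparison orders (2, 6, 0, 1) before (2, 6, 0, 3).
    return as_tuple(installed) >= as_tuple(required)


if __name__ == "__main__":
    try:
        installed = version("paddleocr")
    except PackageNotFoundError:
        print("paddleocr is not installed in this environment")
    else:
        status = "OK" if is_at_least(installed, SUGGESTED) else "older than suggested"
        print(f"paddleocr {installed}: {status}")
```

Running it inside the same `(pytorch)` environment that launches the `paddleocr` command makes sure the check sees the same installation.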

Still doesn't seem to work. It can't find the version??

(pytorch) D:\pythoncx\OCR-pdf>pip3 install "paddleocr==2.6.0.3" -i https://mirror.baidu.com/pypi/simple
Looking in indexes: https://mirror.baidu.com/pypi/simple
ERROR: Could not find a version that satisfies the requirement paddleocr==2.6.0.3 (from versions: 0.0.1.1, 0.0.2, 0.0.3, 0.0.3.1, 1.0.0, 1.0.1, 1.1.1, 2.0.1, 2.0.2, 2.0.3, 2.0.4, 2.0.5, 2.0.6, 2.2, 2.2.0.1, 2.2.0.2, 2.3, 2.3.0.1, 2.3.0.2, 2.4, 2.4.0.1, 2.4.0.2, 2.4.0.3, 2.4.0.4, 2.5, 2.5.0.2, 2.5.0.3, 2.6, 2.6.0.1)
ERROR: No matching distribution found for paddleocr==2.6.0.3

(pytorch) D:\pythoncx\OCR-pdf>pip3 install "paddleocr==2.6.0.2" -i https://mirror.baidu.com/pypi/simple
Looking in indexes: https://mirror.baidu.com/pypi/simple
ERROR: Could not find a version that satisfies the requirement paddleocr==2.6.0.2 (from versions: 0.0.1.1, 0.0.2, 0.0.3, 0.0.3.1, 1.0.0, 1.0.1, 1.1.1, 2.0.1, 2.0.2, 2.0.3, 2.0.4, 2.0.5, 2.0.6, 2.2, 2.2.0.1, 2.2.0.2, 2.3, 2.3.0.1, 2.3.0.2, 2.4, 2.4.0.1, 2.4.0.2, 2.4.0.3, 2.4.0.4, 2.5, 2.5.0.2, 2.5.0.3, 2.6, 2.6.0.1)
ERROR: No matching distribution found for paddleocr==2.6.0.2

xlnn, Nov 15 '22 12:11
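Both install attempts fail for the same visible reason: the `(from versions: ...)` list shows that the Baidu mirror offered releases only up to 2.6.0.1, so neither 2.6.0.3 nor 2.6.0.2 could be resolved from that index. A small sketch that pulls that list out of pip's error text and checks for a requested release; it assumes pip keeps the `(from versions: ...)` wording shown above, and `offered_versions` is a hypothetical helper name:

```python
import re


def offered_versions(pip_error: str) -> list[str]:
    # Extract the comma-separated list from "(from versions: a, b, c)".
    match = re.search(r"\(from versions: ([^)]*)\)", pip_error)
    if match is None:
        return []
    return [v.strip() for v in match.group(1).split(",") if v.strip()]


# Error text copied from the pip output above.
pip_error = (
    "ERROR: Could not find a version that satisfies the requirement "
    "paddleocr==2.6.0.3 (from versions: 0.0.1.1, 0.0.2, 0.0.3, 0.0.3.1, "
    "1.0.0, 1.0.1, 1.1.1, 2.0.1, 2.0.2, 2.0.3, 2.0.4, 2.0.5, 2.0.6, 2.2, "
    "2.2.0.1, 2.2.0.2, 2.3, 2.3.0.1, 2.3.0.2, 2.4, 2.4.0.1, 2.4.0.2, 2.4.0.3, "
    "2.4.0.4, 2.5, 2.5.0.2, 2.5.0.3, 2.6, 2.6.0.1)"
)

versions = offered_versions(pip_error)
print(versions[-1])           # 2.6.0.1 (the last entry listed)
print("2.6.0.3" in versions)  # False
print("2.6.0.2" in versions)  # False
```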

[image]

xlnn, Nov 16 '22 13:11

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

github-actions[bot], Jul 08 '23 02:07