pagination removal and local indexDB for issues

Open SatishGandham opened this issue 1 year ago • 1 comments

SatishGandham avatar Aug 13 '24 06:08 SatishGandham

(MinerU) llw@lianyan:~/github/marker/pdf_marker/workspace2/pdf$ magic-pdf -p small_ocr.pdf
2024-08-14 08:33:14.220 | INFO | magic_pdf.libs.pdf_check:detect_invalid_chars:57 - cid_count: 0, text_len: 8, cid_chars_radio: 0.0
2024-08-14 08:33:14.221 | WARNING | magic_pdf.filter.pdf_classify_by_type:classify:334 - pdf is not classified by area and text_len, by_image_area: False, by_text: False, by_avg_words: False, by_img_num: True, by_text_layout: False, by_img_narrow_strips: False, by_invalid_chars: True
2024-08-14 08:33:14.240 | ERROR | magic_pdf.model.pdf_extract_kit:<module>:27 - libnccl.so.2: cannot open shared object file: No such file or directory
Traceback (most recent call last):

  File "/home/llw/miniconda3/envs/MinerU/bin/magic-pdf", line 8, in <module>
    sys.exit(cli())
    │ │ └ <Command cli>
    │ └
    └ <module 'sys' (built-in)>
  File "/home/llw/miniconda3/envs/MinerU/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
    │ │ │ └ {}
    │ │ └ ()
    │ └ <function BaseCommand.main at 0x7d6fd1a1a200>
    └ <Command cli>
  File "/home/llw/miniconda3/envs/MinerU/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
    │ │ └ <click.core.Context object at 0x7d6fd1d56d40>
    │ └ <function Command.invoke at 0x7d6fd1a1acb0>
    └ <Command cli>
  File "/home/llw/miniconda3/envs/MinerU/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
    │ │ │ │ │ └ {'path': 'small_ocr.pdf', 'output_dir': '', 'method': 'auto'}
    │ │ │ │ └ <click.core.Context object at 0x7d6fd1d56d40>
    │ │ │ └ <function cli at 0x7d6f5f338700>
    │ │ └ <Command cli>
    │ └ <function Context.invoke at 0x7d6fd1a19a20>
    └ <click.core.Context object at 0x7d6fd1d56d40>
  File "/home/llw/miniconda3/envs/MinerU/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
    │ └ {'path': 'small_ocr.pdf', 'output_dir': '', 'method': 'auto'}
    └ ()
  File "/home/llw/miniconda3/envs/MinerU/lib/python3.10/site-packages/magic_pdf/tools/cli.py", line 75, in cli
    parse_doc(path)
    │ └ 'small_ocr.pdf'
    └ <function cli.<locals>.parse_doc at 0x7d6fd1c33490>
  File "/home/llw/miniconda3/envs/MinerU/lib/python3.10/site-packages/magic_pdf/tools/cli.py", line 60, in parse_doc
    do_parse(
    └ <function do_parse at 0x7d6f5f323be0>
  File "/home/llw/miniconda3/envs/MinerU/lib/python3.10/site-packages/magic_pdf/tools/common.py", line 65, in do_parse
    pipe.pipe_analyze()
    │ └ <function UNIPipe.pipe_analyze at 0x7d6f5f323880>
    └ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x7d6f5f328d00>
  File "/home/llw/miniconda3/envs/MinerU/lib/python3.10/site-packages/magic_pdf/pipe/UNIPipe.py", line 31, in pipe_analyze
    self.model_list = doc_analyze(self.pdf_bytes, ocr=True)
    │ │ │ │ └ b'%PDF-1.7\r\n%\xa1\xb3\xc5\xd7\r\n1 0 obj\r\n<</Pages 2 0 R /Type/Catalog>>\r\nendobj\r\n2 0 obj\r\n<</Count 8/Kids[ 4 0 R ...
    │ │ │ └ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x7d6f5f328d00>
    │ │ └ <function doc_analyze at 0x7d6fcc9d68c0>
    │ └ []
    └ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x7d6f5f328d00>
  File "/home/llw/miniconda3/envs/MinerU/lib/python3.10/site-packages/magic_pdf/model/doc_analyze_by_custom_model.py", line 109, in doc_analyze
    custom_model = model_manager.get_model(ocr, show_log)
    │ │ │ └ False
    │ │ └ True
    │ └ <function ModelSingleton.get_model at 0x7d6fcc9d6830>
    └ <magic_pdf.model.doc_analyze_by_custom_model.ModelSingleton object at 0x7d6f5ee28460>
  File "/home/llw/miniconda3/envs/MinerU/lib/python3.10/site-packages/magic_pdf/model/doc_analyze_by_custom_model.py", line 63, in get_model
    self._models[key] = custom_model_init(ocr=ocr, show_log=show_log)
    │ │ │ │ │ └ False
    │ │ │ │ └ True
    │ │ │ └ <function custom_model_init at 0x7d6fcc9d6710>
    │ │ └ (True, False)
    │ └ {}
    └ <magic_pdf.model.doc_analyze_by_custom_model.ModelSingleton object at 0x7d6f5ee28460>
  File "/home/llw/miniconda3/envs/MinerU/lib/python3.10/site-packages/magic_pdf/model/doc_analyze_by_custom_model.py", line 83, in custom_model_init
    from magic_pdf.model.pdf_extract_kit import CustomPEKModel
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/llw/miniconda3/envs/MinerU/lib/python3.10/site-packages/magic_pdf/model/pdf_extract_kit.py", line 13, in <module>
    import torch
  File "/home/llw/miniconda3/envs/MinerU/lib/python3.10/site-packages/torch/__init__.py", line 239, in <module>
    from torch._C import * # noqa: F403

ImportError: libnccl.so.2: cannot open shared object file: No such file or directory
2024-08-14 08:33:14.246 | ERROR | magic_pdf.model.pdf_extract_kit:<module>:28 - Required dependency not installed, please install by "pip install magic-pdf[full] --extra-index-url https://myhloli.github.io/wheels/"

Same steps, same environment, and I get this error too.

lianyant avatar Aug 14 '24 00:08 lianyant

@hzzheng0612 @lianyant Have you installed NCCL? https://developer.nvidia.com/nccl
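
One quick way to check this before rerunning magic-pdf is to ask the dynamic loader directly whether it can open the library named in the traceback. The following is a minimal sketch, not part of MinerU: the helper name `can_load` is made up for illustration, and it assumes a Linux system where the loader searches the usual paths (including `LD_LIBRARY_PATH`).

```python
import ctypes


def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can open the named shared library.

    `can_load` is a hypothetical helper for this thread, not a MinerU or
    PyTorch API. ctypes.CDLL raises OSError when the file cannot be opened.
    """
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    # libnccl.so.2 is the file named in the ImportError above.
    print("libnccl.so.2 loadable:", can_load("libnccl.so.2"))
```

If this prints `False` inside the same conda environment, `import torch` will presumably keep failing with the same ImportError until the library is installed or made visible to the loader.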

myhloli avatar Aug 14 '24 03:08 myhloli

On CentOS 7 with Python 3.10 in CPU mode I get the same error; the version is 0.7.0b1.

luxinfeng avatar Aug 14 '24 15:08 luxinfeng

> On CentOS 7 with Python 3.10 in CPU mode I get the same error; the version is 0.7.0b1.

After investigating, I found that in my case it was caused by missing OpenGL libraries. After installing the missing dependencies with yum -y install epel-release && yum -y install mesa-libGL mesa-libGLU libXtst libXrender, everything ran normally.
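
Since the same symptom traced back to different missing libraries in this thread (libnccl.so.2 in the traceback above, OpenGL libraries here), it may help to surface the exact file the loader could not open before deciding what to install. The sketch below is an illustration only: the helper names `missing_shared_object` and `diagnose_import` are hypothetical, and the regular expression assumes the glibc-style message "<name>.so.N: cannot open shared object file" seen in the log.

```python
import importlib
import re
from typing import Optional

# Matches messages such as
# "libnccl.so.2: cannot open shared object file: No such file or directory".
_MISSING_SO = re.compile(r"(\S+\.so(?:\.\d+)*): cannot open shared object file")


def missing_shared_object(message: str) -> Optional[str]:
    """Extract the shared-library file name from a loader error message."""
    match = _MISSING_SO.search(message)
    return match.group(1) if match else None


def diagnose_import(module_name: str) -> Optional[str]:
    """Try to import a module; return the missing .so name if that is the cause.

    Returns None when the import succeeds or fails for another reason.
    """
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        return missing_shared_object(str(exc))
    return None


if __name__ == "__main__":
    # torch is the import that failed in the traceback above.
    print("missing shared object for torch:", diagnose_import("torch"))
```

The reported file name can then be matched to a system package by hand, for example with the distribution's package search.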

luxinfeng avatar Aug 15 '24 02:08 luxinfeng

> @hzzheng0612 @lianyant Have you installed NCCL? https://developer.nvidia.com/nccl

After I installed it, everything works normally. Thank you very much, @myhloli

lianyant avatar Aug 15 '24 09:08 lianyant