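I found two problems when converting a PDF:
1. The partially visible border segments of a hidden table in the PDF cannot be reproduced in the docx file. For example, the red line in the sample is one border line of a table.
2. First-line indentation of text paragraphs is not preserved.
The sample is attached below: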
zf1.pdf

Same problem here: horizontal lines are not converted when converting to docx, and I also get the following errors:
[INFO] Start to convert D:/Download/aab.pdf
[INFO] [1/4] Opening document...
[INFO] [2/4] Analyzing document...
[WARNING] Ignore Line "𝑘𝐿\udc40" due to overlap
[WARNING] Ignore Line "𝑘" due to overlap
[INFO] [3/4] Parsing pages...
[INFO] (1/18) Page 1
[INFO] (2/18) Page 2
[INFO] (3/18) Page 3
[INFO] (4/18) Page 4
[INFO] (5/18) Page 5
[INFO] (6/18) Page 6
[INFO] (7/18) Page 7
[INFO] (8/18) Page 8
[INFO] (9/18) Page 9
[INFO] (10/18) Page 10
[INFO] (11/18) Page 11
[INFO] (12/18) Page 12
[INFO] (13/18) Page 13
[INFO] (14/18) Page 14
[ERROR] Ignore page 14 due to parsing page error: 'utf-8' codec can't encode character '\udc54' in position 0: surrogates not allowed
[INFO] (15/18) Page 15
[ERROR] Ignore page 15 due to parsing page error: 'utf-8' codec can't encode character '\udc59' in position 0: surrogates not allowed
[INFO] (16/18) Page 16
[INFO] (17/18) Page 17
[INFO] (18/18) Page 18
[INFO] [4/4] Creating pages...
[INFO] (1/16) Page 1
[INFO] (2/16) Page 2
[INFO] (3/16) Page 3
[INFO] (4/16) Page 4
[INFO] (5/16) Page 5
[INFO] (6/16) Page 6
[ERROR] Ignore page 6 due to making page error: 'utf-8' codec can't encode character '\udc40' in position 2: surrogates not allowed
[INFO] (7/16) Page 7
[INFO] (8/16) Page 8
[INFO] (9/16) Page 9
[INFO] (10/16) Page 10
[INFO] (11/16) Page 11
[INFO] (12/16) Page 12
[INFO] (13/16) Page 13
[INFO] (14/16) Page 16
[INFO] (15/16) Page 17
[INFO] (16/16) Page 18
[INFO] Terminated in 1.70s.
File Converted Successfully
[aab.pdf](https://github.com/ArtifexSoftware/pdf2docx/files/15048562/aab.pdf)
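
For anyone hitting the same `surrogates not allowed` errors: they are raised by Python itself whenever a string containing an unpaired UTF-16 surrogate (U+D800..U+DFFF) is encoded as UTF-8. The `\udc40` in the log is the low half of the UTF-16 pair for U+1D440 (𝑀), so a math italic letter may have been split during text extraction, though that is only a guess. The sketch below reproduces the error with the standard library and shows one possible way to sanitize such text. `strip_lone_surrogates` is a hypothetical helper written for illustration; it is not part of pdf2docx, and where it would have to be applied inside the library is not something the log tells us.

```python
def strip_lone_surrogates(text: str, replacement: str = "\ufffd") -> str:
    """Replace unpaired surrogate code points with a placeholder.

    A Python str stores code points, so any character in the range
    U+D800..U+DFFF is by definition an unpaired surrogate.
    Hypothetical helper, not a pdf2docx function.
    """
    return "".join(
        replacement if 0xD800 <= ord(ch) <= 0xDFFF else ch
        for ch in text
    )


def can_encode_utf8(text: str) -> bool:
    """Return True if the text can be encoded as UTF-8 without errors."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


# A string shaped like the one in the log: text followed by a lone low surrogate.
bad = "kL\udc40"
cleaned = strip_lone_surrogates(bad)

print(can_encode_utf8(bad))      # the raw string fails to encode
print(can_encode_utf8(cleaned))  # the sanitized string encodes fine
```

Replacing the character loses the original glyph, so the affected formula symbols would still be wrong in the output, but the page would no longer be dropped entirely.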