DAVAR-Lab-OCR: Generated table

Open cqray1990 opened this issue 10 months ago • 0 comments

", "<td", " colspan="2"", ">", "", "<td", " colspan="2"", ">", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "<td", " rowspan="2"", ">", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "<td", " rowspan="2"", ">", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", ""]

The table I generated has no thead and tbody tags. Are these tags strictly required? Their absence causes a problem in the following function:

    def get_headbody(html_str):
        """Calculating number of bboxes belonging to "t-head" and "t-body" respectively

        Args:
            html_str(str): html representing table structure

        Returns:
            int: number of bboxes belonging to "t-head"
            int: number of bboxes belonging to "t-body"
        """
        # html_code = ''.join(html_str)
        # html_str = list('''<html><body><table>%s</table></body></html>''' % html_code)

        s_h, e_h = html_str.index('<thead>'), html_str.index('</thead>')
        s_b, e_b = html_str.index('<tbody>'), html_str.index('</tbody>')
        num_h = html_str[s_h + 1:e_h].count('</td>')
        num_b = html_str[s_b + 1:e_b].count('</td>')
        return num_h, num_b

The conversion fails in this function.
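
For context, `get_headbody` as quoted above calls `.index('<thead>')` and `.index('<tbody>')`, and both `list.index` and `str.index` raise `ValueError` when the element is missing, so the function cannot succeed on a token list without those tags. Below is a minimal sketch of a tolerant variant, not the library's own code. The name `get_headbody_tolerant` is hypothetical, and it assumes the input is a list of structure tokens like the one shown above. It also assumes that cells outside any section can be counted as body cells, which the rest of the pipeline may or may not accept.

```python
def get_headbody_tolerant(html_tokens):
    """Count cells in the head and body sections of a tokenized table.

    Hypothetical variant of get_headbody: a missing section does not
    raise an error.
    """

    def count_cells(open_tag, close_tag):
        # Return the number of '</td>' tokens between the two tags,
        # or None when the section is absent from the token list.
        try:
            start = html_tokens.index(open_tag)
            end = html_tokens.index(close_tag, start + 1)
        except ValueError:
            return None
        return html_tokens[start + 1:end].count('</td>')

    total = html_tokens.count('</td>')
    num_h = count_cells('<thead>', '</thead>')
    num_b = count_cells('<tbody>', '</tbody>')

    # Assumption: with no <thead>, the table has no header cells.
    if num_h is None:
        num_h = 0
    # Assumption: with no <tbody>, every non-header cell is a body cell.
    if num_b is None:
        num_b = total - num_h
    return num_h, num_b


with_sections = ['<thead>', '<tr>', '<td>', '</td>', '</tr>', '</thead>',
                 '<tbody>', '<tr>', '<td>', '</td>',
                 '<td', ' colspan="2"', '>', '</td>', '</tr>', '</tbody>']
without_sections = ['<tr>', '<td>', '</td>',
                    '<td', ' rowspan="2"', '>', '</td>', '</tr>']

print(get_headbody_tolerant(with_sections))
print(get_headbody_tolerant(without_sections))
```

An alternative that leaves the function untouched would be to wrap the generated tokens in `<thead>`/`<tbody>` tokens before calling it. Which rows belong in the head would still have to be decided by whoever generates the table.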

Sep 08 '23 10:09 cqray1990
