sec-parser

Fix the TopSectionTitle being split in MSFT filing

Open Elijas opened this issue 1 year ago • 0 comments

Context

MSFT accuracy-test (permalink at the time of posting)

Problem

The top section title comes out as two separate title elements:

        {
            "text_content": "PART I. FINANCI"
        },
        {
            "text_content": "AL INFORMATION"
        },

This happens because the MSFT filing splits its section titles into two pieces, for reasons that are unclear.

Ideas about a possible solution

One option is to take line information into account: if two elements of the same type (and level) are on the same line, they should probably be treated as a single element.
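A rough sketch of that idea, not tied to sec-parser's actual classes: the `Element` dataclass, its `line` field, and the `merge_same_line` helper are all hypothetical names made up for illustration, and it assumes each element already knows which rendered line it sits on. The pieces are joined without a separator because the split in the example above falls mid-word; that choice is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Element:
    # Hypothetical stand-in for a parsed element; NOT sec-parser's real class.
    kind: str              # element type, e.g. "TopSectionTitle"
    level: Optional[int]   # title level, if the type has one
    line: int              # assumed: index of the rendered line the element is on
    text_content: str


def merge_same_line(elements: List[Element]) -> List[Element]:
    """Merge consecutive elements that share type, level, and line."""
    merged: List[Element] = []
    for el in elements:
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev.kind == el.kind
            and prev.level == el.level
            and prev.line == el.line
        ):
            # Same type, same level, same line: treat as one element.
            merged[-1] = Element(
                prev.kind, prev.level, prev.line,
                prev.text_content + el.text_content,
            )
        else:
            merged.append(el)
    return merged


# The two fragments from the MSFT output, with a made-up line index.
parts = [
    Element("TopSectionTitle", 0, 12, "PART I. FINANCI"),
    Element("TopSectionTitle", 0, 12, "AL INFORMATION"),
]
result = merge_same_line(parts)
print(result[0].text_content)
```

Only adjacent elements are merged here, so two titles of the same type on different lines stay separate.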

Elijas, Dec 27 '23 11:12