sec-parser
Make HighlightedTextClassifier work with `<b>` tags
Discussed in https://github.com/orgs/alphanome-ai/discussions/56
Originally posted by Elijas November 24, 2023
Example document
https://www.sec.gov/Archives/edgar/data/1675149/000119312518236766/d828236d10q.htm
<p style="margin-top:9pt; margin-bottom:0pt; text-indent:4%; font-size:10pt; font-family:Times New Roman">
Options to purchase 1 million shares of common stock at a weighted average exercise price of $36.28 were
outstanding as of June 30, 2017, but were not included in the computation of diluted EPS because they were anti-dilutive, as the exercise prices of the options were greater than the average market price of Alcoa Corporation's common stock.
</p>
<p style="margin-top:13pt; margin-bottom:0pt; font-size:10pt; font-family:Times New Roman">
<b>
G. Accumulated Other Comprehensive Loss
</b>
</p>
<p style="margin-top:6pt; margin-bottom:0pt; text-indent:4%; font-size:10pt; font-family:Times New Roman">
The following table details the activity of the three components that comprise Accumulated other comprehensive loss for both Alcoa
Corporation's shareholders and Noncontrolling interest:
</p>
Goal
The heading "G. Accumulated Other Comprehensive Loss" should be recognized as a HighlightedTextElement (and therefore as a TitleElement).
Most likely, you will have to compute the percentage of the element's text that is covered by the `<b>` tag, reusing the parts already implemented for HighlightedTextElement. This helps avoid cases where `text text text <b>bold</b> text text` is recognized as highlighted.
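To make the percentage idea concrete, here is a minimal sketch using only Python's standard-library `html.parser`. It is not sec-parser's actual implementation: the names `BoldCoverageParser`, `bold_text_percentage`, and `is_bold_highlighted`, as well as the 80% threshold, are hypothetical and chosen only for illustration. The real fix would presumably reuse the existing style-percentage logic instead.

```python
from html.parser import HTMLParser


class BoldCoverageParser(HTMLParser):
    """Counts non-whitespace characters inside and outside <b> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.bold_depth = 0   # how many <b> tags are currently open
        self.bold_chars = 0   # non-whitespace characters inside <b>
        self.total_chars = 0  # all non-whitespace characters

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self.bold_depth += 1

    def handle_endtag(self, tag):
        if tag == "b" and self.bold_depth > 0:
            self.bold_depth -= 1

    def handle_data(self, data):
        # Ignore whitespace so indentation and newlines do not skew the ratio.
        count = len("".join(data.split()))
        self.total_chars += count
        if self.bold_depth > 0:
            self.bold_chars += count


def bold_text_percentage(html: str) -> float:
    """Return the percentage (0-100) of text that sits inside <b> tags."""
    parser = BoldCoverageParser()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    if parser.total_chars == 0:
        return 0.0
    return 100.0 * parser.bold_chars / parser.total_chars


def is_bold_highlighted(html: str, threshold: float = 80.0) -> bool:
    """Hypothetical rule: highlighted only if most of the text is bold."""
    return bold_text_percentage(html) >= threshold


heading = "<p><b>\nG. Accumulated Other Comprehensive Loss\n</b></p>"
inline = "<p>text text text <b>bold</b> text text</p>"
print(is_bold_highlighted(heading), is_bold_highlighted(inline))
```

With this rule, the fully bold heading from the example document passes the threshold, while a paragraph with a single bold word does not.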