
convert tail to text/emphasised tag and process all emphasised descendants

Open stdweird opened this issue 1 year ago • 3 comments

cf. the discussion at #2362

stdweird, Jan 31 '24 11:01

@stdweird, thanks for the contribution! Do you have an HTML doc handy that this PR fixes, which could be added to the unit tests?

cragwolfe, Feb 26 '24 07:02

@cragwolfe, I'm not sure whether you want real data or not, but e.g.:

<html>
  <body>
    <div>a
      <ul>b
        <li>c1</li>d1
        <li>c2</li>d2
      </ul>e
     </div>f<br>g
  </body>
</html>

The main intent should be to keep the result as close as possible to the original text (e.g. e, f, g after the list items), but right now retrieving all text is the higher priority (at least for me). I don't think even this PR does that (g is still lost, I think), but it is already an improvement.
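To make the text/tail distinction concrete, here is a minimal sketch using only Python's standard-library `xml.etree.ElementTree`, which follows the same `.text`/`.tail` model as lxml. It is for illustration only and is not the PR's code: the helper names `text_only` and `text_and_tail` are hypothetical, and `<br>` is written as `<br/>` so the stdlib XML parser accepts the sample.

```python
import xml.etree.ElementTree as ET

# The sample from this comment, with <br> written as <br/> (assumption:
# needed only because the stdlib parser requires well-formed XML).
SAMPLE = """<html>
  <body>
    <div>a
      <ul>b
        <li>c1</li>d1
        <li>c2</li>d2
      </ul>e
     </div>f<br/>g
  </body>
</html>"""


def text_only(root):
    """Collect each element's .text and ignore .tail (hypothetical helper)."""
    return [el.text.strip() for el in root.iter() if el.text and el.text.strip()]


def text_and_tail(root):
    """Collect .text and .tail chunks in document order (hypothetical helper)."""
    return [chunk.strip() for chunk in root.itertext() if chunk.strip()]


root = ET.fromstring(SAMPLE)
print(text_only(root))      # ['a', 'b', 'c1', 'c2']
print(text_and_tail(root))  # ['a', 'b', 'c1', 'd1', 'c2', 'd2', 'e', 'f', 'g']
```

Reading only `.text` drops d1, d2, e, f and g, because each of those strings is stored as the tail of the preceding element rather than as the text of any element.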

stdweird, Feb 26 '24 08:02

@stdweird, are you still working on this?

MthwRobinson, May 15 '24 14:05