paper-qa

HTML/XML tags

Open pedrocr83 opened this issue 1 year ago • 1 comment

Do you strip HTML tags from documents before embedding them as vectors?

And if so, do you also support stripping XML tags?

pedrocr83 • Mar 21 '24 16:03

Hi @pedrocr83, we use html2text to convert HTML into plain text before embedding. I think the library works with XML tags as well!

mskarlin • Sep 11 '24 17:09
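
For anyone who wants to see roughly what "stripping tags before embedding" means, here is a minimal sketch using only Python's standard library. It is not paper-qa's code, and it is not html2text (which the reply above says paper-qa uses, and which produces Markdown-flavored text rather than bare text). `TagStripper` and `strip_tags` are hypothetical names made up for this illustration. The point it shows is that a lenient HTML parser does not care whether a tag name is a known HTML element, so XML-style tags are dropped in the same way.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Hypothetical helper: keeps text nodes and drops every tag."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; into text.
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        # Called for text between tags; tag names are never inspected,
        # so <p> and a custom XML tag like <abstract> are treated alike.
        self._chunks.append(data)

    def text(self):
        # Join chunks with a space so adjacent elements do not run together,
        # then collapse any whitespace runs into single spaces.
        return " ".join(" ".join(self._chunks).split())


def strip_tags(markup: str) -> str:
    """Return the text content of an HTML or XML-like string (assumed helper)."""
    parser = TagStripper()
    parser.feed(markup)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    print(strip_tags("<p>Hello <b>world</b></p>"))
    print(strip_tags("<article><title>Deep learning</title>"
                     "<abstract>A short summary.</abstract></article>"))
```

This sketch only discards markup. It makes no attempt to preserve headings, links, or lists the way a dedicated converter would, and well-formed XML with namespaces or CDATA sections is better handled by a real XML parser.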