paper-qa
html/xml tags
Do you strip HTML tags from the documents before embedding them into vectors?
And if so, do you also support stripping XML tags?
Hi @pedrocr83, we use html2text to parse the HTML into text before embedding. I think the library works with XML tags as well!
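
For anyone who wants to see roughly what tag stripping before embedding looks like, here is a minimal sketch using only Python's standard library. It is not paper-qa's actual code and it is not html2text (which produces Markdown-flavored text rather than bare text); `TagStripper` and `strip_tags` are hypothetical names made up for illustration. Since it only collects text nodes, it treats XML-style tags the same way as HTML ones.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Hypothetical helper: keeps text nodes and ignores every tag."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into "&".
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of text between tags.
        self.chunks.append(data)


def strip_tags(markup: str) -> str:
    """Return the text content of HTML or XML-like markup."""
    parser = TagStripper()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    # Join text runs with a space, then collapse whitespace runs.
    # Trade-off: a word split by an inline tag ("wor<b>ld</b>") gains a space.
    return " ".join(" ".join(parser.chunks).split())


if __name__ == "__main__":
    print(strip_tags("<p>Hello <b>world</b></p>"))
    print(strip_tags("<note><to>Ann</to><body>Hi &amp; bye</body></note>"))
```

Note that this sketch keeps the contents of `<script>` and `<style>` elements as text, so a real pipeline would probably want to drop those separately.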