
Optimize the strategy for extracting text from HTML webpages by removing unnecessary and distracting information

Open blmdxiao opened this issue 1 year ago • 1 comment

blmdxiao May 27 '24 04:05

Remove all ['nav', 'footer', 'aside', 'script', 'style'] tags, together with their contents, since they carry nothing meaningful for text extraction #61
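A minimal sketch of the idea, not the project's actual implementation: it uses only Python's standard-library `html.parser` and skips any text nested inside the five tags listed above. The names `SKIPPED_TAGS`, `VisibleTextExtractor`, and `extract_main_text` are hypothetical and chosen for illustration.

```python
from html.parser import HTMLParser

# Tags named in the issue; everything nested inside them is dropped.
SKIPPED_TAGS = {"nav", "footer", "aside", "script", "style"}


class VisibleTextExtractor(HTMLParser):
    """Collects text while ignoring content nested inside SKIPPED_TAGS."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside at least one skipped tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names already lowercased.
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside every skipped tag.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self):
        return "\n".join(self._chunks)


def extract_main_text(html):
    """Return the page text with nav/footer/aside/script/style removed."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The depth counter assumes the skipped tags are properly closed; an unclosed `<nav>` would suppress all text after it, so real-world pages may need a more forgiving parser.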
