document_cluster BeautifulSoup used for no reason?

Open JesseAldridge opened this issue 9 years ago • 0 comments

from bs4 import BeautifulSoup

synopses_wiki = open('synopses_list_wiki.txt').read().split('\n BREAKS HERE')
synopses_wiki = synopses_wiki[:100]

synopses_clean_wiki = []
for text in synopses_wiki:
    # strips HTML formatting and converts to unicode
    text = BeautifulSoup(text, 'html.parser').getText()
    synopses_clean_wiki.append(text)

synopses_wiki = synopses_clean_wiki

It seems the HTML has already been stripped from synopses_list_wiki.txt, so is running the text through BeautifulSoup pointless? I mention it because the BeautifulSoup pass seems to be slowing things down significantly.
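A rough sketch of one way to confirm this and skip the parse when it is not needed. The helper names (`looks_like_html`, `clean_synopsis`, `load_synopses`) and the regex heuristic are my own and not part of the repo. The idea is to call BeautifulSoup only for synopses that appear to contain a tag or an entity, and pass everything else through untouched. I have not verified that this gives byte-identical output to the current loop for every input.

```python
import re

# Heuristic (an assumption, not exhaustive): something shaped like an HTML tag
# such as <p>, </div>, <br/>, or a character entity such as &amp; or &#39;.
_MARKUP_RE = re.compile(r"</?[a-zA-Z][^<>]*>|&(?:[a-zA-Z]+|#\d+|#x[0-9a-fA-F]+);")


def looks_like_html(text):
    """Return True if the text appears to contain HTML tags or entities."""
    return _MARKUP_RE.search(text) is not None


def clean_synopsis(text):
    """Strip HTML only when the text seems to contain any; otherwise return it as-is."""
    if not looks_like_html(text):
        return text
    # Imported lazily so plain-text input never pays for the parser.
    from bs4 import BeautifulSoup
    return BeautifulSoup(text, 'html.parser').getText()


def load_synopses(path='synopses_list_wiki.txt', limit=100):
    """Hypothetical replacement for the loop quoted above."""
    with open(path) as f:
        synopses = f.read().split('\n BREAKS HERE')[:limit]
    return [clean_synopsis(text) for text in synopses]
```

If `sum(looks_like_html(t) for t in synopses)` comes back as 0 for the file, that would support dropping the BeautifulSoup call entirely.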

Apr 11 '16 05:04 JesseAldridge
