node-unfluff

Fixed side effect from invocation of cleaner in unfluff.lazy

Open franza opened this issue 10 years ago • 3 comments

I was sure I had checked this for #16, but it seems I missed it.

The cleaner mutates the original doc object, so doc needs to be recalculated after the cleaner runs. Right now, once the cleaner has been applied, every later call suffers from that side effect. Consider the following example:

[fs, unfluff] = ['fs', 'unfluff'].map require

html = fs.readFileSync('test_tags_kexp.html', 'utf8')

doc1 = unfluff.lazy html
doc2 = unfluff.lazy html

console.log 'tags1: ', doc1.tags() # ['Dennis Morton', 'film', 'kusp film review', 'Stand Up Guys']
console.log 'text1: ', doc1.text()

console.log 'text2: ', doc2.text()
console.log 'tags2: ', doc2.tags() # [ ]

Running this code against the test_tags_kexp.html fixture gives different results for tags(), because the cleaner is called inside text(). So whenever the cleaner is called, we need to reload the document. I also included some refactoring.
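To make the order dependence concrete, here is a minimal sketch in plain JavaScript. It does not use unfluff itself: `parse`, `cleaner`, `extractTags`, `extractText`, `lazyShared` and `lazyIsolated` are hypothetical stand-ins that only model the behaviour described here (a cleaner that mutates the parsed document in place), and the JSON "document" is an assumption chosen to keep the example self-contained. The actual patch may be structured differently.

```javascript
// Hypothetical model of the problem; none of these names are unfluff's real internals.

// "Parse" raw input into a fresh, mutable document object.
function parse(raw) {
  return JSON.parse(raw);
}

// Like the cleaner described above, this mutates its argument in place.
// Here it removes the nodes that tag extraction depends on.
function cleaner(doc) {
  doc.nodes = doc.nodes.filter((node) => node.type !== 'tag');
  return doc;
}

function extractTags(doc) {
  return doc.nodes.filter((n) => n.type === 'tag').map((n) => n.value);
}

function extractText(doc) {
  return doc.nodes.filter((n) => n.type === 'text').map((n) => n.value).join(' ');
}

// Buggy variant: a single shared document is cleaned in place inside text(),
// so tags() returns different results depending on the call order.
function lazyShared(raw) {
  const doc = parse(raw);
  return {
    tags: () => extractTags(doc),
    text: () => extractText(cleaner(doc)),
  };
}

// Fixed variant: the cleaner only ever sees its own freshly parsed copy,
// so the document used by tags() is never mutated.
function lazyIsolated(raw) {
  let doc = null;
  let cleanedDoc = null;
  const getDoc = () => {
    if (doc === null) doc = parse(raw);
    return doc;
  };
  const getCleanedDoc = () => {
    if (cleanedDoc === null) cleanedDoc = cleaner(parse(raw)); // reload, then clean
    return cleanedDoc;
  };
  return {
    tags: () => extractTags(getDoc()),
    text: () => extractText(getCleanedDoc()),
  };
}

// Sample input for the sketch (an assumption, not the real fixture).
const raw = JSON.stringify({
  nodes: [
    { type: 'tag', value: 'film' },
    { type: 'text', value: 'Stand Up Guys' },
    { type: 'tag', value: 'review' },
  ],
});
```

In this model, calling `text()` before `tags()` on `lazyShared(raw)` empties the tag list, while `lazyIsolated(raw)` returns the same tags in either order. The cost is a second parse of the input, which is the reload this change accepts.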

franza avatar Aug 29 '14 19:08 franza

Thanks for catching this! I'll take a look in detail when I have some time this weekend.

ageitgey avatar Aug 29 '14 20:08 ageitgey

Sure. If you have ideas on how we can avoid reloading the document, bring them up.

franza avatar Aug 29 '14 20:08 franza

Sorry, I've been lax on reviewing this. I still plan to get to it very soon. Thanks!

ageitgey avatar Sep 08 '14 16:09 ageitgey