URL parse error when surrounded by '"'

Open rfarley3 opened this issue 9 years ago • 1 comments

Results incorrectly include a trailing '"' or '&quot' when parsing URLs that are enclosed in double quotes.

url.txt output:

199452984   http://www.icra.org/ratingsv02.html"   (pics-1.1 "http://www.icra.org/ratingsv02.html" l gen true for 
199453047   http://www.msn.com&quot  true for "http://www.msn.com" r (cz 1 lz 1 n
199453120   http://msn.com&quot  true for "http://msn.com" r (cz 1 lz 1 n
199453189   http://stb.msn.com&quot  true for "http://stb.msn.com" r (cz 1 lz 1 n
199453396   http://www.rsac.org/ratingsv01.html"   z 1 vz 1) "http://www.rsac.org/ratingsv01.html" l gen true for 
199453645   http://stc.msn.com&quot  true for "http://stc.msn.com" r (n 0 s 0 v 0
199453709   http://stj.msn.com&quot  true for "http://stj.msn.com" r (n 0 s 0 v 0

should be:

199452984   http://www.icra.org/ratingsv02.html (pics-1.1 "http://www.icra.org/ratingsv02.html" l gen true for 
199453047   http://www.msn.com   true for "http://www.msn.com" r (cz 1 lz 1 n
199453120   http://msn.com   true for "http://msn.com" r (cz 1 lz 1 n
199453189   http://stb.msn.com   true for "http://stb.msn.com" r (cz 1 lz 1 n
199453396   http://www.rsac.org/ratingsv01.html z 1 vz 1) "http://www.rsac.org/ratingsv01.html" l gen true for 
199453645   http://stc.msn.com   true for "http://stc.msn.com" r (n 0 s 0 v 0
199453709   http://stj.msn.com   true for "http://stj.msn.com" r (n 0 s 0 v 0
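
The expected trimming can be sketched as a post-processing step on each extracted feature. This is only an illustration of the desired result, not a fix to the scanner itself: `strip_trailing_quote` is a hypothetical helper, not part of bulk_extractor, and it assumes the unwanted suffix is always a literal '"' or an '&quot' entity (with or without its ';').

```python
def strip_trailing_quote(feature: str) -> str:
    """Drop a trailing '"' or '&quot' entity from an extracted URL feature.

    Hypothetical helper for illustration; not part of bulk_extractor.
    """
    # Check the longest suffix first so '&quot;' is not left as '&quot'.
    for suffix in ("&quot;", "&quot", '"'):
        if feature.endswith(suffix):
            return feature[: -len(suffix)]
    return feature


# Features taken from the url.txt output above, plus one already-clean URL.
samples = [
    'http://www.icra.org/ratingsv02.html"',
    "http://www.msn.com&quot",
    "http://msn.com",
]
cleaned = [strip_trailing_quote(s) for s in samples]
```

Only one suffix is removed per call, so a URL that legitimately ends in other characters is left untouched.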

Version information:

# BULK_EXTRACTOR-Version: 1.5.5 ($Rev: 10844 $)
# Feature-Recorder: url
# Feature-File-Version: 1.1

Please let me know if I can provide you with any better information.

rfarley3, Mar 13 '15 21:03

Can you upload a file that demonstrates the problem?

simsong, Mar 24 '15 22:03