bulk_extractor
URL parse error when surrounded by '"'
When a URL in the input is enclosed in double quotes, the extracted URL incorrectly includes the trailing '"'.
url.txt output:
199452984 http://www.icra.org/ratingsv02.html" (pics-1.1 "http://www.icra.org/ratingsv02.html" l gen true for
199453047 http://www.msn.com" true for "http://www.msn.com" r (cz 1 lz 1 n
199453120 http://msn.com" true for "http://msn.com" r (cz 1 lz 1 n
199453189 http://stb.msn.com" true for "http://stb.msn.com" r (cz 1 lz 1 n
199453396 http://www.rsac.org/ratingsv01.html" z 1 vz 1) "http://www.rsac.org/ratingsv01.html" l gen true for
199453645 http://stc.msn.com" true for "http://stc.msn.com" r (n 0 s 0 v 0
199453709 http://stj.msn.com" true for "http://stj.msn.com" r (n 0 s 0 v 0
Expected output:
199452984 http://www.icra.org/ratingsv02.html (pics-1.1 "http://www.icra.org/ratingsv02.html" l gen true for
199453047 http://www.msn.com true for "http://www.msn.com" r (cz 1 lz 1 n
199453120 http://msn.com true for "http://msn.com" r (cz 1 lz 1 n
199453189 http://stb.msn.com true for "http://stb.msn.com" r (cz 1 lz 1 n
199453396 http://www.rsac.org/ratingsv01.html z 1 vz 1) "http://www.rsac.org/ratingsv01.html" l gen true for
199453645 http://stc.msn.com true for "http://stc.msn.com" r (n 0 s 0 v 0
199453709 http://stj.msn.com true for "http://stj.msn.com" r (n 0 s 0 v 0
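Until the scanner itself is fixed, the difference between the two listings can be reproduced as a post-processing step on url.txt. The following is only a rough sketch of that workaround, not how bulk_extractor handles it internally. It assumes each feature line is tab-separated as offset, feature, context (the tabs appear as spaces in the listings above), and `strip_trailing_quote` is a hypothetical helper name, not part of bulk_extractor.

```python
def strip_trailing_quote(line: str) -> str:
    """Drop one trailing double quote from the feature field of a feature-file line.

    Assumption: fields are tab-separated as offset, feature, context.
    Header lines starting with '#' are returned unchanged (minus the newline).
    """
    line = line.rstrip("\n")
    if line.startswith("#"):
        return line
    fields = line.split("\t")
    # fields[1] is the extracted URL; the context in fields[2] is left untouched
    if len(fields) >= 2 and fields[1].endswith('"'):
        fields[1] = fields[1][:-1]
    return "\t".join(fields)


if __name__ == "__main__":
    sample = '199453047\thttp://www.msn.com"\ttrue for "http://www.msn.com" r (cz 1 lz 1 n'
    print(strip_trailing_quote(sample))
```

Only the feature field is modified, so the quoted URL in the context column stays exactly as it was recorded.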
Version information:
# BULK_EXTRACTOR-Version: 1.5.5 ($Rev: 10844 $)
# Feature-Recorder: url
# Feature-File-Version: 1.1
Please let me know if I can provide any additional information.
Can you upload a file that demonstrates the problem?