bulk_extractor
scan_rar does not find 3 out of 4 JPEGs in tests/Data/jpegs.rar
- [x] Validate whether BE1.6 has this problem.
- [ ] Add more print statements to the 2.0 decoder to see why the components aren't being extracted.
- [ ] Compare to other open source RAR implementations, including libarchive.
The Unarchiver on my Mac finds 4 JPEGs in tests/Data/jpegs.rar:

(base) simsong@nimi src % ls -l /Users/simsong/gits/bulk_extractor/src/jpegs
total 32
-rw-r--r--@ 1 simsong staff 7323 Jul 18 2014 1.jpg
-rw-r--r--@ 1 simsong staff 7331 Jul 18 2014 2.jpg
-rw-r--r--@ 1 simsong staff 7509 Jul 18 2014 3.jpg
-rw-r--r--@ 1 simsong staff 7599 Jul 18 2014 4.jpg
But both version 1.6 and the 2.0 alpha of bulk_extractor (BE) find only one:
(base) simsong@nimi src % ls -l out-jpegs_be16-rar/jpeg_carved/000
total 8
-rw-r--r-- 1 simsong staff 7599 Aug 2 22:15 13259-RAR-0.jpg

At least it got the size right!
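A quick way to see which archive members were never carved is to compare file sizes, as the listings above do by eye. The sketch below assumes that matching on byte size alone is good enough for this four-file test archive; `missing_sizes` is a hypothetical helper, not part of bulk_extractor, and the expected sizes are taken from the `ls -l` output above.

```python
from pathlib import Path

# Sizes of 1.jpg..4.jpg as extracted by The Unarchiver (from the listing above).
EXPECTED_SIZES = [7323, 7331, 7509, 7599]


def missing_sizes(expected, carved_dir):
    """Return the expected sizes that have no same-sized file under carved_dir.

    Hypothetical helper: size equality is only a heuristic, not proof that
    the carved bytes match the original file.
    """
    found = [p.stat().st_size for p in Path(carved_dir).rglob("*") if p.is_file()]
    missing = []
    for size in expected:
        if size in found:
            found.remove(size)  # each carved file can satisfy one expectation
        else:
            missing.append(size)
    return missing
```

Calling `missing_sizes(EXPECTED_SIZES, "out-jpegs_be16-rar/jpeg_carved")` on the run above should report the three sizes that were not carved.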
I don't have time to fix this, and scan_rar is incredibly slow with my rewrite (lesson: don't use .slice). I'll fix up the scanner so it's faster, but somebody else needs to find the other three JPEGs. This would be a good student project.
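For whoever picks this up, one way to get an independent view of the archive is to walk its block headers directly and list every file entry. The sketch below does that in Python, assuming tests/Data/jpegs.rar uses the older RAR 4.x layout (a 7-byte marker, then blocks with a 7-byte common header, with file headers of type 0x74). That assumption has not been verified against the file, and a RAR 5 archive would need a different parser. `list_rar4_files` and its dictionary keys are hypothetical names and are not part of bulk_extractor.

```python
import struct

RAR4_MARKER = b"Rar!\x1a\x07\x00"  # RAR 4.x signature (assumed format)
FILE_HEAD = 0x74                   # block type of a file header
LONG_BLOCK = 0x8000                # flag: a 4-byte ADD_SIZE follows the header
LARGE_FILE = 0x0100                # flag: 64-bit size fields are present


def list_rar4_files(data):
    """Walk RAR 4.x blocks in `data` and return one dict per file header.

    Header CRCs are not checked and nothing is decompressed.
    """
    start = data.find(RAR4_MARKER)
    if start < 0:
        return []
    pos = start + len(RAR4_MARKER)
    entries = []
    while pos + 7 <= len(data):
        _crc, head_type, flags, head_size = struct.unpack_from("<HBHH", data, pos)
        if head_size < 7:
            break  # corrupt header; stop rather than loop forever
        add_size = 0
        if head_type == FILE_HEAD:
            if pos + 32 > len(data):
                break
            (pack_size, unp_size, _host, _fcrc, _ftime, _ver,
             method, name_size, _attr) = struct.unpack_from("<IIBIIBBHI", data, pos + 7)
            name_off = pos + 32
            if flags & LARGE_FILE:
                if name_off + 8 > len(data):
                    break
                high_pack, high_unp = struct.unpack_from("<II", data, name_off)
                pack_size |= high_pack << 32
                unp_size |= high_unp << 32
                name_off += 8
            raw_name = data[name_off:name_off + name_size]
            # With a Unicode name, the plain name comes first, ending at a NUL.
            name = raw_name.split(b"\0", 1)[0].decode("utf-8", "replace")
            entries.append({"offset": pos, "name": name, "method": method,
                            "packed_size": pack_size, "unpacked_size": unp_size})
            add_size = pack_size  # packed data follows the file header
        elif flags & LONG_BLOCK:
            if pos + 11 > len(data):
                break
            (add_size,) = struct.unpack_from("<I", data, pos + 7)
        pos += head_size + add_size
    return entries
```

If this lists four entries where scan_rar carves one, the `method` byte of each entry is worth a look: 0x30 means the data is stored, and higher values mean it is compressed.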
Confirmed, BE1.6 has the same problem:
(base) simsong@nimi src % ls -l out-be16-rar/jpeg_carved/000
total 8
-rw-r--r-- 1 simsong staff 7599 Aug 23 06:00 13259-RAR-0.jpg