
scan_rar does not find 3 out of 4 JPEGs in tests/Data/jpegs.rar

Open simsong opened this issue 4 years ago • 1 comments

  • [x] Validate whether BE1.6 has this problem.
  • [ ] Add more print statements to the 2.0 decoder to see why the components aren't being extracted.
  • [ ] Compare to other open source RAR implementations, including libarchive.
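
As a starting point for the second item, here is a minimal sketch of a standalone header walk that lists the file entries in an archive without going through scan_rar. It assumes tests/Data/jpegs.rar uses the RAR 4.x block layout (the issue does not say which version it is; a RAR 5 archive would need a different parser), and the function name `list_rar4_file_headers` is hypothetical, not part of bulk_extractor.

```python
import struct

RAR4_MARKER = b"Rar!\x1a\x07\x00"  # RAR 4.x signature block
FILE_HEADER_TYPE = 0x74            # HEAD_TYPE of a file header block
LONG_BLOCK_FLAG = 0x8000           # block carries a 4-byte ADD_SIZE field
LARGE_FILE_FLAG = 0x0100           # HIGH_PACK_SIZE / HIGH_UNP_SIZE present


def list_rar4_file_headers(buf: bytes):
    """Return (name, packed_size, unpacked_size) for each file header found.

    Assumes an unencrypted RAR 4.x archive; stops at the first malformed block.
    """
    start = buf.find(RAR4_MARKER)
    if start < 0:
        return []
    pos = start + len(RAR4_MARKER)
    entries = []
    while pos + 7 <= len(buf):
        # Base header: HEAD_CRC, HEAD_TYPE, HEAD_FLAGS, HEAD_SIZE
        _crc, head_type, flags, head_size = struct.unpack_from("<HBHH", buf, pos)
        if head_size < 7:
            break  # corrupt or not a real block
        add_size = 0
        if head_type == FILE_HEADER_TYPE:
            if pos + 32 > len(buf):
                break
            pack_size, unp_size = struct.unpack_from("<II", buf, pos + 7)
            (name_size,) = struct.unpack_from("<H", buf, pos + 26)
            name_off = pos + 32
            if flags & LARGE_FILE_FLAG:
                if pos + 40 > len(buf):
                    break
                high_pack, high_unp = struct.unpack_from("<II", buf, pos + 32)
                pack_size |= high_pack << 32
                unp_size |= high_unp << 32
                name_off += 8
            raw_name = buf[name_off:name_off + name_size]
            # Unicode names store an ASCII name, a NUL, then encoded data;
            # keep only the part before the NUL.
            name = raw_name.split(b"\x00", 1)[0].decode("latin-1")
            entries.append((name, pack_size, unp_size))
            add_size = pack_size  # packed data follows the header
        elif flags & LONG_BLOCK_FLAG:
            if pos + 11 > len(buf):
                break
            (add_size,) = struct.unpack_from("<I", buf, pos + 7)
        pos += head_size + add_size
    return entries
```

Running this over the bytes of the archive and printing the result would show whether all four file headers are present and well-formed, which narrows down whether the scanner is missing the headers or failing during decompression.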

The UnArchiver on my Mac finds 4 jpegs in tests/Data/jpegs.rar:

(base) simsong@nimi src % ls -l /Users/simsong/gits/bulk_extractor/src/jpegs
total 32
-rw-r--r--@ 1 simsong  staff  7323 Jul 18  2014 1.jpg
-rw-r--r--@ 1 simsong  staff  7331 Jul 18  2014 2.jpg
-rw-r--r--@ 1 simsong  staff  7509 Jul 18  2014 3.jpg
-rw-r--r--@ 1 simsong  staff  7599 Jul 18  2014 4.jpg

But both version 1.6 and the 2.0 alpha of BE find only one:

(base) simsong@nimi src % ls -l out-jpegs_be16-rar/jpeg_carved/000
total 8
-rw-r--r--  1 simsong  staff  7599 Aug  2 22:15 13259-RAR-0.jpg

At least it got the size right!
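
To make the comparison above repeatable, here is a small sketch that matches carved files to the expected ones by size. The sizes are copied from the two listings above; the helper name `missing_by_size` is hypothetical, and matching by size alone is only a heuristic (a byte-for-byte comparison or hash would be stronger).

```python
# Sizes in bytes, taken from the `ls -l` listings in this issue.
expected = {"1.jpg": 7323, "2.jpg": 7331, "3.jpg": 7509, "4.jpg": 7599}
carved = {"13259-RAR-0.jpg": 7599}


def missing_by_size(expected, carved):
    """Return the expected file names that have no carved file of the same size."""
    carved_sizes = list(carved.values())
    missing = []
    for name, size in sorted(expected.items()):
        if size in carved_sizes:
            carved_sizes.remove(size)  # each carved file can match only once
        else:
            missing.append(name)
    return missing


print(missing_by_size(expected, carved))  # ['1.jpg', '2.jpg', '3.jpg']
```

By this measure the one carved file corresponds to 4.jpg, and 1.jpg through 3.jpg are the ones scan_rar does not recover.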

I don't have time to fix this, and scan_rar is incredibly slow with my rewrite (lesson: don't use .slice). I'll fix up the scanner so it's faster, but somebody else needs to find the other three JPEGs. This would be a good student project.

simsong • Aug 03 '21 02:08

Confirmed, BE16 has the same problem:

(base) simsong@nimi src % ls -l out-be16-rar/jpeg_carved/000
total 8
-rw-r--r--  1 simsong  staff  7599 Aug 23 06:00 13259-RAR-0.jpg

simsong • Aug 23 '21 10:08