unblob Some directories are not reported

When extracting chunks, there is logic that handles whole chunks differently, here. As a result, in some cases some directories are not reported.

This can be reproduced with this test file: test.zip. It is actually from the integration test suite, but I had to zip it for GitHub to allow me to attach it.

If I run this file with unblob and check the report, I get the following item:

A part of the generated report json

{
  "task": {
    "path": "/tmp/fruits.lvl1.lzh",
    "depth": 0,
    "chunk_id": "",
    "__typename__": "Task"
  },
  "reports": [
    {
      "path": "/tmp/fruits.lvl1.lzh",
      "size": 146,
      "is_dir": false,
      "is_file": true,
      "is_link": false,
      "link_target": null,
      "__typename__": "StatReport"
    },
    {
      "magic": "  LHarc 1.x/ARX archive data  [lh0], 0x0 OS, with \"apple.txt\"\\012- data",
      "mime_type": "application/x-lzh-compressed",
      "__typename__": "FileMagicReport"
    },
    {
      "md5": "cf71709694cd2f3e98fcf87524194beb",
      "sha1": "701248bfd7dd7a7360ce237754a82425d1d13346",
      "sha256": "e016f42094b088058e7fa5d9c3f98bafaeac87899205192d95b8001f72058a0f",
      "__typename__": "HashReport"
    },
    {
      "chunk_id": "47941:3",
      "handler_name": "lzh",
      "start_offset": 96,
      "end_offset": 146,
      "size": 50,
      "is_encrypted": false,
      "extraction_reports": [],
      "__typename__": "ChunkReport"
    },
    {
      "chunk_id": "47941:2",
      "handler_name": "lzh",
      "start_offset": 47,
      "end_offset": 96,
      "size": 49,
      "is_encrypted": false,
      "extraction_reports": [],
      "__typename__": "ChunkReport"
    },
    {
      "chunk_id": "47941:1",
      "handler_name": "lzh",
      "start_offset": 0,
      "end_offset": 47,
      "size": 47,
      "is_encrypted": false,
      "extraction_reports": [],
      "__typename__": "ChunkReport"
    }
  ],
  "subtasks": [
    {
      "path": "/tmp/unblob/fruits.lvl1.lzh_extract/96-146.lzh_extract",
      "depth": 1,
      "chunk_id": "47941:3",
      "__typename__": "Task"
    },
    {
      "path": "/tmp/unblob/fruits.lvl1.lzh_extract/47-96.lzh_extract",
      "depth": 1,
      "chunk_id": "47941:2",
      "__typename__": "Task"
    },
    {
      "path": "/tmp/unblob/fruits.lvl1.lzh_extract/0-47.lzh_extract",
      "depth": 1,
      "chunk_id": "47941:1",
      "__typename__": "Task"
    }
  ],
  "__typename__": "TaskResult"
}

This means that when unblob handles /tmp/fruits.lvl1.lzh, it creates 3 subtasks:

/tmp/unblob/fruits.lvl1.lzh_extract/96-146.lzh_extract
/tmp/unblob/fruits.lvl1.lzh_extract/47-96.lzh_extract
/tmp/unblob/fruits.lvl1.lzh_extract/0-47.lzh_extract

It then continues to run for those (sub)tasks. However, a task for the /tmp/unblob/fruits.lvl1.lzh_extract directory is never created, so that directory exists in the file system without ever appearing in the generated report.
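For anyone who wants to check their own report for this, here is a minimal sketch in Python. It assumes the report has already been loaded as a list of TaskResult objects shaped like the excerpt above; the function name `unreported_parent_dirs` is made up for illustration and is not part of unblob.

```python
from pathlib import PurePosixPath


def unreported_parent_dirs(task_results):
    """Return parent directories of subtasks that never appear as a task or subtask path.

    Hypothetical helper: `task_results` is assumed to be a list of dicts
    shaped like the TaskResult excerpt above.
    """
    task_paths = {result["task"]["path"] for result in task_results}
    subtask_paths = {
        subtask["path"]
        for result in task_results
        for subtask in result.get("subtasks", [])
    }
    known = task_paths | subtask_paths

    missing = set()
    for path in subtask_paths:
        parent = str(PurePosixPath(path).parent)
        if parent not in known:
            missing.add(parent)
    return sorted(missing)


# Trimmed copy of the TaskResult shown above (only the fields used here).
task_results = [
    {
        "task": {"path": "/tmp/fruits.lvl1.lzh", "depth": 0, "chunk_id": ""},
        "subtasks": [
            {"path": "/tmp/unblob/fruits.lvl1.lzh_extract/96-146.lzh_extract", "depth": 1},
            {"path": "/tmp/unblob/fruits.lvl1.lzh_extract/47-96.lzh_extract", "depth": 1},
            {"path": "/tmp/unblob/fruits.lvl1.lzh_extract/0-47.lzh_extract", "depth": 1},
        ],
    }
]

print(unreported_parent_dirs(task_results))
```

With the excerpt above, the only directory this flags is /tmp/unblob/fruits.lvl1.lzh_extract.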

Apr 07 '23 12:04 kukovecz

The directory that is not reported/processed as a Task is an auxiliary directory that is used only to carve chunks into. We have not assigned any report to it yet, because it has not been necessary so far.

If it is really needed, a new report type on chunks (CarveReport?) could resolve this.

Apr 07 '23 13:04 e3krisztian

Related: #326.

I am not sure we need to do anything with it, though.

Apr 26 '23 15:04 e3krisztian

One option could be to move the carved files out of the extraction tree structure and store them separately. In most cases we delete the carves anyway, and carves are easily reproducible.

This way we can use the following extraction tree structure:

/tmp/unblob/fruits.lvl1.lzh_96-146_extract/
/tmp/unblob/fruits.lvl1.lzh_47-96_extract/
/tmp/unblob/fruits.lvl1.lzh_0-47_extract/
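As a rough illustration of that naming scheme, the directory name can be derived from the source file name and the chunk offsets alone. This is only a sketch: `flat_extract_dir` is a hypothetical helper, not an existing unblob function.

```python
from pathlib import PurePosixPath


def flat_extract_dir(extract_root, source_name, start_offset, end_offset):
    """Build the proposed flat extraction directory for one chunk.

    Hypothetical helper illustrating the naming proposed above.
    """
    dir_name = f"{source_name}_{start_offset}-{end_offset}_extract"
    return PurePosixPath(extract_root) / dir_name


# Chunk offsets taken from the report excerpt earlier in this issue.
for start, end in [(96, 146), (47, 96), (0, 47)]:
    print(flat_extract_dir("/tmp/unblob", "fruits.lvl1.lzh", start, end))
```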

May 08 '23 08:05 martonilles

This issue is causing problems for people who want to do nice things with the unblob API from Python. See https://github.com/onekey-sec/unblob/issues/878

Jun 17 '24 07:06 qkaiser

This was blocking my ability to map between extraction directories and the blobs they were derived from with the API, so I took a stab at it in #891. I didn't figure out how to add a new task/subtask for carving; instead, I just added a new report type that logs the source and destination of each carve.

With the example fruits.lvl1 file, the following new outputs are produced in the log, which allows a consumer of the log to map between the fruits.lvl1.lzh file and the 3 carved files: fruits.lvl1.lzh_extract/96-146.lzh, fruits.lvl1.lzh_extract/47-96.lzh, and fruits.lvl1.lzh_extract/0-47.lzh.

      {
        "carved_from": "/tmp/unblob/fruits.lvl1.lzh",
        "carved_to": "/tmp/unblob/fruits.lvl1.lzh_extract/96-146.lzh",
        "start_offset": 96,
        "end_offset": 146,
        "handler_name": "lzh",
        "__typename__": "CarveReport"
      },
      {
        "carved_from": "/tmp/unblob/fruits.lvl1.lzh",
        "carved_to": "/tmp/unblob/fruits.lvl1.lzh_extract/47-96.lzh",
        "start_offset": 47,
        "end_offset": 96,
        "handler_name": "lzh",
        "__typename__": "CarveReport"
      },
      {
        "carved_from": "/tmp/unblob/fruits.lvl1.lzh",
        "carved_to": "/tmp/unblob/fruits.lvl1.lzh_extract/0-47.lzh",
        "start_offset": 0,
        "end_offset": 47,
        "handler_name": "lzh",
        "__typename__": "CarveReport"
      },
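A consumer could build that mapping roughly like this. It is a sketch that assumes the reports are available as a list of dicts with the field names shown above; `carve_sources` is a made-up helper name, and the field names in the final merged fix may differ.

```python
def carve_sources(reports):
    """Map each carved file to (source file, start offset, end offset).

    Hypothetical helper: relies on the CarveReport fields shown above.
    """
    return {
        report["carved_to"]: (
            report["carved_from"],
            report["start_offset"],
            report["end_offset"],
        )
        for report in reports
        if report.get("__typename__") == "CarveReport"
    }


# The three CarveReport entries from the log excerpt above.
reports = [
    {
        "carved_from": "/tmp/unblob/fruits.lvl1.lzh",
        "carved_to": "/tmp/unblob/fruits.lvl1.lzh_extract/96-146.lzh",
        "start_offset": 96,
        "end_offset": 146,
        "handler_name": "lzh",
        "__typename__": "CarveReport",
    },
    {
        "carved_from": "/tmp/unblob/fruits.lvl1.lzh",
        "carved_to": "/tmp/unblob/fruits.lvl1.lzh_extract/47-96.lzh",
        "start_offset": 47,
        "end_offset": 96,
        "handler_name": "lzh",
        "__typename__": "CarveReport",
    },
    {
        "carved_from": "/tmp/unblob/fruits.lvl1.lzh",
        "carved_to": "/tmp/unblob/fruits.lvl1.lzh_extract/0-47.lzh",
        "start_offset": 0,
        "end_offset": 47,
        "handler_name": "lzh",
        "__typename__": "CarveReport",
    },
]

mapping = carve_sources(reports)
for carved_to, (carved_from, start, end) in mapping.items():
    print(f"{carved_to} <- {carved_from}[{start}:{end}]")
```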

Jul 02 '24 16:07 AndrewFasano

Fixed by https://github.com/onekey-sec/unblob/pull/1017

Dec 04 '24 08:12 qkaiser
