
Added support for voidtools everything DB

Open cobyge opened this issue 1 year ago • 10 comments

Inspired by #505, I remembered I had some code lying around to parse the database of Voidtools Everything, very similar to mlocate/plocate, but for Windows.

I updated the code and added it to the codebase.
Because Everything is closed source, this is based entirely on reverse-engineering it. I haven't found any reference implementation on the internet to help (AFAIK this is the only parser), so it all rests on my (not too great) reversing skills.
I've tested this on ~10 random database files I had lying around, from multiple computers, and all of them gave exactly the same results as Everything itself (checked by exporting to CSV and comparing md5sums).
It should support any DB created since 2017, and if someone gives me a file that breaks the parser, I'm willing to add support for earlier versions as well.
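
The CSV check described above can be sketched roughly as follows. This is only an illustration, not code from the PR: the function names `md5sum` and `exports_match` are hypothetical, and it assumes both the parser output and the Everything export have already been written to CSV files in the same format.

```python
import hashlib
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def exports_match(parser_csv: Path, everything_csv: Path) -> bool:
    """True when both CSV exports are byte-for-byte identical."""
    return md5sum(parser_csv) == md5sum(everything_csv)
```

A byte-level digest only matches when ordering, quoting, and line endings are identical too, so it is a strict check.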

All comments are mine, written while reversing the code.

This is relatively slow code (it takes 4.5 seconds for a DB with 126828 files). I have a version written in Rust which is 22 times faster, and if that's something you are interested in, I'm happy to try creating bindings with PyO3.

cobyge avatar Jan 26 '24 22:01 cobyge

I've now added support for all filesystem types supported by Everything stable (Currently NTFS/REFS/EFU/Folder), along with tests for each.

When I have some more time I'll add support for more versions (Everything 1.5.0alpha currently uses DB version 1.7.49 and also supports FAT, network drives, and network indexes).

cobyge avatar Jan 27 '24 23:01 cobyge

Really cool PR! Since this is another big one, please give it some time for us to do the review :). Stay tuned!

Horofic avatar Jan 29 '24 15:01 Horofic

Hey, thanks for the review. I updated the code according to your requests, and I've also added support for a previous version of Everything, in order to see what supporting multiple versions might look like.

The only request I haven't worked on yet is the request regarding using dissect.cstruct. I'll have to think a bit about how to implement it, because of differences between structs for multiple versions.

I'd be happy to hear thoughts about how I handled different versions in the code (I'm not quite happy with the version handling).

cobyge avatar Mar 02 '24 17:03 cobyge

Hey! It's been a while, but I finally got around to rewriting this with a simpler (better?) implementation, using cstruct.

I haven't been following any project changes for a while, so if I need to change anything let me know (the tests pass 🤷).

I'm resolving the previous comments as they are no longer relevant (and are all fixed in this implementation). Hope we can get this pushed sometime soon 😄

cobyge avatar Jan 17 '25 23:01 cobyge

@cobyge yes, it has been a while! Thanks for the time you took to change your implementation. We plan to make time so we can properly review it soon :)

Miauwkeru avatar Jan 20 '25 13:01 Miauwkeru

I also found a weird inconsistency. After running it multiple times on static data, the number of records kept changing. So there is something weird going on there.

It seems I was too fast with calling it an inconsistency, I checked it wrong. So you can ignore that comment

Miauwkeru avatar Feb 13 '25 12:02 Miauwkeru

Thanks for the review. I'm vacationing at the sea for the next couple of weeks, so hopefully I'll get to this right after that.

cobyge avatar Feb 14 '25 19:02 cobyge

Hi @cobyge, hope you are doing well. Just checking in to see if you plan to address this review. Thanks

EinatFox avatar Apr 22 '25 14:04 EinatFox

Hey, I actually totally forgot about this, sorry about that 😅

Going over the comments now and fixing them. Something I just noticed when rebuilding this is that packaging is a new dependency that has not been used so far. Would it be okay to add it as a dependency? It's an official PyPA package, if that helps.

(edit: turns out pytest has packaging as a dependency which is why it hasn't been an issue while running tests)

cobyge avatar Apr 25 '25 09:04 cobyge

Finished with the comments. I left some of them unresolved because I need your response/approval.

cobyge avatar Apr 25 '25 10:04 cobyge

Something I just noticed when rebuilding this is that packaging is a new dependency that has not been used so far. Would it be okay to add it as a dependency? It's an official PyPA package, if that helps.

I took a look at the package, and it should be added to the dependencies, as it would generate an error without it.

Miauwkeru avatar May 28 '25 11:05 Miauwkeru

Hey, I've gone over your comments again and fixed everything except for the ones I left responses on.

cobyge avatar Jun 07 '25 16:06 cobyge

I've updated the code with fixes in response to your comments

cobyge avatar Jun 27 '25 15:06 cobyge

I don't know why I can't comment on https://github.com/fox-it/dissect.target/pull/515#discussion_r2174900590, so I'll answer here. The answer is that I don't have an example at the moment. I know that earlier versions were missing some fields and that newer versions have added yet more fields (I can't find the documentation I had on that, it's from a long while back). If you prefer that I create a struct per version, that can be done. The situation I am imagining is the following. Currently we have this code:

f"""
    struct ntfs_header {{
        // Guid in format \\\\?\\Volume{{GUID}}
        EverythingVarBytes guid;
        // Disk drive (C:, D:)
        EverythingVarBytes path;
        EverythingVarBytes root;
        {version_match("EverythingVarBytes include_only;", version >= (1, 7, 9))}
        {version_match("uint64_t journal_id;", version >= (1, 7, 9))}
        {version_match("uint64_t next_usn;", version >= (1, 7, 9))}
    }};
"""

Changing it to two separate versions is easy. If tomorrow someone tests Everything with a DB version between 1.7.9 and 1.7.20 (currently the two versions I diffed between), and we discover that journal_id was actually only added in 1.7.11 and next_usn was only added in 1.7.15, then the code becomes:

f"""
    struct ntfs_header {{
        // Guid in format \\\\?\\Volume{{GUID}}
        EverythingVarBytes guid;
        // Disk drive (C:, D:)
        EverythingVarBytes path;
        EverythingVarBytes root;
        {version_match("EverythingVarBytes include_only;", version >= (1, 7, 9))}
        {version_match("uint64_t journal_id;", version >= (1, 7, 11))}
        {version_match("uint64_t next_usn;", version >= (1, 7, 15))}
    }};
"""

And creating multiple copies becomes much more verbose. If you want me to do this anyways, that's fine, just let me know
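
For readers following along, a helper like `version_match` can be very small. The sketch below is an assumption about its shape, not the implementation from this PR: it assumes `version` is a plain tuple of integers (so Python's element-wise tuple comparison applies), and `ntfs_header_definition` is a hypothetical wrapper added only to make the example runnable.

```python
from typing import Tuple


def version_match(field: str, condition: bool) -> str:
    """Return the field declaration when the condition holds, else an empty string."""
    return field if condition else ""


def ntfs_header_definition(version: Tuple[int, int, int]) -> str:
    """Build the struct definition text for one DB version (illustrative only)."""
    # Doubled braces are literal braces inside an f-string.
    return f"""
    struct ntfs_header {{
        EverythingVarBytes guid;
        EverythingVarBytes path;
        EverythingVarBytes root;
        {version_match("EverythingVarBytes include_only;", version >= (1, 7, 9))}
        {version_match("uint64_t journal_id;", version >= (1, 7, 9))}
        {version_match("uint64_t next_usn;", version >= (1, 7, 9))}
    }};
    """


# Older DB versions simply get blank lines where the newer fields would be.
print(ntfs_header_definition((1, 7, 8)))
print(ntfs_header_definition((1, 7, 20)))
```

With this shape, moving a field to a different minimum version is a one-line change to the tuple in its condition, which is the point made above about keeping a single templated struct.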

cobyge avatar Jul 11 '25 16:07 cobyge

If you want me to do this anyways, that's fine, just let me know

No it's fine, let's keep it this way. One small change and this should be good to merge. I'll put the outstanding comment about moving it to a different spot in a new ticket for future consideration.

Schamper avatar Jul 12 '25 10:07 Schamper

Codecov Report

Attention: Patch coverage is 90.63830% with 22 lines in your changes missing coverage. Please review.

Project coverage is 78.33%. Comparing base (76c26a1) to head (e859a4b). Report is 1 commits behind head on main.

Files with missing lines | Patch % | Lines
...ect/target/plugins/os/windows/everything/parser.py | 90.00% | 20 Missing :warning:
...ct/target/plugins/os/windows/everything/_plugin.py | 94.28% | 2 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #515      +/-   ##
==========================================
+ Coverage   78.24%   78.33%   +0.08%     
==========================================
  Files         363      365       +2     
  Lines       32960    33195     +235     
==========================================
+ Hits        25791    26004     +213     
- Misses       7169     7191      +22     
Flag | Coverage Δ
unittests | 78.33% <90.63%> (+0.08%) :arrow_up:


codecov[bot] avatar Jul 13 '25 09:07 codecov[bot]

CodSpeed Performance Report

Merging #515 will not alter performance

Comparing cobyge:feature/add-everything-plugin (e859a4b) with main (76c26a1)


Summary

✅ 5 untouched benchmarks

codspeed-hq[bot] avatar Jul 13 '25 09:07 codspeed-hq[bot]