acquire icon indicating copy to clipboard operation
acquire copied to clipboard

Extend webserver collection

Open qmadev opened this issue 4 months ago • 18 comments

closes #139

Testing on VMs with IIS, Nginx and Caddy seems to work fine with an adjusted version of dissect.target #1287.

- Collecting file C:/Windows/System32/LogFiles/HTTPERR/httperr1.log to: fs/C:/Windows/System32/LogFiles/HTTPERR/httperr1.log
- Collecting file C:/Windows/System32/LogFiles/HTTPERR/httperr1.log succeeded

- Collecting file /var/caddy/access.json to: fs/$rootfs$/var/caddy/access.json
- Collecting file /var/caddy/access.json succeeded
- Collecting file /root/access.log to: fs/$rootfs$/root/access.log
- Collecting file /root/access.log succeeded

The issue specifies that tests should be written. Do we still want that, even with the tests that are already there in dissect.target?

qmadev avatar Aug 16 '25 01:08 qmadev

What version should I put at line 873? See question marks.

qmadev avatar Aug 16 '25 01:08 qmadev

Any news on this?

qmadev avatar Aug 25 '25 15:08 qmadev

@Schamper do you have some time to take a second look at this?

qmadev avatar Sep 02 '25 09:09 qmadev

I'd like @twiggler to have an opinion too.

Schamper avatar Sep 02 '25 10:09 Schamper

@twiggler do you have some time to review this?

qmadev avatar Sep 10 '25 23:09 qmadev

@Schamper @twiggler

Just a friendly reminder.

qmadev avatar Oct 20 '25 18:10 qmadev

@Schamper @twiggler

Just a friendly reminder.

Sorry @qmadev this slipped through the cracks, I will take a look this week

twiggler avatar Oct 21 '25 08:10 twiggler

@Schamper @qmadev

One detail I don't like is that we are invoking a private method in target from acquire. I think this is a design oversight in the implementation of https://github.com/fox-it/dissect.target/pull/1082. Clearly, _get_paths should be made public by removing the underscore. However, this causes a name-clash with the get_paths defined in the Plugin class (template method pattern).

Perhaps we should rename _get_paths to get_paths_from_target (contrast with _get_paths_direct).

Otherwise it looks fine!

twiggler avatar Oct 23 '25 10:10 twiggler

I think this is a design oversight in the implementation of fox-it/dissect.target#1082.

Not necessarily. https://github.com/fox-it/dissect.target/pull/1082#pullrequestreview-2793551946 goes into more detail here:

This method still doesn't quite cover "getting all paths that are relevant to a plugin", since it's only really meant for "paths the plugin will parse for artefacts", and not configuration paths. So we can't quite 1:1 use this method to collect files with acquire. I think we should focus on this approach first, but perhaps in a next stage we can think of how to tackle that, perhaps a _get_aux_paths() to be implemented by plugins, and a Plugin.get_all_paths to be consumed by acquire?

Schamper avatar Oct 23 '25 10:10 Schamper

I think this is a design oversight in the implementation of fox-it/dissect.target#1082.

Not necessarily. fox-it/dissect.target#1082 (review) goes into more detail here:

This method still doesn't quite cover "getting all paths that are relevant to a plugin", since it's only really meant for "paths the plugin will parse for artefacts", and not configuration paths. So we can't quite 1:1 use this method to collect files with acquire. I think we should focus on this approach first, but perhaps in a next stage we can think of how to tackle that, perhaps a _get_aux_paths() to be implemented by plugins, and a Plugin.get_all_paths to be consumed by acquire?

Ah yes, we decided to defer.

Although I don´t see any config paths yet in the companion target PR, I think it would indeed be a good idea to introduce Plugin::get_all_paths already, which can delegate to Plugin::_get_paths for the time being, if it exists in the subclass. That way, we have the API in place, otherwise contributors might be inclined to keep using Plugin::_get_paths

twiggler avatar Oct 23 '25 14:10 twiggler

To clarify, you want me to create a function Plugin.get_all_paths() that will return a collection/iterator something containing the paths to the config files and the (resolved) log files? And you want acquire to use that function instead of the _get_paths() function?

I do think the naming is confusing. Maybe something like Plugin._get_log_paths() Plugin._get_config_paths() Plugin.get_all_paths()

would be clearer?

qmadev avatar Oct 24 '25 01:10 qmadev

To clarify, you want me to create a function Plugin.get_all_paths() that will return a collection/iterator something containing the paths to the config files and the (resolved) log files? And you want acquire to use that function instead of the _get_paths() function?

Yes. Although I think we can leave the config_file part, because we don´t need that yet (I don´t see them in https://github.com/fox-it/dissect.target/pull/1287)

The implementation of Plugin::get_all_paths() indeed invokes Plugin::_get_paths, while handling the possible exception when it is not implemented by yielding nothing. We could also write a default implementation for Plugin::_get_paths which returns nothing.

You can then remove the

 if not hasattr(subclass, "_get_paths"):
                continue

snippet, and indeed invoke Plugin::get_all_paths() from acquire.

The implementations of Plugin::get_all_paths is likely to change in the future, but the idea is that the interface remains the same.

I agree that the naming of the functions is not ideal. _get_paths -> _get_log_paths seems fine to me, but I think that _get_paths_direct should then also be renamed to _get_direct_log_paths or similar.

Actually, this is a bit of a contentious issue because I would like to keep the direct files feature restricted to these plugin, primarily because I don´t like how it works :). I would advocate for users to run parsers directly on log files without the target machinery.

Next week I am away, these are my 2 cents, Schamper can take over in my absence.

twiggler avatar Oct 24 '25 08:10 twiggler

I agree that the naming of the functions is not ideal. _get_paths -> _get_log_paths seems fine to me, but I think that _get_paths_direct should then also be renamed to _get_direct_log_paths or similar.

Only a very small amount of plugins work on "logs", so the name is fine in my opinion. Most artifacts are not "logs".

Plugin._get_config_paths()

And not every "other" path is a config.

Actually, this is a bit of a contentious issue because I would like to keep the direct files feature restricted to these plugin, primarily because I don´t like how it works :). I would advocate for users to run parsers directly on log files without the target machinery.

That doesn't change how a Plugin is supposed to report what files it requires.

Schamper avatar Oct 24 '25 09:10 Schamper

That doesn't change how a Plugin is supposed to report what files it requires.

Correct, because that is handled by Plugin.get_all_paths()

I don´t feel particularly strong about _get_paths or _get_log_paths; my only point is not to couple with module internals of dissect.target in acquire

twiggler avatar Oct 24 '25 12:10 twiggler

I see these functions in dissect.target

    def get_paths(self) -> Iterator[Path]:
        if self.target.is_direct:
            yield from self._get_paths_direct()
        else:
            yield from self._get_paths()

    def _get_paths_direct(self) -> Iterator[Path]:
        """Return all paths as given by the user."""
        for path in self.target._loader.paths:
            yield self.target.fs.path(str(path))

    def _get_paths(self) -> Iterator[Path]:
        """Return all files of interest to the plugin.

        To be implemented by the plugin subclass.
        """
        raise NotImplementedError

Why do we need to create a new function get_all_paths() if there already is a get_paths() that uses _get_paths()?

qmadev avatar Oct 24 '25 13:10 qmadev

@qmadev get_paths would refer to the "artifact paths". I.e. an actual log file or database to parse. But often you need more contextual files, such as webserver configuration files. We might call this "auxiliary paths". And then a get_all_paths would encompass both "artifact paths" and "auxiliary paths".

It's all very confusing but I've not had a better idea yet.

Schamper avatar Oct 24 '25 13:10 Schamper

https://github.com/fox-it/dissect.target/pull/1287

Is this what you guys meant?

All seems to work fine. Config files get collected as well.

qmadev avatar Oct 24 '25 14:10 qmadev

@twiggler comments here and in dissect.target should be resolved!

qmadev avatar Nov 19 '25 15:11 qmadev

Codecov Report

:x: Patch coverage is 50.00000% with 10 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 44.94%. Comparing base (9a1e62f) to head (e06ac42). :warning: Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
acquire/acquire.py 50.00% 10 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #264      +/-   ##
==========================================
+ Coverage   44.90%   44.94%   +0.03%     
==========================================
  Files          26       26              
  Lines        3543     3558      +15     
==========================================
+ Hits         1591     1599       +8     
- Misses       1952     1959       +7     
Flag Coverage Δ
unittests 44.94% <50.00%> (+0.03%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Nov 27 '25 13:11 codecov[bot]

don't forget to change the dissect.target dependency inside the pyproject.toml file

diff --git a/pyproject.toml b/pyproject.toml
index 36d7e33..9032ab7 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -26,7 +26,7 @@ classifiers = [
 ]
 dependencies = [
     "dissect.cstruct>=4,<5",
-    "dissect.target>=3.24,<4",
+    "dissect.target>=3.25.dev,<4",  # TODO: update on release
 ]
 dynamic = ["version"]
 
@@ -47,7 +47,7 @@ full = [
 dev = [
     "acquire[full]",
     "dissect.cstruct>=4.0.dev,<5.0.dev",
-    "dissect.target[dev]>=3.24.dev,<4.0.dev",
+    "dissect.target[dev]>=3.25.dev,<4.0.dev",
 ]
 
 [dependency-groups]

Miauwkeru avatar Dec 02 '25 13:12 Miauwkeru

Will fix this Thursday.

qmadev avatar Dec 02 '25 23:12 qmadev