Extend webserver collection
closes #139
Testing on VMs with IIS, Nginx and Caddy seems to work fine with an adjusted version of dissect.target #1287.
- Collecting file C:/Windows/System32/LogFiles/HTTPERR/httperr1.log to: fs/C:/Windows/System32/LogFiles/HTTPERR/httperr1.log
- Collecting file C:/Windows/System32/LogFiles/HTTPERR/httperr1.log succeeded
- Collecting file /var/caddy/access.json to: fs/$rootfs$/var/caddy/access.json
- Collecting file /var/caddy/access.json succeeded
- Collecting file /root/access.log to: fs/$rootfs$/root/access.log
- Collecting file /root/access.log succeeded
The issue specifies that tests should be written. Do we still want that, even with the tests that are already there in dissect.target?
What version should I put at line 873? See question marks.
Any news on this?
@Schamper do you have some time to take a second look at this?
I'd like @twiggler to have an opinion too.
@twiggler do you have some time to review this?
@Schamper @twiggler
Just a friendly reminder.
@Schamper @twiggler
Just a friendly reminder.
Sorry @qmadev this slipped through the cracks, I will take a look this week
@Schamper @qmadev
One detail I don't like is that we are invoking a private method in target from acquire. I think this is a design oversight in the implementation of https://github.com/fox-it/dissect.target/pull/1082. Clearly, _get_paths should be made public by removing the underscore. However, this causes a name-clash with the get_paths defined in the Plugin class (template method pattern).
Perhaps we should rename _get_paths to get_paths_from_target (contrast with _get_paths_direct).
Otherwise it looks fine!
I think this is a design oversight in the implementation of fox-it/dissect.target#1082.
Not necessarily. https://github.com/fox-it/dissect.target/pull/1082#pullrequestreview-2793551946 goes into more detail here:
This method still doesn't quite cover "getting all paths that are relevant to a plugin", since it's only really meant for "paths the plugin will parse for artefacts", and not configuration paths. So we can't quite 1:1 use this method to collect files with acquire. I think we should focus on this approach first, but perhaps in a next stage we can think of how to tackle that, perhaps a _get_aux_paths() to be implemented by plugins, and a Plugin.get_all_paths to be consumed by acquire?
I think this is a design oversight in the implementation of fox-it/dissect.target#1082.
Not necessarily. fox-it/dissect.target#1082 (review) goes into more detail here:
This method still doesn't quite cover "getting all paths that are relevant to a plugin", since it's only really meant for "paths the plugin will parse for artefacts", and not configuration paths. So we can't quite 1:1 use this method to collect files with acquire. I think we should focus on this approach first, but perhaps in a next stage we can think of how to tackle that, perhaps a _get_aux_paths() to be implemented by plugins, and a Plugin.get_all_paths to be consumed by acquire?
Ah yes, we decided to defer.
Although I don´t see any config paths yet in the companion target PR, I think it would indeed be a good idea to introduce Plugin::get_all_paths already, which can delegate to Plugin::_get_paths for the time being, if it exists in the subclass. That way, we have the API in place, otherwise contributors might be inclined to keep using Plugin::_get_paths
To clarify, you want me to create a function Plugin.get_all_paths() that will return a collection/iterator something containing the paths to the config files and the (resolved) log files? And you want acquire to use that function instead of the _get_paths() function?
I do think the naming is confusing. Maybe something like
Plugin._get_log_paths()
Plugin._get_config_paths()
Plugin.get_all_paths()
would be clearer?
To clarify, you want me to create a function
Plugin.get_all_paths()that will return a collection/iterator something containing the paths to the config files and the (resolved) log files? And you wantacquireto use that function instead of the_get_paths()function?
Yes. Although I think we can leave the config_file part, because we don´t need that yet (I don´t see them in https://github.com/fox-it/dissect.target/pull/1287)
The implementation of Plugin::get_all_paths() indeed invokes Plugin::_get_paths, while handling the possible exception when it is not implemented by yielding nothing. We could also write a default implementation for Plugin::_get_paths which returns nothing.
You can then remove the
if not hasattr(subclass, "_get_paths"):
continue
snippet, and indeed invoke Plugin::get_all_paths() from acquire.
The implementations of Plugin::get_all_paths is likely to change in the future, but the idea is that the interface remains the same.
I agree that the naming of the functions is not ideal.
_get_paths -> _get_log_paths seems fine to me,
but I think that _get_paths_direct should then also be renamed to _get_direct_log_paths or similar.
Actually, this is a bit of a contentious issue because I would like to keep the direct files feature restricted to these plugin, primarily because I don´t like how it works :). I would advocate for users to run parsers directly on log files without the target machinery.
Next week I am away, these are my 2 cents, Schamper can take over in my absence.
I agree that the naming of the functions is not ideal.
_get_paths->_get_log_pathsseems fine to me, but I think that_get_paths_directshould then also be renamed to_get_direct_log_pathsor similar.
Only a very small amount of plugins work on "logs", so the name is fine in my opinion. Most artifacts are not "logs".
Plugin._get_config_paths()
And not every "other" path is a config.
Actually, this is a bit of a contentious issue because I would like to keep the direct files feature restricted to these plugin, primarily because I don´t like how it works :). I would advocate for users to run parsers directly on log files without the target machinery.
That doesn't change how a Plugin is supposed to report what files it requires.
That doesn't change how a Plugin is supposed to report what files it requires.
Correct, because that is handled by Plugin.get_all_paths()
I don´t feel particularly strong about _get_paths or _get_log_paths; my only point is not to couple with module internals of dissect.target in acquire
I see these functions in dissect.target
def get_paths(self) -> Iterator[Path]:
if self.target.is_direct:
yield from self._get_paths_direct()
else:
yield from self._get_paths()
def _get_paths_direct(self) -> Iterator[Path]:
"""Return all paths as given by the user."""
for path in self.target._loader.paths:
yield self.target.fs.path(str(path))
def _get_paths(self) -> Iterator[Path]:
"""Return all files of interest to the plugin.
To be implemented by the plugin subclass.
"""
raise NotImplementedError
Why do we need to create a new function get_all_paths() if there already is a get_paths() that uses _get_paths()?
@qmadev get_paths would refer to the "artifact paths". I.e. an actual log file or database to parse. But often you need more contextual files, such as webserver configuration files. We might call this "auxiliary paths". And then a get_all_paths would encompass both "artifact paths" and "auxiliary paths".
It's all very confusing but I've not had a better idea yet.
https://github.com/fox-it/dissect.target/pull/1287
Is this what you guys meant?
All seems to work fine. Config files get collected as well.
@twiggler comments here and in dissect.target should be resolved!
Codecov Report
:x: Patch coverage is 50.00000% with 10 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 44.94%. Comparing base (9a1e62f) to head (e06ac42).
:warning: Report is 1 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| acquire/acquire.py | 50.00% | 10 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #264 +/- ##
==========================================
+ Coverage 44.90% 44.94% +0.03%
==========================================
Files 26 26
Lines 3543 3558 +15
==========================================
+ Hits 1591 1599 +8
- Misses 1952 1959 +7
| Flag | Coverage Δ | |
|---|---|---|
| unittests | 44.94% <50.00%> (+0.03%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
don't forget to change the dissect.target dependency inside the pyproject.toml file
diff --git a/pyproject.toml b/pyproject.toml
index 36d7e33..9032ab7 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -26,7 +26,7 @@ classifiers = [
]
dependencies = [
"dissect.cstruct>=4,<5",
- "dissect.target>=3.24,<4",
+ "dissect.target>=3.25.dev,<4", # TODO: update on release
]
dynamic = ["version"]
@@ -47,7 +47,7 @@ full = [
dev = [
"acquire[full]",
"dissect.cstruct>=4.0.dev,<5.0.dev",
- "dissect.target[dev]>=3.24.dev,<4.0.dev",
+ "dissect.target[dev]>=3.25.dev,<4.0.dev",
]
[dependency-groups]
Will fix this Thursday.