Volatility keeps broken/partial PDBs, preventing future analysis
I'm filing this ticket so it gets the parity label and we can deep dive into it before the parity release.
We noticed during a training session at a venue with unstable wifi that:
- Volatility 3 starts downloading a PDB on plugin run/automagic
- It seemingly cannot download the whole file because the wifi cuts out
- An index-related exception is thrown when parsing the PDB
- vol3 keeps the broken PDB around for future analysis, so plugin runs against samples matching that PDB break indefinitely
We are going to simulate this by turning a network interface off in a VM while the PDB is being downloaded, likely with sleep() calls to make sure the cut happens at the right moment (a rough sketch of an alternative, scriptable reproduction is below).
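For a deterministic reproduction that doesn't depend on timing an interface shutdown, we could also just write a truncated PDB into the cache directly. A minimal sketch, assuming a hypothetical symbol-server URL and cache path (neither is Volatility's real configuration):

```python
import urllib.request

# Hypothetical reproduction: deliberately write a truncated PDB into the
# cache to mimic the connection dropping mid-download. The URL and cache
# path below are placeholders, not Volatility's actual values.
PDB_URL = "https://msdl.microsoft.com/download/symbols/ntkrnlmp.pdb/<GUID>/ntkrnlmp.pdb"
CACHE_PATH = "/tmp/ntkrnlmp.pdb"  # stand-in for the real cache location

with urllib.request.urlopen(PDB_URL) as resp:
    with open(CACHE_PATH, "wb") as out:
        # Copy only the first few chunks, then stop, leaving a partial file
        for _ in range(4):
            chunk = resp.read(64 * 1024)
            if not chunk:
                break
            out.write(chunk)
# Any later run that parses CACHE_PATH should now hit the index error.
```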
Once we've diagnosed it further, we can figure out what the fix should look like.
This happened to almost every student on the first day, across multiple samples, so we eventually distributed known-good sets of symbol tables so they could complete the labs.
Yeah, we stash everything we receive directly into a cache file. I guess we could either mark it as temporary until we know it has all downloaded (although that doesn't cope with Microsoft handing us bad files), or we could skip the caching mechanism entirely, since the PDB is supposed to be turned into a JSON anyway. I also don't know what happens if you get the PDB fine but then interrupt the JSON production: does it still write a partial JSON? I don't think it should, but that's something to test too. How we'd cope with a partial JSON, I'm not sure...
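On the "mark it as temporary" idea: one common pattern is to download into a temp file in the same directory and only rename it into place once the download completes. A minimal sketch, not Volatility's actual cache code:

```python
import os
import tempfile
import urllib.request

def download_atomically(url: str, cache_path: str) -> None:
    """Fetch url into cache_path without ever exposing a partial file."""
    cache_dir = os.path.dirname(cache_path) or "."
    # Temp file in the same directory, so the final rename stays on one
    # filesystem and is therefore atomic.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url) as resp:
            while True:
                chunk = resp.read(64 * 1024)
                if not chunk:
                    break
                out.write(chunk)
        # Only reached if the download finished without raising
        os.replace(tmp_path, cache_path)
    except BaseException:
        os.unlink(tmp_path)  # interrupted or failed: discard the partial file
        raise
```

os.replace is atomic on both POSIX and Windows as long as source and destination are on the same volume, so other processes either see no cached file at all or a complete one, never a partial download.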
It seems the PDB document table contains a hash that we can use to check integrity.
https://github.com/dotnet/runtime/blob/main/docs/design/specs/PortablePdb-Metadata.md#document-table-0x30
Document Table: 0x30

The Document table has the following columns:

- Name (Blob heap index of [document name blob](https://github.com/dotnet/runtime/blob/main/docs/design/specs/PortablePdb-Metadata.md#document-name-blob))
- HashAlgorithm (Guid heap index)
- Hash (Blob heap index)
- Language (Guid heap index)

The table is not required to be sorted, and there shall be no duplicate rows in the Document table, based upon document name. Name shall not be nil; it can, however, encode an empty name string.

Hash is the file content hashed using the specified HashAlgorithm. It is used to validate that a source file matches the one used by the compiler when compiling the source code.
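Note the spec says this hash validates source files rather than the PDB itself, so whether it covers our corruption case needs checking. If we can obtain an expected digest from somewhere, though, the verification step itself is simple; a generic sketch using hashlib, with the digest extraction assumed rather than shown:

```python
import hashlib

def verify_file_hash(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Return True if the file's digest matches the expected hex string."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        # Read in chunks so large PDBs don't have to fit in memory
        for chunk in iter(lambda: fh.read(64 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest().casefold() == expected_hex.casefold()
```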
OK, so the fix probably belongs in this region of the code. I haven't fully figured out where, but there seems to be very little exception handling around there. I'll dig deeper into it tomorrow though, it's getting pretty late here... :S
I have noticed this issue also happens when multiple instances of Volatility 3 run simultaneously, such as when executing all the Windows test cases at once in VSCode. Each instance attempts to fetch the same PDB file from the MS servers, which leads to all of the connections being dropped and a corrupted PDB file. Subsequent vol3 runs then fail indefinitely until the corrupted PDB file is manually deleted and a single vol3 instance is run again to obtain it.
Apart from implementing a file integrity check or downloading to a temporary file, both of which I agree are necessary, we should also prevent this situation from happening in the first place. One way to do this is by using a system-level file lock to ensure only one process downloads the PDB at a time, similar to how Debian/Ubuntu's APT or other package managers handle concurrent updates.
Unfortunately, Python's standard library doesn't provide a cross-platform module/function for this. However, we could implement it ourselves or use a third-party module such as filelock (sketch below).
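To illustrate the idea, a minimal sketch of serializing the download with filelock; the lock-path convention and the download helper (from the earlier sketch) are my assumptions, not existing Volatility code:

```python
import os

from filelock import FileLock

def fetch_pdb_once(url: str, cache_path: str) -> None:
    # One lock file per cached PDB; FileLock blocks until the holder releases it.
    with FileLock(cache_path + ".lock"):
        # Re-check inside the lock: another process may have completed
        # the download while we were waiting to acquire it.
        if not os.path.exists(cache_path):
            download_atomically(url, cache_path)  # from the earlier sketch
```

Pairing the lock with the temp-file rename means a crash while holding the lock still can't leave a partial file in the cache; the next process just re-downloads.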
@ikelos I believe Gus' comment above identified a major issue we are seeing in mass testing of Volatility 3. The testing harness runs many plugins in parallel and runs multiple plugins against the same sample at the same time (best for kernel caching). This also matches how external users have built automation around Volatility 2 and 3 (auto-run all the plugins and save the output).
With this in mind, we need to figure out a way to prevent the corruption of the PDBs by multiple processes running at once. The filelock library that Gus pointed to looks promising and was linked from all the related Stack Overflow threads that I could find.
Thoughts?
I was going to start with a check for file existence, but I don't know what to do whilst we're waiting. Carry on, abort the run, sit there? How will we know when the file's complete? I could write it into a temp file and then move it to the cache once it's done, but then we'd also need to check the hash to prevent corruption, so yeah, I'm working on how to get around the problem gracefully...
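For what it's worth, a blocking lock largely sidesteps the waiting question: the waiting process sleeps until the downloader releases the lock, then finds the complete file in the cache. A sketch using filelock's timeout so a stuck peer can't hang us forever (the 300-second value is an arbitrary assumption):

```python
import os

from filelock import FileLock, Timeout

def wait_for_pdb(cache_path: str, timeout_seconds: float = 300.0) -> bool:
    """Block until any peer finishes downloading, or give up after the timeout."""
    try:
        with FileLock(cache_path + ".lock", timeout=timeout_seconds):
            # Holding the lock means nobody is mid-download; combined with
            # the temp-file + rename scheme, an existing file is complete.
            return os.path.exists(cache_path)
    except Timeout:
        # Could retry or carry on without symbols; aborting keeps it simple.
        return False
```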
@ikelos this is another non-plugin-related one we need to figure out. I just got another message today about a group running into it when scripting Volatility 3. What do you think of the filelock library that Gus pointed to?
@ikelos I am going to pull the parity label off this one but leave it open. Your PR helps one case, but we didn't get to the locking, which would be the full fix but needs a lot of testing and thought to be cross-platform.
This issue is stale because it has been open for 200 days with no activity.
This issue was closed because it has been inactive for 60 days since being marked as stale.