calculated text checksum is different from the header one
Thank you for your great work. I am building another appinfo.vdf editor (web version), and the documentation has helped me a lot.
I parse an AppInfo object into a custom object. When I serialize it into binary format, the calculated hash matches the binary checksum in the header perfectly.
But when I serialize it into text format and hash it, most of the results differ from the hash field in the binary header.
To investigate, I debugged appinfo.py and found that it has the same issue.
It seems that replacing \\ with \\\\ didn't solve all the problems.
if __name__ == "__main__":
    appinfo = Appinfo('./appinfo.vdf')
    checksum = appinfo.update_app(10)
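For context, the backslash fix mentioned above amounts to escaping string values while writing the text form. Here is a minimal sketch of that idea, assuming a tab-indented layout with quoted keys and values. `escape` and `dict_to_text` are hypothetical names, not the functions in appinfo.py, and the exact bytes Valve hashes are precisely what is unknown here:

```python
from hashlib import sha1


def escape(value: str) -> str:
    # Double every backslash. Whether other characters (quotes, newlines)
    # also need escaping is an open question in this thread.
    return value.replace("\\", "\\\\")


def dict_to_text(data: dict, depth: int = 0) -> str:
    """Serialize a nested dict to a text-VDF-like string (assumed layout)."""
    indent = "\t" * depth
    out = []
    for key, value in data.items():
        if isinstance(value, dict):
            out.append(f'{indent}"{key}"\n{indent}{{\n')
            out.append(dict_to_text(value, depth + 1))
            out.append(f"{indent}}}\n")
        else:
            out.append(f'{indent}"{key}"\t\t"{escape(str(value))}"\n')
    return "".join(out)


# Toy data, not taken from a real appinfo.vdf
sample = {"appinfo": {"appid": 10, "installdir": "C:\\Games"}}
text = dict_to_text(sample)
digest = sha1(text.encode("utf-8")).hexdigest()
```

Any difference in indentation, separators, or escaping changes the digest, which is why a serializer that looks correct can still produce a non-matching hash.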
Indeed, I just stumbled upon the exact same issue. I was also writing some code to manipulate the format, and didn't succeed in getting the hash to match the original hash for all apps.
I then started playing around with the code here, and I noticed that it also doesn't manage to get the original hash.
So, it turns out that in my code (which is written in Rust, but that's beside the point), I do pretty much the exact same thing as this tool does. What's interesting, though, is that for some of the apps, calculating the hash DOES work. Using the appinfo.vdf on my local machine, I get the correct hash for about 600 apps and an incorrect one for roughly 5000, all using what I believe to be more or less the exact same technique as seen in this repository.
For an example of an app I'm pretty sure you will have in your local appinfo.vdf as well, try appid 1007 (Steamworks SDK Redist). You'll find that this one hashes just fine.
So far, I have not noticed anything obvious that distinguishes the apps that don't hash well from the ones that do. I assume we might need to sanitize the textual representation of the VDF a bit more, just as was necessary for the backslash characters.
Did you make any additional discoveries surrounding this @tralph3?
Are you guys implementing parsers for the latest version of appinfo? Now all strings are stored at the end of the file, and in the metadata itself there are only indices that refer to the string's position in the "list of strings" at the end of the file.
That would explain why binary data works, but text vdf doesn't.
Although it would be pretty clear that you're doing something wrong if you take a look at the decoded strings and just see garbage. So it's likely something else.
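To illustrate the string-table idea described above, here is a toy sketch. The layout (a little-endian uint32 count followed by NUL-terminated UTF-8 strings) is an assumption for demonstration only, and `read_string_table` and `resolve_keys` are hypothetical helpers, not part of appinfo.py:

```python
import struct


def read_string_table(blob: bytes) -> list[str]:
    """Parse a toy string table: uint32 count, then NUL-terminated strings."""
    (count,) = struct.unpack_from("<I", blob, 0)
    offset = 4
    strings = []
    for _ in range(count):
        end = blob.index(b"\x00", offset)  # find the terminating NUL
        strings.append(blob[offset:end].decode("utf-8"))
        offset = end + 1
    return strings


def resolve_keys(indices: list[int], table: list[str]) -> list[str]:
    # An index past the end of the table raises IndexError
    return [table[i] for i in indices]


# Toy data standing in for the end-of-file string list
blob = struct.pack("<I", 3) + b"appinfo\x00appid\x00common\x00"
table = read_string_table(blob)
keys = resolve_keys([0, 2, 1], table)
```

If the indices were resolved against the wrong table, the decoded keys would be visibly wrong, which matches the point that such a mistake would be hard to miss.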
Yes, I'm parsing the latest AppInfo format (v29). Parsing the binary format is very easy thanks to the people who've documented it, including yourself.
Rather, I'm talking about generating the correct hash from the textual representation of the binary vdf. I can confirm that what @ktKongTong is reporting is correct. The hash you generate in this application is often incorrect as well.
Here's a quick example (I'm not a Python programmer):
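import os
from hashlib import sha1

from appinfo import Appinfo  # assumes this runs next to the repo's appinfo.py


def main():
    # Adjust the Steam root to your own installation
    path = os.path.join(
        "/home/romatthe/.local/share/Steam", "appcache", "appinfo.vdf"
    )
    appinfo = Appinfo(path, False, None)
    app = appinfo.parsedAppInfo[440]  # Team Fortress 2
    # Regenerate the text VDF without editing anything
    formatted = appinfo.dict_to_text_vdf(app["sections"])
    # Checksum stored in appinfo.vdf
    print(list(app['checksum_text']))
    print(app['checksum_text'].hex())
    # Checksum of the regenerated text
    print(list(sha1(formatted).digest()))
    print(sha1(formatted).hexdigest())


main()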
Result:
[225, 195, 108, 211, 159, 139, 245, 12, 125, 231, 89, 87, 93, 182, 131, 25, 94, 54, 230, 127]
e1c36cd39f8bf50c7de759575db683195e36e67f
[239, 26, 120, 182, 112, 140, 157, 65, 144, 194, 37, 189, 239, 239, 42, 154, 127, 149, 7, 155]
ef1a78b6708c9d4190c225bdefef2a9a7f95079b
To clarify: I'm just parsing the appinfo.vdf, picking a single app (Team Fortress 2 in this case, which should be easy to verify yourself since pretty much everyone owns it), generating the textual VDF without applying any edits, taking the hash, and then comparing that to the checksum originally found in appinfo.vdf. And as you can see, the checksums do not match.
Like I pointed out above, in my code (completely separate codebase) I generate the exact same checksums as you do, but again, these appear to be incorrect.
As I said, for about 10% of all the apps in my local appinfo.vdf this does actually work, but for the other 90% I get conflicting checksums. All of the output from my vdf-to-text routines looks very sane, as in, it all looks like perfectly acceptable textual VDF with no obvious issues.
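A tally like the 10%/90% split above can be produced by comparing each stored checksum against the SHA-1 of the regenerated text. This is only a sketch: `split_by_checksum` is a hypothetical helper, and the two entries are toy data standing in for a parsed appinfo.vdf:

```python
from hashlib import sha1


def split_by_checksum(
    apps: dict[int, tuple[bytes, bytes]],
) -> tuple[list[int], list[int]]:
    """apps maps appid -> (stored_text_checksum, generated_text_vdf_bytes)."""
    good, bad = [], []
    for appid, (stored, text) in sorted(apps.items()):
        (good if sha1(text).digest() == stored else bad).append(appid)
    return good, bad


# Toy data: one entry whose stored checksum matches, one that does not
text_a = b'"appinfo"\n{\n\t"appid"\t\t"1007"\n}\n'
text_b = b'"appinfo"\n{\n\t"appid"\t\t"440"\n}\n'
apps = {
    1007: (sha1(text_a).digest(), text_a),
    440: (sha1(text_b + b"\x00").digest(), text_b),  # deliberately mismatched
}
good, bad = split_by_checksum(apps)
print(f"{len(good)} match, {len(bad)} differ")
```

Dumping the text of the `good` and `bad` groups to separate files then gives two sets that can be compared side by side.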
For those interested, I quickly dumped the textual VDF of all the apps I was able to generate a matching checksum for (good.txt) and of some examples of apps where I failed to get a matching checksum (bad.txt). Unfortunately I couldn't include all the bad ones, as that file was 29 MB in size, so I had to cut it down a little.
So after trying to wrap my head around this for a few days, I can't quite get a satisfying answer. Here's more or less what I think the conclusion is:
- My (WIP) tool, this tool, and SteamEdit all seem to use pretty much the same technique for computing the checksum of the textual VDF. I haven't verified this 100%, but all my tests across the three tools so far have shown that they generate the same hashes.
- For most apps, these hashes DO NOT match the ones that Valve ships in a clean appinfo.vdf. In other words, it appears Valve likely does something else with the textual VDF representation. I still presume they do some extra string sanitization, but I wouldn't know what it is.
- Curiously though, Steam does not seem to reject an altered appinfo.vdf with these (incorrect) hashes. I also can't quite understand why this doesn't happen.
For the last point, I simply tested this in my own tool by parsing an existing 'clean' appinfo.vdf, and then sent it through the packing routines without actually altering any of the content. That results in precisely the same appinfo.vdf as before EXCEPT that it now has the new hashes.
And from what I've seen, Steam does not reject these. It's possible that the Steam client never uses that hash to check the integrity of the file. Or it's possible it only uses it at certain points, which means it could still reject the file somewhere down the line... I don't quite know.
Either way, I'm fairly confident in concluding that pretty much no one actually knows how to correctly generate these checksums, except some folks over at Valve. I've found no code in the public space that does it correctly ("correctly" here meaning matching Valve's method).
Well, if Steam doesn't reject the hashes, and the changes are reflected, then it's still working, which is the important thing.
I'd look to actual text VDF files, shipped by Valve, and try to spot any discrepancies between them and the ones we generate in our programs.
Maybe there's an extra newline character at the end, or the beginning, or some escape sequence. Who knows.
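Guesses like these can be checked mechanically by hashing several variants of the generated text and seeing whether any of them reproduces the stored checksum. A rough sketch follows. The variants are hypothetical candidates, none of them is known to be what Valve does, and `find_matching_variants` is an illustrative name:

```python
from hashlib import sha1

# Hypothetical candidate transformations of the generated text VDF
VARIANTS = {
    "as-is": lambda t: t,
    "trailing newline": lambda t: t + b"\n",
    "no trailing newline": lambda t: t.rstrip(b"\n"),
    "leading newline": lambda t: b"\n" + t,
    "CRLF line endings": lambda t: t.replace(b"\n", b"\r\n"),
    "trailing NUL": lambda t: t + b"\x00",
    "unescaped backslashes": lambda t: t.replace(b"\\\\", b"\\"),
}


def find_matching_variants(text: bytes, stored: bytes) -> list[str]:
    """Return the names of all variants whose SHA-1 equals the stored checksum."""
    return [
        name for name, fn in VARIANTS.items()
        if sha1(fn(text)).digest() == stored
    ]


# Toy check: pretend the stored checksum was taken over text plus a newline
generated = b'"appinfo"\n{\n\t"appid"\t\t"440"\n}\n'
stored = sha1(generated + b"\n").digest()
matches = find_matching_variants(generated, stored)
```

Running this over every app whose checksum does not match, and counting which variant names come back, would show quickly whether a single simple transformation explains the mismatches.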
I've spent quite a bit of time looking into this on and off over the past months, and I cannot find any consistency between the text where the hashes match and the text where the hashes don't match. I just can't quite figure out a pattern.
And like I said above, I still haven't found any tool or documentation that does anything other than what both this tool and my own are doing. Even the most famous tool, SteamEdit, does exactly the same, I think.
So as of now, I'm definitely tired of trying to figure it out :) And since it seems to work just fine, we'll see how it holds up the next time Valve decides to change the format some more.
Well, thanks for taking the time to confirm it is, at least, a non-issue.