online WopiStorage: after downloading a file, check if it is a zip

For relevant file extensions.

Change-Id: If8b5a1c7057118adc9154b156bb5a82057a70fd4

Resolves: #
Target version: master

Summary

To help detect wopi server-side errors.

Checklist

[x] I have run make prettier-write and formatted the code.
[x] All commits have Change-Id
[x] I have run tests with make check
[x] I have issued make run and manually verified that everything looks okay
[ ] Documentation (manuals or wiki) has been updated or is not required

May 16 '25 12:05 meven

what does this solves?

May 16 '25 13:05 hfiguiere

what does this solves?

It does not solve anything. It warns of bad WOPI server implementation. This could be turned into a hard error, surfaced to the user, but I'd like to do it later.

May 16 '25 14:05 meven

You should file a follow-up issue to resurface the error. Having it in the log right now should help debugging though.

May 20 '25 13:05 hfiguiere

I was wondering if we couldn't check this before saving the response data onto disk. To avoid reopening to check the first few bytes. We should have enough info...

May 20 '25 13:05 hfiguiere

Thanks, @meven. Since the commit doesn't say why we need to do this, perhaps I'm missing the goal, but...

I thought it was pretty self-explanatory. If a Wopi implementation misbehaves/delivers unexpected content, warn the system admininstrator.

Doing some input check doesn't cost much runtime-wise, but can help detecting issues efficiently. That's just input validation.

We do support formats that aren't zipped (e.g. PDF), and there is no guarantee/requirement that all future documents will be zipped.

My patch applies only on type extensions that are explicitly zipped file formats.

Also, there is no expectation or requirement that the files will have an extension at all, or even the proper extension. We still support loading/editing them, regardless.

This is at least a strong heuristic, any user would be confused if the file extension does not match the file content and would trigger some negative on the EFSS should this happen.

It's also unclear that this error will help anyone. Troubleshooting the storage/host is best done with an HTTP client, like curl or wget. Otherwise, this logic will log an error every so often when there is nothing wrong (e.g. when the user loads a text file with the wrong extension, which we do support), it will also slow loading all documents with those extensions, and (worst of all?) it can throw exceptions and break things badly.

Perhaps I missed the point of the patch.

Except a WOPI host could have some random bugs, due to load for instance, that would prevent normal business. If we can first detect it, and then perhaps later have an explicit error, that would shorten the support phase. Now we can easily pinpoint the issue origin.

sha hash checking might be useful too.

May 28 '25 11:05 meven

this should be rebased.

Nov 04 '25 23:11 hfiguiere