
Consider adding check to Location fullTextUrl field to ensure it points directly to file

Open rhigman opened this issue 2 years ago • 3 comments

Work on https://github.com/thoth-pub/thoth/issues/405 required downloading PDF publication files directly from Location fullTextUrls, and checking that each URL returned Content-Type: application/pdf. This uncovered many user-entered fullTextUrls which instead returned Content-Type: text/html. Fortunately these were simple to fix via a bulk database update, but they would have been onerous for users to change individually.

Perhaps we could/should do a Content-Type check when the user tries to save a fullTextUrl, to prevent similar issues in future.
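
As a rough illustration of what such a check might look like (a sketch only, not Thoth's actual code): the function names `media_type` and `url_serves_pdf` are hypothetical, and the sketch assumes the server answers HEAD requests, which not every host does.

```python
import urllib.request


def media_type(header_value: str) -> str:
    """Return the bare media type from a Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    return header_value.split(";", 1)[0].strip().lower()


def url_serves_pdf(url: str, timeout: float = 10.0) -> bool:
    """Hypothetical save-time check: does the URL respond with a PDF?"""
    # Assumption: a HEAD request is enough to learn the Content-Type.
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        header_value = response.headers.get("Content-Type", "")
    return media_type(header_value) == "application/pdf"
```

A landing page would typically report `text/html` here and be rejected, while a direct file link would pass.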

rhigman, Sep 27 '22 14:09

This should be prioritised soon, to help reduce the number of auto-dissemination errors.

ja573, Feb 01 '23 13:02

Prioritising this might be the simplest mitigation for the issue described here.

rhigman, Mar 07 '24 16:03

Note that more than one Content-Type may be valid for any given Publication Type, e.g. application/octet-stream is also permissible for PDFs. This may require a change to the thoth-dissemination check if we start hitting issues with it (we haven't so far).
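
One way to allow for this would be a set of permitted Content-Types per Publication Type. The sketch below is an assumption-laden illustration: the name `PERMITTED_CONTENT_TYPES`, the function `content_type_permitted`, and the `"PDF"` key are hypothetical, and only the two PDF media types come from this thread.

```python
# Hypothetical mapping; only the PDF entry is taken from the discussion above.
PERMITTED_CONTENT_TYPES = {
    "PDF": {"application/pdf", "application/octet-stream"},
}


def content_type_permitted(publication_type: str, header_value: str) -> bool:
    """Check a Content-Type header value against the permitted set."""
    # Ignore parameters (e.g. "; charset=binary") and letter case.
    media_type = header_value.split(";", 1)[0].strip().lower()
    # Publication types with no entry are rejected rather than waved through.
    permitted = PERMITTED_CONTENT_TYPES.get(publication_type, set())
    return media_type in permitted
```

Rejecting unknown Publication Types is a design choice made for this sketch; a real check might prefer to skip validation for types it has no entry for.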

rhigman, May 07 '24 13:05