Consider adding check to Location fullTextUrl field to ensure it points directly to file
Work on https://github.com/thoth-pub/thoth/issues/405 required downloading PDF publication files directly from Location fullTextUrls, including checking that the URL returned Content-Type: application/pdf. This uncovered many user-entered fullTextUrls which instead returned Content-Type: text/html. These were fortunately simple to fix via a bulk database update, but would have been onerous for the user to change individually.
Perhaps we could/should do a Content-Type check when the user tries to save a fullTextUrl, to prevent similar issues in future.
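As a rough illustration of what such a check could look like, here is a minimal Python sketch using only the standard library. The function names (`media_type`, `head_content_type`, `is_direct_pdf_link`) are hypothetical and not part of Thoth or thoth-dissemination. It assumes the server answers HEAD requests with the same Content-Type it would send for the file itself, which not every host does.

```python
from urllib.request import Request, urlopen


def media_type(content_type_header: str) -> str:
    """Return the bare media type, dropping parameters such as charset."""
    return content_type_header.split(";", 1)[0].strip().lower()


def head_content_type(url: str, timeout: float = 10.0) -> str:
    """Ask the server for headers only and return the bare media type.

    Assumption: the server supports HEAD. Some hosts reject it, in which
    case a real check would need to fall back to a GET request.
    """
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return media_type(response.headers.get("Content-Type", ""))


def is_direct_pdf_link(url: str) -> bool:
    """True if the URL appears to serve a PDF rather than a landing page."""
    return head_content_type(url) == "application/pdf"
```

A landing page would typically come back as text/html and fail this check, which is the case described above. Network errors are left to propagate here. A save-time check would probably want to report them to the user as a warning rather than block the save outright.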
This should be prioritised soon, to help reduce the number of auto-dissemination errors.
Prioritising this might be the simplest mitigation to the issue described here.
Note that more than one Content-Type may be valid for any given Publication Type, e.g. application/octet-stream is also permissible for PDFs. This may require a change to the thoth-dissemination check if we start hitting issues with it (we haven't so far).
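One way to allow for several valid values per Publication Type is a lookup table of accepted media types. The sketch below is illustrative only: `ALLOWED_CONTENT_TYPES` and `content_type_allowed` are hypothetical names, the PDF entry reflects the two values mentioned above, and the EPUB entry is an assumption added to show the shape of the table.

```python
# Hypothetical table of acceptable media types per Publication Type.
# Only the PDF entry comes from this discussion; EPUB is an assumed example.
ALLOWED_CONTENT_TYPES = {
    "PDF": {"application/pdf", "application/octet-stream"},
    "EPUB": {"application/epub+zip", "application/octet-stream"},
}


def content_type_allowed(publication_type: str, content_type_header: str) -> bool:
    """Check a Content-Type header value against the table above."""
    # Strip parameters such as "; charset=utf-8" and normalise case.
    bare = content_type_header.split(";", 1)[0].strip().lower()
    allowed = ALLOWED_CONTENT_TYPES.get(publication_type.upper())
    if allowed is None:
        # No rule defined for this Publication Type: do not block the save.
        return True
    return bare in allowed
```

Treating unlisted Publication Types as allowed is a design choice made for this sketch, so that the check only rejects cases it has an explicit rule for.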