joplin
joplin copied to clipboard
Clipper: Add capability of limiting downloads
Summary
Creating a new abstraction DownloadLimiter
that can be used to limit the amount of data that extractNoteFromHTML
can download. This works by counting how many images and the number of bytes downloaded by the shim.fetchBlob
calls.
Changes
- Limit how much data can be downloaded by the
extractNoteFromHTML
. - Change the "promise-pool" library.
- Add a log when the limit is reached
- Add a header to a note body if the limit is reached.
Why change the promise-pool library
While the original library was working as intended, it was designed to work more like a Promise.all
than to give the user control of what is being executed. To make it work for our use case we would need to make the code not so straightforward. See: https://github.com/timdp/es6-promise-pool/issues/61
Since the library is only being used here I thought it would be ok to change to another one that seems to be better aligned to what we need.
Replacement: https://github.com/supercharge/promise-pool https://www.npmjs.com/package/@supercharge/promise-pool
Log pattern:
2024-01-20 00:29:43: downloadController: Owner id: qcc9jPy9UZ5xSuBhdBkSTZ - Total bytes stored: 841811. Maximum: 52428800 - Images initiated for download: 12. Maximum: 11. Expected: 57
2024-01-20 00:29:43: downloadController: Owner id: {ownerId} - Total bytes stored: {how much was downloaded in the request}. Maximum: { value set in the LimitedDownloadController } - Images initiated for download: { how many images started to be downloaded}. Maximum: { value set in the LimitedDownloadController }. Expected: { how many image URLs we had in the email body}
Implementation
There are "two behaviors".
When the image count limit is reached. We can stop new downloads from even starting, there is only one warn log message since we can control it:
When the total number of bytes is exceeded. Since the only place we can check this is inside the fetchBlob function and the error is being controlled by a try-catch there we can are potential logging a lot of error messages:
Testing
This change should not affect anything since the current code only uses a DummyDownloadController
class as the fallback.
Any unexpected behavior from the Joplin Web Clipper or any code that uses shim.fetchBlob
is a potential bug.
~This is a case where using the promising-pool
lib is better than Promise.all
. If we were using Promise.all
we wouldn't be able to stop the execution midway (for example, if the content that is being downloaded exceeded a size limit that we choose).~
We talked about having each operation throw an error when the limit is reached instead of the class having a state that identifies if the limit has been reached. The problem with this approach is that if we have 100 image links inside the content and the size limit is reached at the 10th image we will be logging 90 error messages.
I'm not even sure how I could avoid this kind of error logging flood because it is a behavior that happens inside fetchBlob
function, so the only way I can think would be to use a try catch to suppress the errors that are related to this feature (I have an error code to identify it).