joplin icon indicating copy to clipboard operation
joplin copied to clipboard

Clipper: Add capability of limiting downloads

Open pedr opened this issue 1 year ago • 1 comments

Summary

Creating a new abstraction DownloadLimiter that can be used to limit the amount of data that extractNoteFromHTML can download. This works by counting how many images and the number of bytes downloaded by the shim.fetchBlob calls.

Changes

  • Limit how much data can be downloaded by the extractNoteFromHTML.
  • Change the "promise-pool" library.
  • Add a log when the limit is reached
  • Add a header to a note body if the limit is reached.

Why change the promise-pool library

While the original library was working as intended, it was designed to work more like a Promise.all than to give the user control of what is being executed. To make it work for our use case we would need to make the code not so straightforward. See: https://github.com/timdp/es6-promise-pool/issues/61

Since the library is only being used here I thought it would be ok to change to another one that seems to be better aligned to what we need.

Replacement: https://github.com/supercharge/promise-pool https://www.npmjs.com/package/@supercharge/promise-pool

Log pattern:

2024-01-20 00:29:43: downloadController: Owner id: qcc9jPy9UZ5xSuBhdBkSTZ - Total bytes stored: 841811. Maximum: 52428800 - Images initiated for download: 12. Maximum: 11. Expected: 57

2024-01-20 00:29:43: downloadController: Owner id: {ownerId} - Total bytes stored: {how much was downloaded in the request}. Maximum: { value set in the LimitedDownloadController } - Images initiated for download: { how many images started to be downloaded}. Maximum: { value set in the LimitedDownloadController }. Expected: { how many image URLs we had in the email body}

Implementation

There are "two behaviors".

When the image count limit is reached. We can stop new downloads from even starting, there is only one warn log message since we can control it:

When the total number of bytes is exceeded. Since the only place we can check this is inside the fetchBlob function and the error is being controlled by a try-catch there we can are potential logging a lot of error messages:

Testing

This change should not affect anything since the current code only uses a DummyDownloadController class as the fallback.

Any unexpected behavior from the Joplin Web Clipper or any code that uses shim.fetchBlob is a potential bug.

pedr avatar Jan 26 '24 19:01 pedr

~This is a case where using the promising-pool lib is better than Promise.all. If we were using Promise.all we wouldn't be able to stop the execution midway (for example, if the content that is being downloaded exceeded a size limit that we choose).~

pedr avatar Feb 22 '24 13:02 pedr

We talked about having each operation throw an error when the limit is reached instead of the class having a state that identifies if the limit has been reached. The problem with this approach is that if we have 100 image links inside the content and the size limit is reached at the 10th image we will be logging 90 error messages.

I'm not even sure how I could avoid this kind of error logging flood because it is a behavior that happens inside fetchBlob function, so the only way I can think would be to use a try catch to suppress the errors that are related to this feature (I have an error code to identify it).

pedr avatar Feb 28 '24 13:02 pedr