OneNote: Importer ends in half-finished state
I just took the OneNote import improvements for a test drive hoping that I could now import my notebook[^1] but I got a weird result: the import ended with 82 items remaining, and had 6 errors due to "maximum retry attempts".
It looks like the limit is 5 and that can be exceeded pretty easily: https://github.com/obsidianmd/obsidian-importer/blob/b14b3ae79a278f0402f7bc5bd5f0b5152406aeb5/src/formats/onenote.ts#L16
and I guess 82 items just got lost somehow?
I'd be happy to help debug what's going wrong here if someone could give me a pointer where to look.
[^1]: which was previously hitting lots of rate-limiting errors
@altano I'm also hitting the same rate limit and have to keep rerunning the importer every 30 minutes to continue where it stopped.
It would be better if the process resumed automatically at a predefined interval (e.g. every 30 minutes), without having to rerun the importer, and maybe in the background.
The OneNote api tells the importer to back off and the importer has code to back off, so there probably just shouldn’t be a max number of retries. Or it should be really high. Then retrying just wouldn’t be needed, which would be better.
The bug that’s causing some items to get stuck in “remaining” without actually being in a queue is a separate issue. Looks like there’s still some flakiness here.
I debugged this a bit today and noticed a few issues with the onenote importing code:
Low severity:
- The retry backoff is per-request, so if the onenote API throttles us, every in-flight request keeps retrying. There should be one place the backoff happens, where ALL requests are paused while we're throttled.
- This Retry-After logic (`(+!response.headers.get('Retry-After') * 1000) || 15000;`) isn't right: if the header is not in the response you always get back 1000, not 15000: https://github.com/obsidianmd/obsidian-importer/blob/b14b3ae79a278f0402f7bc5bd5f0b5152406aeb5/src/formats/onenote.ts#L890
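To make the two low-severity points concrete, here's a minimal sketch (not the importer's actual code). `retryDelayMs` and `BackoffGate` are hypothetical names, and it assumes `Retry-After` carries a delay in seconds; the HTTP-date form of the header just falls back to the default. `originalDelayMs` reproduces the expression quoted above so the two can be compared.

```typescript
// The expression from onenote.ts, wrapped in a function for comparison.
// `!` here is logical NOT, so the header value itself is never read.
function originalDelayMs(headerValue: string | null): number {
  return (+!headerValue * 1000) || 15000;
}

// Hypothetical replacement: use the header (seconds) when it parses to a
// positive number, otherwise fall back to a default delay.
function retryDelayMs(headerValue: string | null, fallbackMs = 15000): number {
  const seconds = Number(headerValue);
  // Number(null) is 0 and Number('not a number') is NaN; both use the fallback.
  return Number.isFinite(seconds) && seconds > 0 ? seconds * 1000 : fallbackMs;
}

// Hypothetical shared gate: one pause that every request waits on,
// instead of each in-flight request backing off on its own.
class BackoffGate {
  private resumeAt = 0;

  // Extend the pause; a shorter pause never shortens an existing one.
  pause(ms: number, now = Date.now()): void {
    this.resumeAt = Math.max(this.resumeAt, now + ms);
  }

  remainingMs(now = Date.now()): number {
    return Math.max(0, this.resumeAt - now);
  }

  // Resolves once no pause is in effect, re-checking in case it was extended.
  async wait(): Promise<void> {
    let ms = this.remainingMs();
    while (ms > 0) {
      await new Promise<void>(resolve => setTimeout(resolve, ms));
      ms = this.remainingMs();
    }
  }
}
```

With this, `originalDelayMs(null)` gives 1000 and `originalDelayMs('30')` gives 15000, so the header value is never actually used, while `retryDelayMs('30')` gives 30000. Every request would `await gate.wait()` before sending, and whichever request sees the throttle response calls `gate.pause(retryDelayMs(...))`.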
High severity:
- `MAX_RETRY_ATTEMPTS` is being trivially hit because 429 errors are contributing to failed requests. Either `MAX_RETRY_ATTEMPTS` should be removed or 429 backoff errors should not contribute to the max limit being hit.
- `fetchResource` has a codepath that is returning `undefined`, and I think that is both a mistake and leading to at least two problems. The way this happens is if `response.ok` is false, we have an inner error object in the response, but it is not code 40001 or 20166. If that happens, we just return an undefined `responseBody` here: https://github.com/obsidianmd/obsidian-importer/blob/b14b3ae79a278f0402f7bc5bd5f0b5152406aeb5/src/formats/onenote.ts#L900. This causes at least two problems:
  - This line is throwing an error (`An internal error occurred while trying to fetch 'https://graph.microsoft.com/v1.0/me/onenote/sections/------------/pages?$select=id%2ctitle%2ccreatedDateTime%2clastModifiedDateTime%2clevel%2corder%2ccontentUrl&$orderby=order&pagelevel=true&$skip=60'. Error details: TypeError: Cannot read properties of undefined (reading 'value')`) because `fetchResource` can return `undefined` and it's not being checked for. This is possible because `fetchResource` returns `Promise<any>` and `responseBody` is typed as `any`, which will hide code that doesn't have nullish checks or code that returns an `undefined` `responseBody`. If you type `responseBody` like `let responseBody: string | ArrayBuffer | object;`, it'll highlight the return as an error.
  - These undefined response bodies being returned without error are causing the pages to be marked as successfully imported, so the next time I run the importer with "Skip previously imported" enabled, it will skip these pages that actually failed to import previously.
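Here's a rough sketch of how both high-severity points could be addressed together. It is not the importer's code: `Reply`, `classify`, and `fetchWithRetry` are hypothetical names, the `throttled` flag stands in for however throttling is detected, and `send` / `backOff` are injected so the sketch runs without a network. The idea is that throttled responses trigger a backoff without using up the retry budget, and every other non-ok response becomes a thrown error rather than an `undefined` return value.

```typescript
// Simplified stand-in for the parts of a response this sketch needs.
interface Reply {
  ok: boolean;
  status: number;
  throttled: boolean; // assumption: detection happens elsewhere
  body?: object;
}

type Classified =
  | { kind: 'ok'; body: object }
  | { kind: 'throttled' }
  | { kind: 'failed'; message: string };

// Every reply maps to exactly one outcome; there is no silent fall-through.
function classify(reply: Reply): Classified {
  if (reply.throttled) return { kind: 'throttled' };
  if (reply.ok && reply.body !== undefined) return { kind: 'ok', body: reply.body };
  return { kind: 'failed', message: `Request failed with status ${reply.status}` };
}

// Returns a body or throws. The return type is `object`, not `any`,
// so the compiler rejects a code path that returns undefined.
async function fetchWithRetry(
  send: () => Promise<Reply>,
  backOff: () => Promise<void>,
  maxFailures = 5,
): Promise<object> {
  let failures = 0;
  for (;;) {
    const result = classify(await send());
    if (result.kind === 'ok') return result.body;
    if (result.kind === 'throttled') {
      // Being told to slow down is not a failure, so it is not counted.
      await backOff();
      continue;
    }
    failures++;
    if (failures >= maxFailures) throw new Error(result.message);
  }
}
```

Because a failed page now surfaces as a rejected promise, the caller can avoid marking it as imported, which should also keep "Skip previously imported" from skipping it on the next run.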
May or may not even be bugs:
- We only handle error code 20166 rate-limiting. Is this always returned, or is the error status ever 429 without having a rate-limiting error in the response body? Might make sense to handle both 20166 errors AND status=429?
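Handling both cases could be as small as the check below. This is a sketch under assumptions: `isRateLimited` is a hypothetical helper, and the `error.code` shape is my reading of the "inner error object" mentioned above, not a confirmed schema. The code is compared as a string so it matches whether the API sends `20166` as a number or as text.

```typescript
// Assumed shape of the error payload; only the field used here is declared.
interface ErrorBody {
  error?: { code?: string | number };
}

// True when either the HTTP status or the inner error code signals throttling.
function isRateLimited(status: number, body: ErrorBody | undefined): boolean {
  return status === 429 || String(body?.error?.code) === '20166';
}
```

That way a bare 429 without a recognizable body would still be treated as throttling instead of falling through to the generic error path.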
Fixed by https://github.com/obsidianmd/obsidian-importer/pull/388