obsidian-importer icon indicating copy to clipboard operation
obsidian-importer copied to clipboard

OneNote: Importer ends in half-finished state

Open altano opened this issue 1 year ago • 2 comments

I just took the OneNote import improvements for a test drive hoping that I could now import my notebook[^1] but I got a weird result: the import ended with 82 items remaining, and had 6 errors due to "maximum retry attempts":

Image

It looks like the limit is 5 and that can be exceeded pretty easily: https://github.com/obsidianmd/obsidian-importer/blob/b14b3ae79a278f0402f7bc5bd5f0b5152406aeb5/src/formats/onenote.ts#L16

and I guess 82 items just got lost somehow?

I'd be happy to help debug what's going wrong here if someone could give me a pointer where to look.

[^1]: which was previously hitting lots of rate-limiting errors

altano avatar Mar 02 '25 05:03 altano

@altano Also facing the same rate limit and a need to keep rerunning the importer every 30min to continue where it stopped.

it would be better to make the process continue every predefined time like 30mins without the need to rerun the importer process again and maybe in background .

anelshaer avatar Apr 19 '25 09:04 anelshaer

The OneNote api tells the importer to back off and the importer has code to back off, so there probably just shouldn’t be a max number of retries. Or it should be really high. Then retrying just wouldn’t be needed, which would be better.

The bug that’s causing some items to get stuck in “remaining”) without actually being in a queue is a separate issue. Looks like there’s still some flakiness here.

altano avatar Apr 20 '25 00:04 altano

I debugged this a bit today and noticed a few issues with the onenote importing code:

Low severity:

  • The retry backoff is per-request, so if the onenote API throttles us, every in-flight request keeps retrying. There should be one place the backoff happens, where ALL requests are paused while we're throttled.
  • This Retry-After logic ((+!response.headers.get('Retry-After') * 1000) || 15000;) isn't right: if the header is not in the response you always get back 1000, not 15000: https://github.com/obsidianmd/obsidian-importer/blob/b14b3ae79a278f0402f7bc5bd5f0b5152406aeb5/src/formats/onenote.ts#L890

High severity:

  • MAX_RETRY_ATTEMPTS is being trivially hit because 429 errors are contributing to failed requests. Either MAX_RETRY_ATTEMPTS should be removed or 429 backoff errors should not contribute to the max limit being hit.
  • fetchResource has a codepath that is returning undefined, and I think that is both a mistake and leading to at least two problems. The way this happens is if response.ok is false, we have an inner error object in the response, but it is not code 40001 or 20166. If that happens, we just return an undefined responseBody here: https://github.com/obsidianmd/obsidian-importer/blob/b14b3ae79a278f0402f7bc5bd5f0b5152406aeb5/src/formats/onenote.ts#L900. This causes at least two problems:
    • This line is throwing an error (An internal error occurred while trying to fetch 'https://graph.microsoft.com/v1.0/me/onenote/sections/------------/pages?$select=id%2ctitle%2ccreatedDateTime%2clastModifiedDateTime%2clevel%2corder%2ccontentUrl&$orderby=order&pagelevel=true&$skip=60'. Error details: TypeError: Cannot read properties of undefined (reading 'value')) because fetchResource can return undefined and it's not being checked for. This is possible because fetchResource returns Promise<any> and responseBody is typed as any, which will hide code that doesn't have nullish checks or code that returns an undefined responseBody. If you type responseBody like let responseBody: string | ArrayBuffer | object;, it'll highlight the return as an error: Image
    • These undefined response bodies being returned without error are causing the pages to be marked as successfully imported, so the next time I run the importer with "Skip previously imported" enabled, it will skip these pages that actually failed to import previously.

May or may not even be bugs:

  • We only handle error code 20166 rate-limiting. Is this always returned, or is the error status ever 429 without having a rate-limiting error in the response body? Might make sense to handle both 20166 errors AND status=429?

altano avatar Jun 01 '25 22:06 altano

Fixed by https://github.com/obsidianmd/obsidian-importer/pull/388

altano avatar Jun 18 '25 00:06 altano