Unable to convert PDF files to arrayBuffer using Response.arrayBuffer().

Open Cmeesh11 opened this issue 1 year ago • 6 comments

Using hero.fetch, I'm able to properly retrieve a PDF file from a site. I want to convert the response to a buffer so I can save it, but I get this error every time:

2024-06-14T17:21:25.757Z ERROR [hero-core/connections/ConnectionToHeroClient] ConnectionToClient.HandleRequestError {
  context: {},
  sessionId: 'ErM0QZuIvKRJrfdV2GwaK',
  sessionName: undefined
} InjectedScriptError: InvalidCharacterError: Failed to execute 'btoa' on 'Window': The string to be encoded contains characters outside of the Latin1 range.
    at JsPath.runJsPath (/Users/cartermichaud/Development/PortalIntegrations/talon-portal-integrations-hero/agent/main/lib/JsPath.ts:165:13)
    at async FrameEnvironment.execJsPath (/Users/cartermichaud/Development/PortalIntegrations/talon-portal-integrations-hero/node_modules/core/lib/FrameEnvironment.ts:246:12)
    at async CommandRecorder.runCommandFn (/Users/cartermichaud/Development/PortalIntegrations/talon-portal-integrations-hero/node_modules/core/lib/CommandRecorder.ts:90:16)
    at async CommandRunner.runFn (/Users/cartermichaud/Development/PortalIntegrations/talon-portal-integrations-hero/node_modules/core/lib/CommandRunner.ts:36:14)
    at async ConnectionToHeroClient.executeCommand (/Users/cartermichaud/Development/node_modules/core/connections/ConnectionToHeroClient.ts:258:12)
    at async ConnectionToHeroClient.handleRequest (/Users/cartermichaud/Development/node_modules/core/connections/ConnectionToHeroClient.ts:66:14) {
  pathState: { step: [ 'arrayBuffer' ], index: 2 }
}
CLAIMS ERROR: InjectedScriptError: InvalidCharacterError: Failed to execute 'btoa' on 'Window': The string to be encoded contains characters outside of the Latin1 range.

Here is the request I'm making:

const res = await hero.fetch( url, {
  "method" : "get",
  "headers" : {
    "Accept" : "application/pdf",
    "Authorization" : `Bearer ${this.portal.authToken}`
  },
  "credentials" : "include"
} );

const buffer = await res.arrayBuffer(); // Throws error here

Cmeesh11 • Jun 14 '24 17:06

Thanks! I think I see the issue in the DOM type serializer layer. btoa apparently only works with single-byte (Latin1) characters.
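
To illustrate the error message above: btoa accepts any character whose code point is 0 to 255 (the Latin1 range), not only ASCII, and throws for anything beyond that. A minimal sketch, assuming a runtime with a global btoa (browsers, or Node.js 16+); the helper name canBtoa is hypothetical and not part of Hero:

```javascript
// Hypothetical helper: reports whether btoa accepts the given string.
function canBtoa(str) {
  try {
    btoa(str);
    return true;
  } catch (e) {
    // btoa throws InvalidCharacterError for code points above 255
    return false;
  }
}

const asciiOk = canBtoa('PDF');                       // plain ASCII
const latin1Ok = canBtoa(String.fromCharCode(0xff));  // top of the Latin1 range
const outsideOk = canBtoa(String.fromCharCode(0x100)); // first code point past Latin1

console.log(asciiOk, latin1Ok, outsideOk);
```

So a string built with one character per byte is always safe for btoa; the error suggests that at some point the data was turned into characters above 255.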

blakebyrnes • Jun 17 '24 13:06

Exactly, and I believe PDF files are always going to contain non-ASCII bytes, so that makes sense!

Cmeesh11 • Jun 17 '24 13:06

Well, then I went into the code, and we're already doing this:

const binary = Array.from(new Uint8Array(value.buffer, value.byteOffset, value.byteLength))
  .map(byte => String.fromCharCode(byte))
  .join('');

I think we had to move away from String.fromCharCode(...bytes) because the spread (varargs) call breaks once the binary reaches a certain size. Need to figure out what to replace this with.

blakebyrnes • Jun 17 '24 13:06

If the varargs limit is an issue, I believe TextDecoder works pretty well and is designed to handle larger arrays. You could do something like:

const binary = new Uint8Array(value.buffer, value.byteOffset, value.byteLength);
const decodedString = new TextDecoder('utf-8').decode(binary);

But just a suggestion.
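
One caveat with this approach for binary data such as PDFs: by default a UTF-8 TextDecoder does not throw on invalid byte sequences. It substitutes U+FFFD, which loses the original byte and is itself outside the range btoa accepts. A small sketch of that behavior (the sample bytes are made up for illustration):

```javascript
// "%PDF" followed by 0xff, a byte that can never appear in valid UTF-8.
const bytes = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0xff]);

// Default (non-fatal) decoding silently replaces the bad byte with U+FFFD.
const lossy = new TextDecoder('utf-8').decode(bytes);
console.log(lossy.charCodeAt(4).toString(16)); // fffd

// With { fatal: true } the decoder throws a TypeError instead.
let threw = false;
try {
  new TextDecoder('utf-8', { fatal: true }).decode(bytes);
} catch (e) {
  threw = e instanceof TypeError;
}
console.log(threw);
```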

Cmeesh11 • Jun 17 '24 13:06

Good suggestion. I was exploring that as well. I think (you might try to modify your node_modules to check) that your binary is encoded in latin1. If that's the case, this will probably break your encoding, so you'd end up with something weird like:

let decodedString;
try {
  // Attempt strict UTF-8 decoding; { fatal: true } is required for invalid bytes to throw
  decodedString = new TextDecoder('utf-8', { fatal: true }).decode(binary);
} catch (e) {
  // Fall back to Latin-1 if UTF-8 decoding fails
  decodedString = new TextDecoder('latin1').decode(binary);
}

I think we could probably also avoid the variadic call by doing:

const dataArray = Array.from(new Uint8Array(value.buffer, value.byteOffset, value.byteLength));
const binary = String.fromCharCode.apply(null, dataArray);

This would be in TypeSerializer in @ulixee/commons, inside node_modules.
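
One thing to watch for with the snippet above: String.fromCharCode.apply(null, dataArray) still passes every byte as a separate argument, so a very large input can hit the engine's argument limit just as spread does. A minimal sketch of a chunked variant follows. It is not the fix that landed in @ulixee/commons; the function name bytesToBinaryString and the chunk size are assumptions chosen for illustration, and a global btoa/atob (browsers, or Node.js 16+) is assumed:

```javascript
// Hypothetical helper: converts bytes to a "binary string" (one char per byte)
// in fixed-size chunks so that no single call receives too many arguments.
function bytesToBinaryString(bytes, chunkSize = 0x8000) {
  let binary = '';
  for (let i = 0; i < bytes.length; i += chunkSize) {
    // subarray returns a view on the same memory, not a copy
    binary += String.fromCharCode.apply(null, bytes.subarray(i, i + chunkSize));
  }
  return binary;
}

// Round trip: 200,000 bytes cycling through every value 0..255.
const input = new Uint8Array(200000).map((_, i) => i % 256);
const base64 = btoa(bytesToBinaryString(input));
const decoded = Uint8Array.from(atob(base64), ch => ch.charCodeAt(0));

const roundTripOk =
  decoded.length === input.length && decoded.every((b, i) => b === input[i]);
console.log(roundTripOk);
```

Because every character produced this way has a code point of at most 255, the result stays within what btoa accepts regardless of the file's contents.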

blakebyrnes • Jun 17 '24 13:06

I realized I hadn't checked in a fix for this to the commons project. It's in there now if you want to try it out, or you can wait for the next release.

blakebyrnes • Jun 19 '24 00:06

This should be fixed in release 29.

blakebyrnes • Jul 16 '24 15:07