
bug: `setValue` and `useState` functions fail for large objects

Open lhotanok opened this issue 11 months ago • 0 comments

Which package is this bug report for? If unsure which one to select, leave blank

@crawlee/core

Issue description

The functions Actor.setValue and Actor.useState / crawler.useState use JSON.stringify internally to serialize objects. JSON.stringify cannot handle very large objects and fails with an Invalid string length error. Crawlee already catches and re-throws this error: https://github.com/apify/crawlee/blob/f912b8b06da2bc4f3f3db508cc39c936a5c87f23/packages/core/src/storages/key_value_store.ts#L31-L36

But it would be better to handle large objects than to re-throw the error. There are libraries for big JSON data, such as big-json. They're based on stream processing.

The issue can be avoided when using Actor.setValue by stringifying the object before passing it to the setValue function:

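import { Actor } from 'apify';
import { stringify } from 'big-json';

// big-json serializes the object through a stream and resolves to a JSON string.
const stringifiedObj = await stringify({ body: objectToStringify });

// Pass the pre-serialized string together with an explicit content type.
await Actor.setValue(
    'KVS_KEY',
    stringifiedObj,
    { contentType: 'application/json' },
);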

But useState cannot be used with large objects. It would help if the useState function accepted a callback function as a parameter where the serialization of the state object could be customized.
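The callback idea can be sketched as a standalone helper. This is only an illustration of the proposed shape, not an existing crawlee or Apify SDK API: useStateWithSerializer, the serialize / deserialize options and createMemoryStore are all hypothetical names. The store parameter is assumed to be any object with async getValue and setValue methods, and the in-memory store exists only to make the sketch runnable.

```javascript
// Hypothetical helper, NOT part of crawlee or the Apify SDK.
// `store` is assumed to expose async getValue(key) and setValue(key, value, options).
async function useStateWithSerializer(store, key, defaultValue, { serialize, deserialize }) {
    // Load a previously persisted state, or fall back to the default value.
    const raw = await store.getValue(key);
    const state = raw == null ? defaultValue : await deserialize(raw);

    // Writes the current state using the caller-supplied serializer
    // instead of a built-in JSON.stringify call.
    const persist = async () => {
        const serialized = await serialize(state);
        await store.setValue(key, serialized, { contentType: 'application/json' });
    };

    return { state, persist };
}

// In-memory stand-in for a key-value store (assumption, for illustration only).
function createMemoryStore() {
    const values = new Map();
    return {
        getValue: async (key) => (values.has(key) ? values.get(key) : null),
        setValue: async (key, value) => { values.set(key, value); },
    };
}
```

With such a helper, serialize could be something like (state) => stringify({ body: state }) using big-json, and persist could be called wherever the state is normally saved. The sketch assumes the store returns the value exactly as it was stored. A real key-value store may parse JSON records on read, in which case deserialize would have to accept an already parsed object.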

Steps to reproduce the error:

  • get a large JSON file, e.g. bigFile.json from my js-stringify-examples, and parse its content into a JS object obj
    • or generate a large JS object in memory
  • call JSON.stringify(obj) - the call should fail with an Invalid string length error. JSON.stringify can be tested with process.js from js-stringify-examples.
  • call Actor.setValue('KEY', obj) - the call should fail immediately with a The "value" parameter cannot be stringified to JSON error
  • call Actor.useState('STATE', obj) - this should fail when the state object is persisted (not immediately after the call)
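For the in-memory variant of the steps above, a minimal sketch could look like this. The helper names buildLargeObject and tryStringify are made up for illustration, and the sizes needed to trigger the error depend on the runtime's maximum string length.

```javascript
// Build an object whose JSON form is roughly chunkCount * chunkLength characters.
// Every array slot references the same string, so the object itself stays small in memory.
function buildLargeObject(chunkCount, chunkLength) {
    const chunk = 'x'.repeat(chunkLength);
    return { items: new Array(chunkCount).fill(chunk) };
}

// Attempt to serialize the value and report the outcome instead of throwing.
function tryStringify(value) {
    try {
        return { ok: true, length: JSON.stringify(value).length };
    } catch (error) {
        return { ok: false, message: error.message };
    }
}
```

Calling tryStringify(buildLargeObject(600, 1024 * 1024)) asks for roughly 600 MB of JSON output, which should exceed the maximum string length on Node.js v22 and report the Invalid string length message. The same object can then be passed to Actor.setValue and Actor.useState for the remaining steps.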

Code sample

await Actor.setValue('KEY', obj);
await Actor.useState('STATE', obj);

Package version

v3.12.1

Node.js version

v22.9.0

Operating system

Tested on both Windows and WSL with Debian distro

Apify platform

  • [ ] Tick me if you encountered this issue on the Apify platform

I have tested this on the next release

No response

Other context

No response

lhotanok, Jan 16 '25 13:01