
Add support for non-default unnamed storages

vdusek opened this issue 8 months ago

Problem

The Apify platform supports non-default unnamed storages. This functionality is also available in the Apify Python client, where you can do the following (example for dataset):

await client.datasets().get_or_create()  # via DatasetCollectionClientAsync

Each call creates a new, unnamed dataset with a unique ID.
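
A minimal sketch of this behavior, assuming a configured ApifyClientAsync (token handling elided):

from apify_client import ApifyClientAsync

client = ApifyClientAsync(token='...')

# Each call without a name creates a fresh unnamed dataset with its own ID.
first = await client.datasets().get_or_create()
second = await client.datasets().get_or_create()
assert first['id'] != second['id']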

In contrast, Crawlee does not support this (in any storage client). For example, repeated calls to:

await Dataset.open()

always return the same default unnamed storage.
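
A sketch of the current behavior, using the existing Crawlee API:

from crawlee.storages import Dataset

# Both calls resolve to the same cached default dataset instance.
first = await Dataset.open()
second = await Dataset.open()
assert first is second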

Goal state

Achieve feature parity between Crawlee storages (all storage clients, including the ApifyStorageClient) and the Apify platform (API client) by adding support for non-default unnamed storages.

Possible solution

Introduce a new scope argument to the storages' open() constructor:

@classmethod
async def open(
    cls,
    name: str | None = None,
    id: str | None = None,
    scope: Literal['run', 'global'] = 'global',
) -> Dataset | KeyValueStore | RequestQueue:
    ...
  • scope='run' indicates a non-default unnamed storage.
  • scope='global' refers to globally named storages.
  • The name parameter cannot be entirely removed for run-scope storages, as it is needed:
    • For the filesystem storage client: as a directory name.
    • For the Apify platform storage client: to store the name -> ID mapping in the default key-value store (see the sketch below).
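
A hypothetical sketch of that mapping (the record key '__RUN_SCOPE_STORAGES' and the shape of the value are illustrative, not an actual convention):

from crawlee.storages import KeyValueStore

# Persist the name -> ID mapping for run-scope storages in the default KVS.
kvs = await KeyValueStore.open()
mapping = await kvs.get_value('__RUN_SCOPE_STORAGES', {})
mapping['debug'] = 'hypothetical-dataset-id'
await kvs.set_value('__RUN_SCOPE_STORAGES', mapping)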

Behavior matrix

Open storage by ID and name

  • Raise an exception.
  • Scope argument is ignored.

Open storage by ID

  • Opens an existing storage by ID.
  • Scope: open question; see the discussion below.

Open storage by name

  • Scope run:
    • Opens or creates a run-scope (non-default unnamed) storage.
      • name is used internally to reference the storage but is not the storage's actual "name".
  • Scope global:
    • Opens or creates a global named storage.

Open storage without args

  • Opens the default unnamed storage.
  • Scope argument is ignored.
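
A minimal sketch of the dispatch logic implied by this matrix, shown for Dataset (the _open_* helpers are hypothetical placeholders, not actual Crawlee internals):

@classmethod
async def open(
    cls,
    name: str | None = None,
    id: str | None = None,
    scope: Literal['run', 'global'] = 'global',
) -> Dataset:
    if id is not None and name is not None:
        raise ValueError('Only one of "id" and "name" may be provided.')
    if id is not None:
        # Opens an existing storage by ID; scope is ignored (or an open question).
        return await cls._open_by_id(id)
    if name is not None:
        if scope == 'run':
            # Run scope: a non-default unnamed storage, tracked internally under `name`.
            return await cls._open_run_scope(name)
        # Global scope: a named storage shared across runs.
        return await cls._open_named(name)
    # No args: the default unnamed storage; scope is ignored.
    return await cls._open_default()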

vdusek · Apr 25 '25 16:04

When opening the storage by ID, the scope does not make sense. I think an exception would be appropriate.

janbuchar · Apr 25 '25 21:04

We should ask for more feedback, e.g. on Slack.

B4nan · May 12 '25 09:05

So to create a new persisted unnamed dataset, you would call Dataset.open(name='debug', scope='run') and then every time you call this (even after migration), it would return the same dataset, right?

Before releasing, I would have a short sync with the platform/output schema team. There is e.g. this proposal, so let's make sure we don't use completely different terms: https://github.com/apify/actor-whitepaper/pull/25

metalwarrior665 · Jun 09 '25 11:06

> So to create a new persisted unnamed dataset, you would call Dataset.open(name='debug', scope='run') and then every time you call this (even after migration), it would return the same dataset, right?

Yes, spot on. With the caveat that Dataset.open(name='debug') will open a different, global dataset. Perhaps we could just throw if multiple open calls share the same name but use different scopes.
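
A hypothetical sketch of such a guard (the per-process _scope_by_name cache is illustrative):

# Hypothetical cache: storage name -> scope used on first open.
_scope_by_name: dict[str, str] = {}

def _check_scope_conflict(name: str, scope: str) -> None:
    previous = _scope_by_name.setdefault(name, scope)
    if previous != scope:
        raise ValueError(
            f'Storage {name!r} was already opened with scope={previous!r}; '
            f'cannot reopen it with scope={scope!r}.'
        )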

> Before releasing, I would have a short sync with the platform/output schema team. There is e.g. this proposal, so let's make sure we don't use completely different terms: apify/actor-whitepaper#25

The suggested implementation won't need any support from the platform side, but it's always a good idea to sync on terminology.

janbuchar · Jun 09 '25 12:06

Overview of the variants from the user-experience perspective

1) Scope version

  • Scope global remains the default option.

Direct usage

  • Same for all types of storages.
# Default dataset
default_dataset = await Dataset.open()

# Run scope dataset
dataset_run_scope = await Dataset.open(name='dataset_run_scope', scope='run')

# Global scope dataset
dataset_global_scope = await Dataset.open(name='dataset_global_scope', scope='global')

# And then all the methods remain the same…
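
Presumably, repeated opens with the same name and scope would return the same storage, mirroring how the default one is cached (a sketch of the expected semantics, not an implemented guarantee):

d1 = await Dataset.open(name='debug', scope='run')
d2 = await Dataset.open(name='debug', scope='run')
assert d1 is d2  # the same run-scope dataset within one run, even after migration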

Context helpers

  • Helpers on the crawling context.

Push data

# Default dataset
await context.push_data(data)

# Global scope dataset
await context.push_data(data, dataset_name='dataset_global_scope', scope='global')

# Run scope dataset
await context.push_data(data, dataset_name='dataset_run_scope', scope='run')

Add requests

  • Currently there is no option to specify the destination (requests always go to the default RQ), but we can add one.
# Default RQ
await context.add_requests(requests)

# Global scope RQ
await context.add_requests(requests, rq_name='rq_global_scope', scope='global')

# Run scope RQ
await context.add_requests(requests, rq_name='rq_run_scope', scope='run')

Enqueue links

  • Currently there is no option to specify the destination (requests always go to the default RQ), but we can add one.
# Default RQ
await context.enqueue_links()

# Global scope RQ
await context.enqueue_links(rq_name='rq_global_scope', scope='global')

# Run scope RQ
await context.enqueue_links(rq_name='rq_run_scope', scope='run')

Get KVS

# Default KVS
kvs = await context.get_key_value_store()

# Global scope KVS
kvs_global = await context.get_key_value_store(name='kvs_global_scope', scope='global')

# Run scope KVS
kvs_run = await context.get_key_value_store(name='kvs_run_scope', scope='run')

Use state

  • use_state should always use the default key-value store (sketch below).
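
For reference, a sketch with the current context helper (assuming use_state keeps its present signature):

# use_state always reads from and persists to the default key-value store.
state = await context.use_state(default_value={'counter': 0})
state['counter'] += 1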

Crawler helpers

  • Helpers on the crawlers.

Export data

# Default dataset
data = await crawler.export_data()

# Global scope dataset
data = await crawler.export_data(dataset_name='dataset_global_scope', scope='global')

# Run scope dataset
data = await crawler.export_data(dataset_name='dataset_run_scope', scope='run')

Get dataset

# Default dataset
dataset = await crawler.get_dataset()

# Global scope dataset
dataset = await crawler.get_dataset(name='dataset_global_scope', scope='global')

# Run scope dataset
dataset = await crawler.get_dataset(name='dataset_run_scope', scope='run')

Get key-value store

# Default KVS
kvs = await crawler.get_key_value_store()

# Global scope KVS
kvs = await crawler.get_key_value_store(name='kvs_global_scope', scope='global')

# Run scope KVS
kvs = await crawler.get_key_value_store(name='kvs_run_scope', scope='run')

Get request manager

  • It returns the configured request manager, so it is not affected.

Add requests

  • It uses the underlying request manager, so it is not affected.

2) Alias version

Direct usage

  • Same for all types of storages.
# Default dataset
default_dataset = await Dataset.open()

# Global scope dataset (name)
dataset_global_scope = await Dataset.open(name='dataset_global_scope')

# Run scope dataset (alias)
dataset_run_scope = await Dataset.open(alias='dataset_run_scope')

# And then all the methods remain the same…
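
Note that alias and name would address two different storages even when the strings match (a sketch of the intended semantics):

d1 = await Dataset.open(alias='debug')  # run-scope, unnamed on the platform
d2 = await Dataset.open(name='debug')   # global named storage
assert d1 is not d2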

Context helpers

  • Helpers on the crawling context.

Push data

# Default dataset
await context.push_data(data)

# Global scope dataset (name)
await context.push_data(data, dataset_name='dataset_global_scope')

# Run scope dataset (alias)
await context.push_data(data, dataset_alias='dataset_run_scope')

Add requests

Currently there is no option to specify the destination (requests always go to the default RQ), but we can add one.

# Default RQ
await context.add_requests(requests)

# Global scope RQ (name)
await context.add_requests(requests, rq_name='rq_global_scope')

# Run scope RQ (alias)
await context.add_requests(requests, rq_alias='rq_run_scope')

Enqueue links

Currently there is no option to specify the destination (requests always go to the default RQ), but we can add one.

# Default RQ
await context.enqueue_links()

# Global scope RQ (name)
await context.enqueue_links(rq_name='rq_global_scope')

# Run scope RQ (alias)
await context.enqueue_links(rq_alias='rq_run_scope')

Get KVS

# Default KVS
kvs = await context.get_key_value_store()

# Global scope KVS (name)
kvs_global = await context.get_key_value_store(name='kvs_global_scope')

# Run scope KVS (alias)
kvs_run = await context.get_key_value_store(alias='kvs_run_scope')

Use state

  • use_state should always use the default key-value store.

Crawler helpers

  • Helpers on the crawlers.

Export data

# Default dataset
data = await crawler.export_data()

# Global scope dataset (name)
data = await crawler.export_data(dataset_name='dataset_global_scope')

# Run scope dataset (alias)
data = await crawler.export_data(dataset_alias='dataset_run_scope')

Get dataset

# Default dataset
dataset = await crawler.get_dataset()

# Global scope dataset (name)
dataset = await crawler.get_dataset(name='dataset_global_scope')

# Run scope dataset (alias)
dataset = await crawler.get_dataset(alias='dataset_run_scope')

Get key-value store

# Default KVS
kvs = await crawler.get_key_value_store()

# Global scope KVS (name)
kvs = await crawler.get_key_value_store(name='kvs_global_scope')

# Run scope KVS (alias)
kvs = await crawler.get_key_value_store(alias='kvs_run_scope')

Get request manager

  • It returns the configured request manager, so it is not affected.

Add requests

  • It uses the underlying request manager, so it is not affected.

vdusek · Sep 02 '25 10:09