crawlee icon indicating copy to clipboard operation
crawlee copied to clipboard

ConfigurationOptions refactor

Open moltar opened this issue 1 year ago • 5 comments

Which package is the feature request for? If unsure which one to select, leave blank

@crawlee/core

Feature

ConfigurationOptions interface currently models internal configuration object.

Motivation

When bunding the config options with tsup, which uses dts rollup plugin, bundling fails with:

RollupError: "AsyncLocalStorage" is not exported by "node:async_hooks", imported by "../../node_modules/.pnpm/@[email protected]/node_modules/@crawlee/core/configuration.d.ts".

Technically, this is not even needed for the input config.

Ideal solution or implementation, and any additional constraints

Ideally, there should be another configuration object that models the input configuration from the config file. The main issue is with storageClient and eventManager props which are not possible to define in JSON.

https://crawlee.dev/docs/guides/configuration

And then, I'd also propose that this "input" config be stored in the types package, so that any tools needing types only, would not need to install the entire core.

Alternative solutions or implementations

N/A

Other context

N/A

moltar avatar Nov 15 '23 21:11 moltar

We can surely move the interface to the types package, but I'd like to first understand why, as this is the very first time we are hearing about such problems. We'd also have to also abstract the event manager interface, this needs to stay configurable.

ConfigurationOptions interface currently models internal configuration object.

I don't consider anything there as internal, some options are surely advanced, but not internal.

Ideally, there should be another configuration object that models the input configuration from the config file.

Both storageClient and eventManager are normal options, they are not internal, they will surely stay in there. I can add a base interface that won't have them, but I don't understand why would that be better in any way, as they are already optional. On type level you can use Pick to remove them.

The main issue is with storageClient and eventManager props which are not possible to define in JSON.

Using the JSON file is just one of the possible ways to configure crawlee, indeed some things cannot be configured that way, but I don't think we need a separate interface for this.

And then, I'd also propose that this "input" config be stored in the types package, so that any tools needing types only, would not need to install the entire core.

What's the use case, why would you want to use our types but not the implementation? You either use crawlee, or don't, why would you want to use just the types?

B4nan avatar Nov 22 '23 12:11 B4nan

I don't consider anything there as internal, some options are surely advanced, but not internal.

Ok, fair enough.

What's the use case, why would you want to use our types but not the implementation? You either use crawlee, or don't, why would you want to use just the types?

Let's say you want to create a package, that provides a way to configure a project as code. I need the typings, to provide the type safety. But I don't need to force the user to install the Crawlee package to use it.

A simplified package example:

export configCrawlee(options: ConfigurationOptions) {
  saveFile('crawlee.json', JSON.stringify(options))
}

Now, to get the ConfigurationOptions interface, I need to install the full core package, which is rather heavy.

moltar avatar Nov 22 '23 18:11 moltar

So crawlee is some optional dependency, or why would the user want to configure crawlee when they can't use it?

B4nan avatar Nov 22 '23 18:11 B4nan

Users can use crawlee, but it is not yet available at the time of config production.

Think "starter" package.

The user provides many config options for all kinds of stuff, including tsconfig, package.json, crawlee. Then the project gets minted from user inputs, which creates the package.json from a template, adds crawlee as a dep, and adds crawlee.json.

But at the time of configuration, there are no deps installed, other than the configuration tool.

moltar avatar Nov 22 '23 20:11 moltar

The tool in question is this, btw: https://github.com/projen/projen

moltar avatar Nov 22 '23 20:11 moltar