apify-sdk-js
feat: automatically initialize the actor class when calling methods that need it
I was debating if I should add the init options to all the methods that now call this.init(), but at the same time I feel like it'd be better if people explicitly called Actor.init for non-default storage methods
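For readers following along, a minimal sketch of the implicit-init idea described above. `LazyActor`, `initCalls`, and the returned input object are hypothetical stand-ins for illustration; this is not the SDK's actual implementation.

```typescript
// Hypothetical stand-in for the Actor class; not the real SDK code.
class LazyActor {
    private initPromise?: Promise<void>;
    // Counts how many times the real setup ran (for illustration only).
    public initCalls = 0;

    // Idempotent: repeated or concurrent calls share a single promise.
    init(): Promise<void> {
        this.initPromise ??= this.doInit();
        return this.initPromise;
    }

    private async doInit(): Promise<void> {
        this.initCalls += 1;
        // Real setup (env vars, storage client) would happen here.
    }

    // A method that "needs" init calls it implicitly with default options.
    async getInput(): Promise<Record<string, unknown>> {
        await this.init();
        return { example: true }; // placeholder input
    }
}

const actor = new LazyActor();
void actor.getInput(); // triggers init implicitly
void actor.getInput(); // reuses the same init promise
void actor.init();     // an explicit call stays harmless
```

With this shape, an explicit `init()` call is still the place to pass non-default options, which matches the preference stated above for non-default storage.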
What's the backstory? :)
We're kinda pushing it everywhere (guides/docs/etc) that Actor.init()/exit() are the only two needed calls if you wanna run your code as an actor. With this - Actor.init() is kinda not needed anymore, right? Actor.exit() still is though.
I would maybe keep it consistent (so keep both init/exit) and rather maybe print a warning/throw an error instead? Because init would still be needed if you wanted to use e.g. another storage implementation. Plus I am pretty sure a lot of people would still use Actor.main.
So I don't really have a strong opinion on this, but I am already used to having init()/exit() =D
> What's the backstory? :)
Not fully sure, I just went off of what I saw in Slack. I might need some more clarification, or maybe this should only apply to Actor.getInput? Or maybe I completely misunderstood it.
Can you send the link to Slack? :) Either I missed something, or I am just totally out of sync =D
I agree with @AndreyBykov but not strong feeling about this one either :)
The backstory is that if you use top level await on the Actor methods in a different file than the main.js where you call Actor.init, it will fail silently, because the apify env vars are not yet supported - because the Actor.init is not called yet.
(Will check the code tomorrow, this comment is not based on it)
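The ordering problem described above can be sketched in a single file. ES modules evaluate an imported file's top-level code before the importing file's body, so a top-level `getInput()` in an imported file runs before `init()` in main.js. Everything below (`fakeActor`, the two `evaluate...` functions, the log labels) is a hypothetical simulation of that ordering, and what the real SDK returns before init may differ.

```typescript
// Records the order in which calls happen.
const log: string[] = [];

// Hypothetical stand-in for the SDK's Actor; not the real API.
const fakeActor = {
    initialized: false,
    init(): void {
        this.initialized = true;
        log.push("init");
    },
    getInput(): { zipCodes: string[] } | undefined {
        log.push(this.initialized ? "getInput after init" : "getInput before init");
        // Without init there is nothing to read from in this simulation.
        return this.initialized ? { zipCodes: [] } : undefined;
    },
};

// Simulates the top-level code of an imported file (e.g. routes.js).
function evaluateImportedModule() {
    return fakeActor.getInput();
}

// Simulates the body of main.js, which runs only after its imports finished.
function evaluateMainModule(): void {
    fakeActor.init();
}

// Imports are evaluated first, then the importing module's body.
const inputSeenByImportedModule = evaluateImportedModule();
evaluateMainModule();

console.log(log);
```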
@B4nan but it's gonna be the same even with one/same file, no? The input would just be an empty object to be precise (just tried to run it for the same/for the different files without Actor.init()). What I am trying to say is that Actor.init() should pretty much be called at the beginning of the whole run, but after init - you could use it in any file. I am either missing something, or you had some edgy edge case :)
Oh - is this related to this? Like INPUT.json in the root? I kind of never even had an idea to use it this way as I am used to having it in the KV store. But ok - I get it for Crawlee, but should it be implemented for Actor? For local usage only?
I don't follow what you mean. You simply can't call Actor.getInput in any file other than main.js (where you call Actor.init). That is very error-prone, especially now with top-level await available.
And yes, that root input.json shortcut definitely needs to be supported with Actor.getInput, that's the main goal. But it's a separate and unrelated issue.
Ermm, I do it in ABC scrapers. main has something like:
// ...
await Actor.init();
const { zipCodes, proxyOpts } = await fetchInput();
// ...
fetchInput() here is a function defined in a different file and imported from tools.ts.
In tools, inside this function, I am calling await Actor.getInput() and it works perfectly fine.
What am I doing wrong?
If you mean calling Actor.getInput() on top level in different file. Well - why would you even do it? =D
> If you mean calling Actor.getInput() on top level in different file. Well - why would you even do it? =D
Exactly that. Why would I put such a call inside some request handler when I know there is only one input for the run? It can never be request-specific.
You think about this with years of experience with CJS and the old SDK. I ran into this problem when writing the very first real-world actor with Crawlee. I can assure you, people would fall into this crap very easily.
Well - it's not really in the request handler, it's before the crawler, but yeah, got it. But what's the further usage? Would you e.g. get input somewhere else and let's say export it or what? Or would you e.g. get input in the router?
Funnily enough - I was having trouble with it the other way around. I am used to getting the input and then using it in the crawler requestHandler, but now with the router I was like - hmm - so how do I pass it to routes now, considering I get it in main 😄
Not reexport, simply use it there. Why would I get the input from main.js and pass it to the routes somehow, when I can read it from the other file instead.
Well - it's kinda faster to read it once and then use it from memory. E.g. you have proxyConfiguration (which you use in crawler in main) and some other stuff that you use in routes (which did not exist before, so I would further pass it to some imported function).
I get it that you would probably just read it here and there like twice, but I dunno, I did not like the idea of reading it at each point from outside of the app when I need it.
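One way to reconcile "read it once" with "read it from any file" is to cache the read behind a small helper. A minimal sketch, assuming a hypothetical `readInputFromStorage()` standing in for `Actor.getInput()`; the `Input` shape and the `reads` counter are illustrative only.

```typescript
// Illustrative input shape; field names follow the snippet earlier in the thread.
type Input = { zipCodes?: string[]; proxyOpts?: unknown };

// Counts how often the underlying storage read happens (illustration only).
let reads = 0;

// Hypothetical stand-in for `Actor.getInput()`.
async function readInputFromStorage(): Promise<Input> {
    reads += 1;
    return { zipCodes: [] };
}

// Caches the promise, so every caller shares a single read.
let cachedInput: Promise<Input> | undefined;

function getInputOnce(): Promise<Input> {
    cachedInput ??= readInputFromStorage();
    return cachedInput;
}

// main and routes can both call the helper; storage is read only once.
const fromMain = getInputOnce();
const fromRoutes = getInputOnce();
```

Caching the promise rather than the resolved value means two callers that race each other still trigger only one read.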
I did not need to read it in main.js at all, that's the part you are probably missing.