docs(contributing): add setup instructions
Summary
This PR adds detailed setup and build instructions to help contributors initialize the Crawlee project locally. It documents required dependencies, Yarn installation via Corepack, and guidance on using yarn build successfully.
Key Changes
Documentation Additions:
- Added Crawlee Project Pre-requisites section listing required Node.js and Yarn versions.
- Included a Crawlee Installation and Building guide with Corepack instructions and yarn commands.
Important Notes:
- These updates aim to streamline the developer onboarding and build process.
Contributors:
- Salvador Nunez: @SalvadorN323
- Alexander Manalad: @axmanalad
- Bao Truong: @baotruong04
This sounds like something to fix instead of documenting it. Can you describe what exactly happened?
rimraf dist should work just fine even if the folder is not present; our CI would fail otherwise.
Hi @B4nan,
Thanks for reviewing! Each of us ran into a similar build issue when we attempted to rebuild the project after making any change: an error related to gen-esm-wrapper not finding index.js in the dist folder. For instance, if I only insert a console.log("Hello World"); line inside a TS file of the core package (like enqueue_links.ts), running yarn build in the root project would miss the core build cache, and the dist folder would either never be built or be incomplete.
I also thought that rimraf ./dist should work as expected, since it does delete the folder in the frontend. I believe it has something to do with the path: possibly yarn compile points to the deleted dist folder? It could also mean that rimraf ./dist does not fully delete the dist folder at compile time.
However, another fix I found that works, but always involves more steps, was the following:
- When receiving the error, change the current directory to the package where the error occurs (e.g. packages/core).
- Run yarn build.
- Change the directory back to the root project.
- Run yarn clean and yarn build.
I could always open a new issue with the error log included if you are interested. When we found the rimraf fix, we did not know whether to propose it as a change to the codebase.
Thanks for reviewing! Each of us ran into a similar build issue when we attempted to rebuild the project after making any change: an error related to gen-esm-wrapper not finding index.js in the dist folder. For instance, if I only insert a console.log("Hello World"); line inside a TS file of the core package (like enqueue_links.ts), running yarn build in the root project would miss the core build cache, and the dist folder would either never be built or be incomplete.
This feels like something weird happened on your end, and you are trying to randomly find the culprit (so which one is it, gen-esm-wrapper, tsc build, build not working at all, or being incomplete?). I kinda doubt there is an issue like this (if there is, it would have to be in one of the libraries like tsc or turbo).
I'd need to see a complete reproduction - exact steps, not "either that or that happened, or maybe that". Right now, I am not convinced we need to update the contributing guide. Your changes there could likely confuse people rather than help them.
Reading this again and again, I actually think I know what is happening to you. It sounds like the tsc build cache, which wasn't properly ignored some time ago (and we managed to include one tsbuildInfo file in git). We fixed that already via #3035; maybe you just faced that issue because you cloned the project earlier.
I updated the documentation to not include the fix.
This feels like something weird happened on your end, and you are trying to randomly find the culprit (so which one is it, gen-esm-wrapper, tsc build, build not working at all, or being incomplete?). I kinda doubt there is an issue like this (if there is, it would have to be in one of the libraries like tsc or turbo).
Reading this again and again, I actually think I know what is happening to you. It sounds like the tsc build cache, which wasn't properly ignored some time ago (and we managed to include one tsbuildInfo file in git). We fixed that already via #3035; maybe you just faced that issue because you cloned the project earlier.
The project was cloned and tested after the fix you mentioned. Even running the normal steps results in the same error. The normal steps from a fresh start were:
- Had corepack enable set up.
- Run yarn install.
- Run yarn build (success).
- Add console.log("Hello World"); in line 520 of enqueue_links.ts in the core package.
- Run yarn build with the new code change (fails).
You are correct, however, that it is tied to a local issue of mine. I tried the following checks:
- Uninstalled my global versions of TypeScript and Turbo.
- Confirmed that every version, including Node.js and Yarn, is correct.
- Used yarn clean to clear the cache.
- Deleted node_modules and reinstalled with yarn install.
- Deleted the generated tsconfig.build.tsbuildinfo manually (somehow this works).
Today I once again tried the normal steps for rebuilding the project. You are also correct that it has something to do with TypeScript's incremental build cache in my local environment: tsconfig.build.tsbuildinfo seems to end up out of sync or corrupted. In other words, oddly enough, tsconfig.build.tsbuildinfo is not updated when I make a new build with new code changes, which forces me to delete it manually. Unfortunately, I am not sure what causes the "out of sync" issue, so I have not found a way to fix it automatically on my machine. If you would like to look into the log, it is in the document I attached.
log.txt
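For reference, the manual workaround described above (deleting the generated tsconfig.build.tsbuildinfo before rebuilding) could be scripted roughly as follows. This is only a sketch of a local workaround, not a proposed change to the repository: the function name clean_tsbuildinfo is made up for illustration, and it assumes the stale files all end in .tsbuildinfo and live outside node_modules.

```shell
# Hypothetical helper (not part of Crawlee): delete TypeScript incremental
# build info files under a directory so the next build starts without them.
# Assumption: the stale cache files match *.tsbuildinfo and are not in node_modules.
clean_tsbuildinfo() {
  # $1 is the directory to search; defaults to the current directory.
  # node_modules is pruned so installed packages are left untouched.
  find "${1:-.}" -name node_modules -prune -o \
    -type f -name '*.tsbuildinfo' -exec rm -f {} +
}
```

From the project root, this would be used as clean_tsbuildinfo . followed by yarn build.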