LLM-Engineers-Handbook
The pipeline does not include any posts or repositories
Although Chapter 4 covers the implementation of dispatchers and handlers for cleaning, chunking, and embedding posts, articles, and repositories, the current version of the pipeline only includes articles from different sources.
I suspect this is because LinkedIn and GitHub make it difficult to build crawlers, since they may protect their endpoints behind a username and password. Even so, I'd expect the "import data warehouse from JSON" step to work without an internet connection. In particular, this command:
poetry poe run-import-data-warehouse-from-json
To achieve that, can the corresponding files...
be populated?