Getting started

Open • yonas opened this issue 1 year ago • 1 comment

I've compiled stract via cargo build --release. What do I do next?

How much disk space is required?

I can run the indexer / crawler / scraper via stract indexer, stract crawler and stract autosuggest-scrape.

Do you need to run the crawler first?

I can run the search servers via stract search-server and stract entity-search-server.

I can run the API server via stract api.

Feb 04 '24 22:02 yonas

Hi! Yeah, I really need to write a proper getting started guide and provide some data that can bootstrap the index. To get an idea of how to run the engine once the index has been built, study the scripts/run_dev.py file and the corresponding config files in configs/.

To build the index, you would need to perform the following main steps:

1. (Optional) Run the crawler to crawl some pages and save them in .warc files. The current crawler architecture requires a crawl plan to be built before the crawl can be executed. Common Crawl distributes a giant dataset of these files, so you can skip running Stract's own crawler entirely, which makes it a lot easier to get started.
2. Build the webgraph using the stract webgraph create command. The config file to look at here is configs/webgraph/create.toml.
3. Calculate the harmonic centrality for each page/host using stract centrality.
4. Build the index using stract indexer search. The config file configs/indexer/create.toml should help get you started.
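
The order of the steps above can be written down as a small dry-run checklist. This is only a sketch: the subcommand names and config paths are the ones mentioned in this thread, while the Step type, BUILD_STEPS and format_checklist are hypothetical names made up for illustration. How each command actually receives its config file or other arguments is not stated here, so the script prints the steps and does not execute anything.

```python
# Dry-run outline of the index build order described in this thread.
# Only the subcommands and config paths come from the thread; the way each
# command takes its config or other arguments is NOT specified there, so
# nothing is executed here.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    description: str
    command: str             # subcommand as named in the thread
    config: Optional[str]    # config file mentioned in the thread, if any
    optional: bool = False


BUILD_STEPS = [
    Step("Crawl pages into .warc files (or use Common Crawl data instead)",
         "stract crawler", None, optional=True),
    Step("Build the webgraph",
         "stract webgraph create", "configs/webgraph/create.toml"),
    Step("Calculate harmonic centrality",
         "stract centrality", None),
    Step("Build the search index",
         "stract indexer search", "configs/indexer/create.toml"),
]


def format_checklist(steps):
    """Return one numbered, human-readable line per build step."""
    lines = []
    for number, step in enumerate(steps, start=1):
        prefix = "(optional) " if step.optional else ""
        line = f"{number}. {prefix}{step.description}: {step.command}"
        if step.config:
            line += f" [see {step.config}]"
        lines.append(line)
    return lines


if __name__ == "__main__":
    print("\n".join(format_checklist(BUILD_STEPS)))
```

Keeping the steps as data rather than as subprocess calls avoids guessing at command-line arguments; check each subcommand's own help output and the referenced config file before running it.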

This should create an index that you can run and execute searches against. Unfortunately, I don't have a neat overview of the available fields in each config file, but all of them are defined in crates/core/src/config/mod.rs.

I'll keep this issue open until I have created a proper getting started page.

Feb 05 '24 11:02 mikkeldenker