Getting started
I've compiled stract via `cargo build --release`. What do I do next?
How much disk space is required?
I can run the indexer / crawler / scraper via `stract indexer`, `stract crawler`, and `stract autosuggest-scrape`.
- Do you need to run the crawler first?

I can run the search servers via `stract search-server` and `stract entity-search-server`.
I can run the API server via `stract api`.
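To summarize my understanding, these are the subcommands I think exist; the descriptions are my guesses about what each one does, and I haven't verified any flags:

```sh
# Subcommands I believe exist (descriptions are guesses; flags unverified):
stract crawler              # crawl pages
stract autosuggest-scrape   # scrape autosuggest data
stract indexer              # build the search index
stract search-server        # serve web search queries
stract entity-search-server # serve entity search queries
stract api                  # serve the public API
```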
Hi! Yeah, I really need to write a proper getting-started guide and provide some data that can bootstrap the index. You can get an idea of how to run the engine after the index has been built by studying `scripts/run_dev.py` and looking at the corresponding config files in `configs/`.
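For instance, something like the following should bring up a local dev stack; it's a sketch that assumes `run_dev.py` runs directly under Python with no required arguments, which you should verify against the script itself:

```sh
# Rough sketch: start the dev environment and inspect its configs.
# Assumes run_dev.py takes no required arguments; check the script.
python3 scripts/run_dev.py

# The config files it wires together live under configs/.
ls configs/
```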
To build the index, you need to perform the following main steps:
- (optionally) run the crawler to crawl some pages and save them in `.warc` files. The current crawler architecture requires a crawl plan to be built before the crawl can be executed. Common Crawl distributes a giant dataset of these files, so you can skip running Stract's own crawler entirely, which makes it a lot easier to get started.
- build the webgraph using the `stract webgraph create` command. The config file you want to look at here is `configs/webgraph/create.toml`.
- calculate the harmonic centrality for each page/host using `stract centrality`.
- build the index using `stract indexer search`. The config file `configs/indexer/create.toml` should help get you started. (A rough end-to-end sketch of these steps follows below.)
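Roughly, putting those steps together looks like the sketch below. Treat the argument shapes as assumptions on my part (I'm guessing each subcommand takes its config path positionally; check `stract --help`), and the Common Crawl URL is only a placeholder:

```sh
# Sketch of the end-to-end index build; config-path arguments are assumed
# to be positional, and the download URL is a placeholder.

# 1. Skip Stract's crawler and fetch WARC files from Common Crawl instead
#    (pick a real segment from a Common Crawl index; this path is elided).
mkdir -p data/warc
wget -P data/warc "https://data.commoncrawl.org/crawl-data/.../warc/....warc.gz"

# 2. Build the webgraph from the WARC files.
stract webgraph create configs/webgraph/create.toml

# 3. Calculate harmonic centrality for each page/host.
stract centrality

# 4. Build the search index.
stract indexer search configs/indexer/create.toml
```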
This should create an index which you can run and execute searches against. Unfortunately I don't have a neat overview of the available fields in each config file, but all of them are defined in `crates/core/src/config/mod.rs`.
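In the meantime, a quick way to find those definitions is to grep for the struct declarations:

```sh
# List the config struct declarations; open the file to browse their fields.
grep -n "pub struct" crates/core/src/config/mod.rs
```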
I'll keep this issue open until I have created a proper getting started page.