Package ITN orchestrator
A much lighter alternative to/subset of #14798.
The following services need to be configured:
- A few Mina nodes to be used for experiment setup
- 1 seed, 3 block producers, 5 regular nodes, 1 SNARK coordinator
- Uptime backend (configured in in-memory mode, see PR #15625)
- super-light HTTP container
- used for node discovery by the testing toolkit
- nodes connect to uptime backend to submit data
- Internal log fetcher
- used to collect "internal" logs
- connects to Mina nodes and fetches logs from them
- 0.5-1 CPU per Mina node
- RAM/disk usage proportional to the number of Mina nodes <- bounded
- PostgreSQL (with the ability to be accessed externally)
- disk usage proportional to the duration of the experiment and the number of Mina nodes
- 1 TB is a good start
- reachable by "admin" and log fetcher
- VS' Dashboard (optional)
- web app that connects to Mina nodes
- super-light
- Orchestrator
- light custom container running some Go application (4 CPU?)
- option A (multi-experiment setup): a server with SSH access that allows the "admin" to connect and start new experiments without touching any other resources in the deployment
- option B (one-experiment setup): a server that runs a given script and finishes afterwards
- Snark worker pool <- the most CPU-heavy component
- preferably with as few layers of virtualization as possible
- a small number of high-CPU machines that do not use virtualization
- preferably not multi-container, but multi-process (a static number of workers per server)
- initially k8s/docker setup is fine
Communication diagram: https://docs.google.com/presentation/d/17ADhlWaH9FyzAI3G4urIwA8gbBVw48Niio0tZ57EEkc
Related issue: https://app.zenhub.com/workspaces/platform-engineering-652fc5b78bd6b33584ffedf9/issues/zh/103
First step (George <> Luis): deploy a cluster to run a Mina node.
Services from Docker Compose files (using compose file format version 2)
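A minimal sketch of bringing the stack up, assuming the service definitions below are collected into a single docker-compose.yml and that the bind-mounted host directories (taken from the volumes sections below) are created beforehand:

# create the host directories the services below bind-mount
mkdir -p config keys output names-data postgresql
# conf.json (uptime backend whitelist) and fe-config.json (dashboard) are written separately
docker compose up -d uptime-backend postgres internal-log-fetcher   # frontend is optional
docker compose logs -f internal-log-fetcher                         # follow fetcher output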
Uptime backend
uptime-backend:
  image: gcr.io/o1labs-192920/delegation-backend:2.1.0
  container_name: uptime-backend
  restart: always
  ports:
    - 6000:8080
  volumes:
    - ./config:/config
  environment:
    NETWORK: testnet
    CONFIG_FILE: /config/conf.json
Here conf.json is the following; the whitelist is the list of public keys corresponding to the keys passed via --uptime-submitter-key to the Mina nodes:
{
  "in_memory": true,
  "whitelist": [
    "B62qkasW9RRENzCzdEov1PRQ63BUT2VQK9iU7imcvbPLThnhL2eYMz8",
    "B62qp3x5osG6Fz6j44FVn61E4DNpAnyDEMcoQdNQZAdhaR7sj4wZ6gW",
    "B62qii4xfjQ3Vg5dsq7RakYTENQkdD8pFPMgqtUdC9FhgvBbwEbRoML"
  ]
}
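A quick sanity check once the backend is up (a sketch; it assumes the backend runs on the local machine and that /v1/online, the endpoint referenced later for the log fetcher and orchestrator, lists the nodes that have submitted data):

# write the whitelist read by the backend through CONFIG_FILE=/config/conf.json
cat > config/conf.json <<'EOF'
{ "in_memory": true, "whitelist": [ "B62qkasW9RRENzCzdEov1PRQ63BUT2VQK9iU7imcvbPLThnhL2eYMz8" ] }
EOF
# host port 6000 maps to the backend's 8080
curl http://localhost:6000/v1/online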
Internal log fetcher
The log fetcher is an application that requests the list of active Mina nodes from the uptime-backend and launches one log consumer process per Mina node to fetch its logs. These logs are then submitted to PostgreSQL. Some logs are kept on the server (with a limit on the total space used per Mina node).
The log fetcher is configured with a secret key file (which contains a base64-encoded Ed25519 secret key).
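A sketch of producing such a key pair, under the assumption that the file holds a base64-encoded 32-byte Ed25519 seed and that the matching public key is what the Mina nodes receive via --itn-keys; PyNaCl is used purely for illustration:

mkdir -p keys
python3 - <<'EOF'
# illustration only: generate an Ed25519 key pair; the secret (seed) goes into
# ./keys/secret_key, the public key into the Mina nodes' --itn-keys list
import base64
import nacl.signing  # PyNaCl
sk = nacl.signing.SigningKey.generate()
open("keys/secret_key", "w").write(base64.b64encode(sk.encode()).decode())
print("public key for --itn-keys:", base64.b64encode(sk.verify_key.encode()).decode())
EOF
chmod 600 keys/secret_key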
internal-log-fetcher:
  image: gcr.io/o1labs-192920/mina-internal-trace-consumer:1.3.0
  container_name: internal-log-fetcher
  restart: always
  command: "fetcher -k /keys/secret_key -o /output --db-uri 'postgresql://postgres:secret_password_12345@postgres:5432' discovery"
  ports:
    - 4000:4000
    - 11000-11700:11000-11700
  volumes:
    - ./keys:/keys
    - ./output:/output
    - ./names-data:/names-data
  environment:
    INTERNAL_TRACE_CONSUMER_EXE: /internal_trace_consumer
    FETCH_INTERVAL_MS: 10000
    ONLINE_URL: "http://itn.openmina.com:6000/v1/online" # URL exposed by uptime-backend
PostgreSQL instance
Special configuration options are passed so that the instance can sustain the simultaneous connections of the log consumer processes launched to support ~200 Mina nodes.
postgres:
  image: postgres
  shm_size: 1g
  container_name: postgres
  restart: always
  ports:
    - 5455:5432
  command: "-c max_connections=10000 -c shared_buffers=2048MB"
  volumes:
    - ./postgresql:/var/lib/postgresql/data
  environment:
    PGDATA: /var/lib/postgresql/data/pgdata
    POSTGRES_PASSWORD: secret_password_12345
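Since the instance has to be reachable by the "admin" and the log fetcher, a quick external check (a sketch; it assumes the psql client is available and <host> is the machine running the container):

# host port 5455 maps to the container's 5432 above
psql "postgresql://postgres:secret_password_12345@<host>:5455/postgres" -c 'show max_connections;'
# expect 10000, i.e. the value passed via the container command above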
VS dashboard (optional)
For the initial version, let's skip it.
frontend:
  image: directcuteo/mina-frontend:663f692
  container_name: frontend
  restart: always
  ports:
    - 80:80
  command:
    - sh
    - -ce
    - |
      ENV=$(cat /fe-config.json | tr -d '\n' | tr -s ' ' | sed -e 's/ //g') envsubst < /usr/share/nginx/html/assets/env.template.js > /usr/share/nginx/html/assets/env.js
      exec nginx -g 'daemon off;'
  volumes:
    - ./fe-config.json:/fe-config.json
  networks:
    - internal-log-fetcher-network
Special configuration options for Mina node
Experiment-specific flags
--itn-graphql-port 11111
- any port may be used; it does not have to be the same across Mina nodes
- this is the port which the Orchestrator and the Internal log fetcher use to contact Mina nodes
--itn-keys 'gSsHGULVizunF+UC6z2p9/rJwLSwb4Y/wHYreSHUY+I='
- comma-separated list of Ed25519 public keys corresponding to the secret keys used by the Orchestrator and the Internal log fetcher
Uptime system flags
--uptime-submitter-key <..filepath..>
- key used to submit data to the uptime system (same key schema as is used for block production)
--uptime-url <..url..>
- URL exposed by the uptime backend
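A sketch of how these flags combine in a single node invocation; every other flag a daemon normally needs (peer list, ports, block producer key, and so on) is omitted here, and the key file path and uptime URL are placeholders to be filled in from the deployment above:

mina daemon \
  --itn-graphql-port 11111 \
  --itn-keys 'gSsHGULVizunF+UC6z2p9/rJwLSwb4Y/wHYreSHUY+I=' \
  --uptime-submitter-key /keys/uptime-submitter-key \
  --uptime-url "$UPTIME_BACKEND_URL"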
Orchestrator configuration
Unlike the other components, the orchestrator is not a service but a long-running job.
The job takes the following inputs:
- A script and a set of keys are uploaded
- keys are specific to the ledger of the network, and are essentially reusable from one network to another
- script is fully agnostic of the network, hence 100% reusable
- Orchestrator configuration file
- includes some settings specific to the current test network and some that stay the same from one network to another
The job is launched via the command:
./orchestrator orchestrator_config.json < a.script
It outputs logs to stderr and stdout, which have different semantics. For example, logs from stdout may be used to immediately stop a job that was started on the Mina nodes (just stopping the orchestrator isn't sufficient). Logs printed to stderr are also directed to a logfile.
Keys are listed in the script file and are to be put in a location relative to the directory from which the orchestrator job is started.
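A sketch of a launch that keeps the two streams apart (file names are illustrative):

# stdout carries the information needed to stop a job already started on the
# Mina nodes, so keep it separate from stderr; stderr is additionally duplicated
# into the "logFile" named in the configuration
./orchestrator orchestrator_config.json < a.script \
  1> orchestrator.stdout 2> orchestrator.stderr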
Example of configuration file:
{
  "key": "2Dtcua6w9g8JZczc/D6laz6Yn1ZP7DVGCmHfFDxGupY=",
  "slotDurationMs": 180000,
  "genesisTimestamp": "2024-02-09T13:20:00+00:00",
  "onlineURL": "http://itn.openmina.com:6000/v1/online",
  "fundDaemonPorts": [
    "10.233.66.157:8301",
    "10.233.104.122:8301",
    "10.233.77.94:8301",
    "10.233.109.201:8301",
    "10.233.92.109:8301",
    "10.233.79.75:8301",
    "10.233.106.244:8301",
    "10.233.84.161:8301",
    "10.233.99.215:8301",
    "10.233.71.250:8301",
    "10.233.118.203:8301",
    "10.233.72.216:8301"
  ],
  "logFile": "orchestrator.log"
}
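For orientation, slotDurationMs and genesisTimestamp presumably let the orchestrator translate wall-clock time into slots; a rough sketch of that conversion (GNU date assumed):

genesis=$(date -u -d '2024-02-09T13:20:00+00:00' +%s)               # genesisTimestamp from the config
now=$(date -u +%s)
echo "slots since genesis: $(( (now - genesis) * 1000 / 180000 ))"  # slotDurationMs = 180000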
The script prescribes all of the manipulations to be executed on the Mina nodes. After the last step of the script is executed, the orchestrator finishes. No files are shared between orchestrator executions (no cache/DBs whatsoever).
There are more details needed here for me to complete the deployment. For instance:
- What are the Docker images for the running nodes? Would devnet work? We are currently using gcr.io/o1labs-192920/mina-daemon:3.0.1-alpha1-release-3.0.1-0473756-bullseye-devnet
- These --itn-* flags are not available in the mina daemon command. Am I missing something? @georgeee
Hi @SanabriaRusso!
Good catch.
We need to investigate this; the flags should be available for the devnet build. If they aren't, this is an issue.
We specifically disabled ITN in the compile config in the run-up to the Berkeley release to prevent accidental misuse.
I prepared a branch georgeee/compatible-with-itn with itn_features: true in devnet.mlh to roll back this change.
SQL schema: https://gist.github.com/georgeee/25dfaf41948f3409ba35e23886e170f2
Major components were extracted into the following repository: https://github.com/o1-labs/mina-perf-testing
The fetcher-infra-tmp setup remains in place until the deployment migrates to the standardised form (k8s).