
Package ITN orchestrator

Open georgeee opened this issue 1 year ago • 7 comments

A much lighter alternative to/subset of #14798.

The following services need to be configured:

  1. A few Mina nodes to be used for experiment setup
    • 1 seed, 3 bps, 5 regular nodes, 1 snark coordinator
  2. Uptime backend (configured in in-memory mode, see PR #15625)
    • super-light HTTP container
    • used for node discovery by testing toolkit
    • nodes connect to uptime backend to submit data
  3. Internal log fetcher
    • used to collect "internal" logs
    • connects to Mina nodes and fetches logs from them
    • 0.5-1 CPU per Mina node
    • Some RAM/disk proportional to the number of Mina nodes <- bounded
  4. Postgresql (with ability to be accessed externally)
    • disk proportional to duration of experiment and number of Mina nodes
    • 1 TB is a good start
    • reachable by "admin" and log fetcher
  5. VS' Dashboard (optional)
    • web app that connects to Mina nodes
    • super-light
  6. Orchestrator
    • light custom container running some Go application (4 CPU?)
    • option A (multi-experiment setup): a server with SSH access that allows the "admin" to connect and start new experiments without touching any other resources in the deployment
    • option B (one-experiment setup): a server that runs a given script and finishes afterwards
  7. Snark worker pool <- most CPU heavy component
    • preferably with as few layers of virtualization as possible
    • a small number of high-CPU machines that do not use virtualization
    • preferably not multi-container, but multi-process (a static number of workers per server)
    • initially k8s/docker setup is fine

Communication diagram: https://docs.google.com/presentation/d/17ADhlWaH9FyzAI3G4urIwA8gbBVw48Niio0tZ57EEkc

georgeee avatar Jun 20 '24 16:06 georgeee

Related issue: https://app.zenhub.com/workspaces/platform-engineering-652fc5b78bd6b33584ffedf9/issues/zh/103

georgeee avatar Jul 18 '24 17:07 georgeee

First step (George <> Luis): deploy a cluster to run a Mina node.

submarinec94 avatar Jul 18 '24 17:07 submarinec94

Services from Docker Compose files (using Compose file format version 2)

Uptime backend

  uptime-backend:
    image: gcr.io/o1labs-192920/delegation-backend:2.1.0
    container_name: uptime-backend
    restart: always
    ports:
      - 6000:8080
    volumes:
      - ./config:/config
    environment:
      NETWORK: testnet
      CONFIG_FILE: /config/conf.json

Where conf.json is as follows, with whitelist being the list of public keys corresponding to those passed via --uptime-submitter-key to the Mina nodes:

{
  "in_memory": true,
  "whitelist": [
    "B62qkasW9RRENzCzdEov1PRQ63BUT2VQK9iU7imcvbPLThnhL2eYMz8",
    "B62qp3x5osG6Fz6j44FVn61E4DNpAnyDEMcoQdNQZAdhaR7sj4wZ6gW",
    "B62qii4xfjQ3Vg5dsq7RakYTENQkdD8pFPMgqtUdC9FhgvBbwEbRoML"
  ]
}
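
The conf.json structure above is simple enough to sanity-check before deployment. Below is a small, hypothetical validation sketch (the 55-character "B62q…" shape of Mina public keys is an assumption used only as a heuristic, not part of the backend's own validation):

```python
import json

# Hypothetical sanity check for the uptime-backend config shown above.
# The whitelist keys must match the ones passed via --uptime-submitter-key;
# the "B62q" prefix / 55-char length check is a heuristic assumption.
def validate_conf(raw: str) -> list:
    conf = json.loads(raw)
    assert isinstance(conf.get("in_memory"), bool), "in_memory must be a bool"
    whitelist = conf.get("whitelist", [])
    assert whitelist, "whitelist must be non-empty"
    for key in whitelist:
        assert key.startswith("B62q") and len(key) == 55, f"suspicious key: {key}"
    return whitelist

raw = '''{
  "in_memory": true,
  "whitelist": [
    "B62qkasW9RRENzCzdEov1PRQ63BUT2VQK9iU7imcvbPLThnhL2eYMz8"
  ]
}'''
print(validate_conf(raw))
```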

Internal log fetcher

The log fetcher is an application that requests the list of active Mina nodes from the uptime-backend and launches one log consumer process per Mina node to fetch its logs. These logs are then submitted to PostgreSQL. Some logs are also kept on the server (with a limit on the total space used per Mina node).

The log fetcher is configured with a secret key file (containing a base64-encoded Ed25519 secret key).

  internal-log-fetcher:
    image: gcr.io/o1labs-192920/mina-internal-trace-consumer:1.3.0
    container_name: internal-log-fetcher
    restart: always
    command: "fetcher -k /keys/secret_key -o /output --db-uri 'postgresql://postgres:secret_password_12345@postgres:5432' discovery"
    ports:
      - 4000:4000
      - 11000-11700:11000-11700
    volumes:
      - ./keys:/keys
      - ./output:/output
      - ./names-data:/names-data
    environment:
      INTERNAL_TRACE_CONSUMER_EXE: /internal_trace_consumer
      FETCH_INTERVAL_MS: 10000
      ONLINE_URL: "http://itn.openmina.com:6000/v1/online" # URL exposed by uptime-backend
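
The discovery step described above (poll the uptime backend, launch one consumer per node) can be sketched as follows. This is a hypothetical illustration, not the real fetcher: the JSON shape returned by /v1/online and the field names `ip`/`graphql_port` are assumptions, and the port assignment mirrors the 11000-11700 range published in the compose file.

```python
# Hypothetical sketch of the fetcher's discovery step: poll the uptime
# backend's online endpoint and assign one local consumer port per node
# from the 11000-11700 range exposed in the compose file above.
import json
from urllib.request import urlopen

PORT_RANGE = range(11000, 11701)

def plan_consumers(online_nodes: list) -> dict:
    """Map each node's GraphQL address to a local consumer port."""
    plan = {}
    for port, node in zip(PORT_RANGE, online_nodes):
        addr = f"{node['ip']}:{node['graphql_port']}"  # assumed field names
        plan[addr] = port
    return plan

def fetch_online(url: str) -> list:
    # e.g. url = ONLINE_URL from the compose environment above
    with urlopen(url) as resp:
        return json.load(resp)

# Offline example with stubbed discovery data:
nodes = [{"ip": "10.233.66.157", "graphql_port": 11111},
         {"ip": "10.233.104.122", "graphql_port": 11111}]
print(plan_consumers(nodes))
```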

Postgresql instance

Special configuration options are passed so the instance can sustain the simultaneous connections from log fetchers supporting ~200 Mina nodes.

  postgres:
    image: postgres
    shm_size: 1g
    container_name: postgres
    restart: always
    ports:
      - 5455:5432
    command: "-c max_connections=10000 -c shared_buffers=2048MB"
    volumes:
      - ./postgresql:/var/lib/postgresql/data
    environment:
      PGDATA: /var/lib/postgresql/data/pgdata
      POSTGRES_PASSWORD: secret_password_12345
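
The head-room behind max_connections=10000 can be sanity-checked with back-of-the-envelope arithmetic. The per-node connection count and admin head-room below are assumptions for illustration; only the ~200-node target comes from the text above.

```python
# Rough sizing check for the Postgres settings above.
nodes = 200                 # ~200 Mina nodes, per the comment above
conns_per_node = 5          # assumed: connections per log-consumer process
admin_headroom = 50         # assumed: superuser / ad-hoc connections

needed = nodes * conns_per_node + admin_headroom
assert needed < 10000       # comfortably under the configured max_connections
print(needed)
```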

VS dashboard (optional)

For the initial version, let's skip it.

  frontend:
    image: directcuteo/mina-frontend:663f692
    container_name: frontend
    restart: always
    ports:
      - 80:80
    command:
      - sh
      - -ce
      - |
        ENV=$(cat /fe-config.json | tr -d '\n' | tr -s ' ' | sed -e 's/ //g') envsubst < /usr/share/nginx/html/assets/env.template.js > /usr/share/nginx/html/assets/env.js
        exec nginx -g 'daemon off;'
    volumes:
      - ./fe-config.json:/fe-config.json
    networks:
      - internal-log-fetcher-network

georgeee avatar Jul 22 '24 11:07 georgeee

Special configuration options for Mina node

Experiment-specific flags

  • --itn-graphql-port 11111
    • any port may be used; it does not need to be the same across Mina nodes
    • this is the port the Orchestrator and the Internal log fetcher use to contact Mina nodes
  • --itn-keys 'gSsHGULVizunF+UC6z2p9/rJwLSwb4Y/wHYreSHUY+I='
    • comma-separated list of Ed25519 public keys corresponding to secret keys used by Orchestrator and Internal log fetcher

Uptime system flags

  • --uptime-submitter-key <..filepath..>
    • key used to submit data to the uptime system (same key scheme as used for block production)
  • --uptime-url <..url..>
    • URL exposed by the uptime backend
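
The value passed to --itn-keys (and the fetcher's secret key file) is a base64-encoded Ed25519 key, i.e. 32 raw bytes. A quick stdlib check of the public key quoted above:

```python
import base64

# Ed25519 public (and secret) keys are 32 raw bytes; the flags above
# carry them base64-encoded. Decode and verify the length.
def check_ed25519_b64(key_b64: str) -> bytes:
    raw = base64.b64decode(key_b64, validate=True)
    assert len(raw) == 32, f"expected 32 bytes, got {len(raw)}"
    return raw

pk = "gSsHGULVizunF+UC6z2p9/rJwLSwb4Y/wHYreSHUY+I="
print(len(check_ed25519_b64(pk)))
```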

georgeee avatar Jul 22 '24 11:07 georgeee

Orchestrator configuration

Unlike the other components, the orchestrator is not a service but a long-running job.

The job takes the following inputs:

  • A script and a set of keys, which are uploaded
    • keys are specific to the ledger of the network, and are essentially reusable from one network to another
    • the script is fully network-agnostic, hence 100% reusable
  • An orchestrator configuration file
    • includes some settings specific to the current test network and some that stay the same from one network to another

The job is launched via the command:

./orchestrator orchestrator_config.json < a.script

It outputs logs to stdout and stderr, which carry different semantics: e.g., output on stdout may be used to immediately stop a job that was started on the Mina nodes (just stopping the orchestrator isn't sufficient). Logs printed to stderr are also directed to a logfile.
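
The stdout/stderr split described above could be handled by a thin wrapper around the orchestrator process. The sketch below is hypothetical: the exact format of stdout signals is not specified, so it simply collects stdout lines while mirroring stderr to a logfile.

```python
# Hypothetical wrapper around the orchestrator job: stdout lines are
# collected as actionable signals (e.g. an emergency-stop trigger),
# while stderr lines are mirrored to a logfile, matching the
# behaviour described in the comment above.
import subprocess
import threading

def run_job(cmd: list, script_path: str, logfile: str) -> list:
    signals = []
    with open(script_path) as script, open(logfile, "w") as log:
        proc = subprocess.Popen(cmd, stdin=script,
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                                text=True)

        def drain_stderr():
            for line in proc.stderr:
                log.write(line)          # stderr is also directed to a logfile

        t = threading.Thread(target=drain_stderr)
        t.start()
        for line in proc.stdout:         # stdout carries actionable signals
            signals.append(line.strip())
        t.join()
        proc.wait()
    return signals
```

Usage would mirror the command above, e.g. `run_job(["./orchestrator", "orchestrator_config.json"], "a.script", "orchestrator.log")`.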

Keys are listed in the script file and are to be put in a location relative to the directory from which the orchestrator job is started.

Example of configuration file:

{
  "key": "2Dtcua6w9g8JZczc/D6laz6Yn1ZP7DVGCmHfFDxGupY=",
  "slotDurationMs": 180000,
  "genesisTimestamp": "2024-02-09T13:20:00+00:00",
  "onlineURL": "http://itn.openmina.com:6000/v1/online",
  "fundDaemonPorts": [
    "10.233.66.157:8301",
    "10.233.104.122:8301",
    "10.233.77.94:8301",
    "10.233.109.201:8301",
    "10.233.92.109:8301",
    "10.233.79.75:8301",
    "10.233.106.244:8301",
    "10.233.84.161:8301",
    "10.233.99.215:8301",
    "10.233.71.250:8301",
    "10.233.118.203:8301",
    "10.233.72.216:8301"
  ],
  "logFile": "orchestrator.log"
}
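
The slotDurationMs and genesisTimestamp fields together let the orchestrator translate wall-clock time into global slot numbers, presumably for scheduling script steps. A small sketch of that arithmetic, using the values from the example config:

```python
# Derive the current global slot from the config fields shown above:
# slot = floor((now - genesisTimestamp) / slotDurationMs).
from datetime import datetime, timezone

def current_slot(genesis_iso: str, slot_ms: int, now: datetime) -> int:
    genesis = datetime.fromisoformat(genesis_iso)
    return int((now - genesis).total_seconds() * 1000) // slot_ms

now = datetime(2024, 2, 9, 14, 20, tzinfo=timezone.utc)  # 1h after genesis
print(current_slot("2024-02-09T13:20:00+00:00", 180000, now))
```

With 3-minute slots, one hour after genesis corresponds to slot 20.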

The script prescribes all of the manipulations to be executed on the Mina nodes. After the last step of the script is executed, the orchestrator finishes. No files are shared between orchestrator executions (no cache/DBs whatsoever).

georgeee avatar Jul 22 '24 11:07 georgeee

There are more details needed here for me to complete the deployment. For instance:

  • What are the Docker images for the running nodes? Would devnet work? We are currently using gcr.io/o1labs-192920/mina-daemon:3.0.1-alpha1-release-3.0.1-0473756-bullseye-devnet
  • These --itn-* flags are not available in the mina daemon command. Am I missing something? @georgeee

SanabriaRusso avatar Aug 01 '24 08:08 SanabriaRusso

Hi @SanabriaRusso!

Good catch.

We need to investigate this, flags should be available for the devnet build. If they aren't, this is an issue.

georgeee avatar Aug 28 '24 10:08 georgeee

We specifically disabled ITN in the compile config in the run-up to the Berkeley release to prevent accidental misuse.

I prepared a branch georgeee/compatible-with-itn with itn_features: true in devnet.mlh to roll back this change.

georgeee avatar Aug 29 '24 18:08 georgeee

https://gist.github.com/georgeee/25dfaf41948f3409ba35e23886e170f2 sql schema

georgeee avatar Nov 27 '24 17:11 georgeee

Major components were extracted into the following repository: https://github.com/o1-labs/mina-perf-testing. The fetcher-infra-tmp repository remains until the deployment is migrated to the standardised form (k8s).

shimkiv avatar Dec 16 '24 08:12 shimkiv