Hardening: Run an Ethereum long run test at baseline throughput for a week

Open peterbroadhurst opened this issue 2 years ago • 4 comments

Child of #316

Goal: Measure baseline throughput of common FireFly operations on top of Ethereum, at a steady load across multiple days. Flag anything that fails to deliver or takes unexpectedly long to confirm.

peterbroadhurst, Mar 02 '22 19:03

Discord thread for ongoing run status: https://discord.com/channels/905194001349627914/954223347410034729 (Hyperledger Discord ➡️ #firefly ➡️ Ethereum long-run test thread)

awrichar, May 11 '22 15:05

Discord has the raw status of every ongoing run (many runs to date now), but I've captured the most recent one below. We'll capture future runs here at some regular interval as well.

Run Report

  • Started: 4/27/22
  • Duration: ~78 hours
  • Git commit: ea38e01833f982b369e7033371e94d63a7331b92

Node Configuration

  • 2 FireFly nodes on one virtual server (EC2 m4.xlarge)
  • Entire FireFly stack is local to the server (i.e. both blockchains, Postgres databases, etc.)
  • Single geth node with 2 instances of ethconnect
  • Maximum time to confirm before considering failure = 1 minute
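As a rough illustration of the 1 minute failure criterion above, here is a minimal Python sketch. It is not how the perf CLI implements the check; the function name `breaches_threshold` and the sample timestamps are hypothetical and exist only to show the rule.

```python
from datetime import datetime, timedelta

# Threshold from the node configuration above: 1 minute to confirm.
CONFIRM_TIMEOUT = timedelta(minutes=1)


def breaches_threshold(submitted, confirmed, timeout=CONFIRM_TIMEOUT):
    """Return True if an operation was never confirmed or took longer than `timeout`.

    `submitted` and `confirmed` are datetimes; `confirmed` is None when no
    confirmation was ever observed. (Hypothetical helper, for illustration only.)
    """
    if confirmed is None:
        return True
    return confirmed - submitted > timeout


# Hypothetical sample timestamps, not taken from the actual run.
submitted = datetime(2022, 4, 27, 12, 0, 0)
on_time = breaches_threshold(submitted, submitted + timedelta(seconds=12))
late = breaches_threshold(submitted, submitted + timedelta(seconds=75))
print(on_time, late)
```

An operation that confirms exactly at the 60 second mark is treated as passing here; whether the real tooling uses a strict or inclusive comparison is not stated in this thread.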

Configuration details

core-config.yml

log:
        level: debug
broadcast:
        batch:
                size: 200
                timeout: 1s
privatemessaging:
        batch:
                size: 200
                timeout: 1s
message:
        writer:
                count: 5
download:
        worker:
                count: 100
publicstorage:
        ipfs:
                api:
                        requestTimeout: 2s
                gateway:
                        requestTimeout: 2s

ethconnect.yml

rest:
  rest-gateway:
    maxTXWaitTime: 120
    maxInFlight: 200
    alwaysManageNonce: true
    attemptGapFill: true
    sendConcurrency: 3
    gasEstimationFactor: 2.0

instances.yml

stackJSONPath: /home/ubuntu/.firefly/stacks/v10rc7/stack.json

wsConfig:
  wsPath: /ws
  readBufferSize: 16000
  writeBufferSize: 16000
  initialDelay: 250ms
  maximumDelay: 30s
  initialConnectAttempts: 5
  heartbeatInterval: 5s

instances:
  - name: long-run
    tests: [{"name": "msg_broadcast", "workers":50},{"name": "msg_private", "workers":50},{"name": "blob_broadcast", "workers":30},{"name": "blob_private", "workers":30},{"name": "custom_ethereum_contract", "workers":20},{"name": "token_mint", "workers":10}]
    length: 500h
    sender: 0
    recipient: 1
    messageOptions:
      longMessage: false
    tokenOptions:
      tokenType: fungible
    contractOptions: {"address": "0x4371a9ec8a430488ad853272fa645573724299ea"}
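To see how much concurrent load this instance definition asks for, the `tests` list can be summed per worker. The sketch below copies the inline list from instances.yml (it happens to be valid JSON as written) and adds up the `workers` values; the variable names are mine.

```python
import json

# The "tests" value copied verbatim from instances.yml above.
tests = json.loads(
    '[{"name": "msg_broadcast", "workers":50},'
    '{"name": "msg_private", "workers":50},'
    '{"name": "blob_broadcast", "workers":30},'
    '{"name": "blob_private", "workers":30},'
    '{"name": "custom_ethereum_contract", "workers":20},'
    '{"name": "token_mint", "workers":10}]'
)

# Map each test name to its worker count, then total them.
workers_by_test = {t["name"]: t["workers"] for t in tests}
total_workers = sum(workers_by_test.values())

print(workers_by_test)
print(total_workers)  # 190
```

That is 190 workers in total across the six test types, with messaging (broadcast and private) accounting for 100 of them.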

Results

  • Broadcast messages: 2,627,495
  • Private messages: 3,438,331
  • Token mints: 267,995
  • Contract invocations: 1,535,640
  • No errors

The run was terminated when confirmation times began to exceed the 1 minute threshold. The logs were insufficient to diagnose the performance bottleneck. This should be addressed by https://github.com/hyperledger/firefly-perf-cli/pull/35, after which we will run again.
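For a rough sense of scale, the totals above can be turned into average rates. This is a back-of-the-envelope sketch only: it assumes the load was spread evenly over the whole run and uses the approximate duration of 78 hours, so the figures are averages, not the steady-state or peak throughput.

```python
# Approximate run length reported above ("~78 hours").
DURATION_HOURS = 78

# Totals copied from the Results list.
totals = {
    "Broadcast messages": 2_627_495,
    "Private messages": 3_438_331,
    "Token mints": 267_995,
    "Contract invocations": 1_535_640,
}


def average_rates(totals, hours):
    """Return the average number of operations per second for each entry."""
    seconds = hours * 3600
    return {name: count / seconds for name, count in totals.items()}


rates = average_rates(totals, DURATION_HOURS)
for name, rate in rates.items():
    print(f"{name}: {rate:.2f}/s")
print(f"All operations: {sum(rates.values()):.2f}/s")
```

Under those assumptions the averages come out at roughly 9 broadcasts, 12 private messages, 1 token mint, and 5 contract invocations per second.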

Run Graph: [image: perf]

awrichar, May 11 '22 16:05

New Discord thread for runs in preparation for v1.1 (thread "FFPerf 1.1"): https://discord.com/channels/905194001349627914/1004496575059476570

awrichar, Aug 16 '22 13:08

Run Report

  • Started: 8/17/22
  • Duration: ~92 hours
  • Git commit: d54355dc8d062930286a419140dceafb82d42903

Node Configuration

  • 2 FireFly nodes on one virtual server (EC2 m4.xlarge)
  • Entire FireFly stack is local to the server (i.e. both blockchains, Postgres databases, etc.)
  • Single geth node with 2 instances of evmconnect
  • Maximum time to confirm before considering failure = 1 minute

Configuration details

core-config.yml

log:
        level: debug
broadcast:
        batch:
                size: 200
                timeout: 1s
privatemessaging:
        batch:
                size: 200
                timeout: 1s
message:
        writer:
                count: 5
download:
        worker:
                count: 100
publicstorage:
        ipfs:
                api:
                        requestTimeout: 2s
                gateway:
                        requestTimeout: 2s

ethconnect.yml

rest:
  rest-gateway:
    maxTXWaitTime: 120
    maxInFlight: 200
    alwaysManageNonce: true
    attemptGapFill: true
    sendConcurrency: 3
    gasEstimationFactor: 2.0

instances.yml

stackJSONPath: /home/ubuntu/.firefly/stacks/v11rc1/stack.json

wsConfig:
  wsPath: /ws
  readBufferSize: 16000
  writeBufferSize: 16000
  initialDelay: 250ms
  maximumDelay: 30s
  initialConnectAttempts: 5
  heartbeatInterval: 5s

instances:
  - name: long-run
    tests: [{"name": "msg_broadcast", "workers":50},{"name": "msg_private", "workers":50},{"name": "blob_broadcast", "workers":30},{"name": "blob_private", "workers":30},{"name": "custom_ethereum_contract", "workers":20},{"name": "token_mint", "workers":10}]
    length: 500h
    sender: 0
    recipient: 1
    messageOptions:
      longMessage: false
    tokenOptions:
      tokenType: fungible
    contractOptions: {"address": "0xc39c9b8384d682a17b74d9454af3eeefff919434"}

Results

  • Broadcast messages: 3,899,387
  • Private messages: 5,077,650
  • Token mints: 381,309
  • Contract invocations: 2,090,955
  • No errors

The run ended when the server ran out of disk space and the geth node terminated.
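Because this run lasted longer than the 4/27/22 run, the raw totals are not directly comparable. A quick way to compare them is to normalize by duration, as in the sketch below. It uses the approximate durations from the two reports (~78 and ~92 hours) and assumes an even load, so treat the percentages as indicative only.

```python
# Totals and approximate durations copied from the two run reports in this issue.
runs = {
    "4/27/22": {
        "hours": 78,
        "totals": {
            "Broadcast messages": 2_627_495,
            "Private messages": 3_438_331,
            "Token mints": 267_995,
            "Contract invocations": 1_535_640,
        },
    },
    "8/17/22": {
        "hours": 92,
        "totals": {
            "Broadcast messages": 3_899_387,
            "Private messages": 5_077_650,
            "Token mints": 381_309,
            "Contract invocations": 2_090_955,
        },
    },
}


def per_hour(run):
    """Return the average operations per hour for each entry in a run."""
    return {name: count / run["hours"] for name, count in run["totals"].items()}


first = per_hour(runs["4/27/22"])
second = per_hour(runs["8/17/22"])

# Relative change in the hourly average from the first run to the second.
change = {name: second[name] / first[name] - 1 for name in first}
for name, delta in change.items():
    print(f"{name}: {delta:+.1%}")
```

On these numbers the hourly averages are roughly 15% to 26% higher in this run than in the 4/27/22 run. The two runs used different commits, so this says nothing about which change accounts for the difference.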

Run Graph: [images: perf1, perf2]

awrichar, Aug 22 '22 15:08