WASM SDK: State transitions wait for full timeout despite successful execution

Open thephez opened this issue 4 months ago • 2 comments

Expected Behavior

When executing state transitions through the WASM SDK (via index.html or programmatically), the operation should complete and return immediately once the state transition is successfully included in a block on Dash Platform, typically within a few seconds.

Current Behavior

State transitions executed via broadcast_and_wait in the WASM SDK wait for the entire timeout period (~80 seconds) even when the state transition has already been successfully processed and included in a block much earlier. This creates a poor user experience where users see "Processing..." for the full timeout duration despite the operation completing successfully.

Affected Operations

All WASM SDK state transitions are affected, including identity, data contract, document, token, and voting operations.

Root Cause Analysis

The issue stems from the broadcast_and_wait implementation in the rs-sdk:

  1. File: packages/rs-sdk/src/platform/transition/broadcast.rs

    • broadcast_and_wait calls broadcast() then wait_for_response()
    • wait_for_response() uses retry logic with default settings
  2. Default Timeouts (packages/rs-dapi-client/src/request_settings.rs):

    • DEFAULT_TIMEOUT: Duration = Duration::from_secs(10)
    • DEFAULT_RETRIES: usize = 5
    • Total potential wait time: 10s × 5 retries = 50 seconds (60 seconds if the initial attempt is counted in addition to the retries)
  3. WASM SDK Implementation: All state transition functions in packages/wasm-sdk/src/state_transitions/ use:

    let result = state_transition
        .broadcast_and_wait::<StateTransitionProofResult>(&sdk, None)
        .await
    
  4. Wait Logic: The wait_for_state_transition_result continues polling even after the state transition is successfully confirmed, potentially waiting for the full timeout period.
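As a quick check of the arithmetic in item 2, here is a minimal sketch. The helper name is hypothetical, and whether rs-dapi-client counts the first attempt separately from the retries is an assumption that is not confirmed here. Either way, the result falls short of the ~80 seconds observed.

```javascript
// Worst-case wait for a request that is retried after each timeout.
// Assumption: every attempt runs for the full timeout before failing.
function worstCaseWaitSeconds(timeoutSeconds, retries, countInitialAttempt) {
  const attempts = countInitialAttempt ? retries + 1 : retries;
  return timeoutSeconds * attempts;
}

// Values quoted above: DEFAULT_TIMEOUT = 10 s, DEFAULT_RETRIES = 5.
const retriesOnly = worstCaseWaitSeconds(10, 5, false); // 50
const withFirstTry = worstCaseWaitSeconds(10, 5, true); // 60
console.log({ retriesOnly, withFirstTry });
```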

Steps to Reproduce

  1. Open /packages/wasm-sdk/index.html in a browser
  2. Select any state transition operation (e.g., "Data Contract Create")
  3. Fill in valid parameters and authentication details
  4. Click "Execute"
  5. Observe that the operation shows "Processing..." for 60-80 seconds
  6. The result eventually shows success, but the wait time is excessive

Possible Solutions

Option 1: Early Success Detection

Modify wait_for_state_transition_result to:

  • Return immediately upon receiving a successful proof response
  • Only continue polling if the response indicates the transition is still pending
  • Implement proper status checking in the response parsing
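A rough sketch of what Option 1 could look like, written in JavaScript for brevity rather than in the Rust of rs-sdk. The helper `waitForResult` and the `{ done, result }` status shape are hypothetical, not the SDK's actual API. The point is only that the loop exits on the first confirmed response instead of running out the clock.

```javascript
// Hypothetical helper: polls `checkStatus` until it reports a final state.
// `checkStatus` is assumed to resolve to { done: boolean, result?: any }.
async function waitForResult(checkStatus, { intervalMs = 1000, timeoutMs = 15000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const status = await checkStatus();
    if (status.done) {
      return status.result; // return as soon as the transition is confirmed
    }
    // Stop if the next poll would land after the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error('Timed out waiting for state transition result');
    }
    // Still pending: wait one interval, then ask again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

With this shape, a transition confirmed on the third poll returns after roughly two intervals, regardless of how large the timeout is.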

Option 2: Configurable Timeouts for WASM SDK

  • Expose timeout configuration in WASM SDK functions
  • Set more reasonable defaults for web usage (e.g., 15-30 seconds max)
  • Allow per-operation timeout customization
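For Option 2, a per-call timeout could be layered on top of any promise-returning operation. This is a sketch under assumptions: `withTimeout` and `WEB_DEFAULTS` are hypothetical names, and the 30-second default simply takes the upper end of the range suggested above.

```javascript
// Hypothetical default for browser use; not an existing WASM SDK setting.
const WEB_DEFAULTS = { timeoutMs: 30000 };

// Rejects if `operation` has not settled within the configured timeout.
function withTimeout(operation, options = {}) {
  const { timeoutMs } = { ...WEB_DEFAULTS, ...options };
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Timed out after ${timeoutMs} ms`)),
      timeoutMs,
    );
  });
  // Clear the timer either way so a fast result does not leave it pending.
  return Promise.race([operation, timeout]).finally(() => clearTimeout(timer));
}
```

A caller would then pass `{ timeoutMs: 15000 }` for a single operation and rely on the default everywhere else.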

Option 3: Broadcast-Only Option

  • Add broadcast_only variants that don't wait for confirmation
  • Return transaction hash immediately for async tracking
  • Let users choose between fast broadcast vs. confirmed execution

Option 4: Polling Strategy Optimization

  • Implement exponential backoff in polling
  • Use streaming connections where available
  • Add circuit breaker pattern for failed nodes
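To make the exponential backoff bullet in Option 4 concrete, here is a small sketch of a delay schedule. The function name, the 500 ms base, and the 8-second cap are illustrative values, not taken from the codebase.

```javascript
// Delay before the given retry (0-based): base * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 8000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const schedule = Array.from({ length: 6 }, (_, attempt) => backoffDelayMs(attempt));
console.log(schedule); // [ 500, 1000, 2000, 4000, 8000, 8000 ]
```

Early polls stay frequent, so a fast confirmation is picked up quickly, while later polls put less load on the node.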

Context

This issue significantly impacts WASM SDK usability, especially for the index.html demo interface and any web applications built on the WASM SDK. Users experience long, unnecessary delays that make the platform appear slow and unresponsive, even when operations complete successfully within seconds.

The problem is particularly noticeable when:

  • Testing state transitions via the web interface
  • Building responsive web applications
  • Conducting demos or presentations
  • Running automated tests that execute multiple state transitions

Your Environment

  • WASM SDK Version: Current main branch
  • Affected Files:
    • packages/wasm-sdk/src/state_transitions/identity/mod.rs
    • packages/wasm-sdk/src/state_transitions/contracts/mod.rs
    • packages/wasm-sdk/src/state_transitions/documents/mod.rs
    • packages/wasm-sdk/src/state_transitions/tokens/mod.rs
    • packages/wasm-sdk/index.html
    • packages/rs-sdk/src/platform/transition/broadcast.rs
    • packages/rs-dapi-client/src/request_settings.rs
  • Environment: Web browsers (Chrome, Firefox, Safari)
  • Network: Both testnet and mainnet affected
  • Usage Pattern: Interactive web interface and programmatic API calls

thephez, Aug 25 '25 17:08

This issue was created via Claude Code based on analysis of the WASM SDK codebase and user experience feedback.

thephez, Aug 25 '25 17:08

Additional Root Cause Analysis (via Claude Code)

I investigated this issue further and found the actual root cause is in the DAPI layer, not just rs-sdk timeout settings.

Key Discovery: DAPI's 80-second Timeout

Location: line 46:

WAIT_FOR_ST_RESULT_TIMEOUT=80000

Implementation:

  • /packages/dapi/lib/grpcServer/handlers/platform/platformHandlersFactory.js (line 114)
  • /packages/dapi/lib/externalApis/tenderdash/waitForTransactionToBeProvable/waitForTransactionToBeProvableFactory.js (lines 63-72)

The Real Problem

The DAPI layer's waitForTransactionToBeProvable function uses two nested Promise.race calls, so three things compete:

  1. getExistingTransactionResult - RPC call to check if transaction already exists
  2. waitForTransactionResult - Event listener waiting for new transaction events
  3. 80-second timeout - Hard timeout that rejects with TransactionWaitPeriodExceededError if neither of the above settles first

Race Condition Explanation

  • Fast returns (5-10 seconds): Happen when the RPC call finds that the transaction already exists when the request arrives
  • Slow returns (80+ seconds): Occur when the event listener is set up after the transaction was already processed, so the event never arrives and the call waits for the full timeout
  • This explains the inconsistent behavior: the outcome depends on when the SDK makes the request relative to blockchain processing
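The slow case can be reproduced in a simplified, synchronous model. This is only an illustration of one interleaving that would produce the behavior described, assuming the lookup runs before the commit and the listener is attached after it. `makeChain` and both functions are hypothetical and do not mirror DAPI's code.

```javascript
const { EventEmitter } = require('events');

// Simulated chain: `committed` stands in for what an RPC lookup would find,
// and the 'tx' event stands in for the new-transaction notification.
function makeChain() {
  const events = new EventEmitter();
  const committed = new Set();
  return {
    events,
    commit(hash) { committed.add(hash); events.emit('tx', hash); },
    has(hash) { return committed.has(hash); },
  };
}

// Lookup runs before the commit and the listener is attached after it:
// neither path sees the transaction.
function lookupThenSubscribe(chain, hash, commitInBetween) {
  const foundByLookup = chain.has(hash);
  commitInBetween();
  let seenByListener = false;
  chain.events.on('tx', (h) => { if (h === hash) seenByListener = true; });
  return foundByLookup || seenByListener;
}

// Listener first, lookup last: a commit that lands between the two steps
// is still observed.
function subscribeThenLookup(chain, hash, commitInBetween) {
  let seenByListener = false;
  chain.events.on('tx', (h) => { if (h === hash) seenByListener = true; });
  commitInBetween();
  return chain.has(hash) || seenByListener;
}

const first = makeChain();
console.log(lookupThenSubscribe(first, 'abc', () => first.commit('abc'))); // false
const second = makeChain();
console.log(subscribeThenLookup(second, 'abc', () => second.commit('abc'))); // true
```

In the first ordering only the timeout can end the wait, which matches the 80+ second returns.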

Code Analysis

In waitForTransactionToBeProvableFactory.js:

// Lines 35-55: Promise.race between existing results and new events
const transactionResultPromise = Promise.race([
  // Try to fetch existing tx result
  existingTransactionResultPromise.then((result) => {
    detachTransactionResult();
    return result; // IMMEDIATE RETURN if found
  }),
  // Wait for upcoming results if not found
  waitForTransactionResultPromise,
]);

// Lines 58-73: Another Promise.race with timeout
return Promise.race([
  transactionResultPromise,
  new Promise((resolve, reject) => {
    timeoutId = setTimeout(() => {
      reject(new TransactionWaitPeriodExceededError(hashString));
    }, timeout); // 80-second timeout
  }),
]);

Corrected Solution

The fix should be in DAPI layer:

  1. Reduce default timeout: Change WAIT_FOR_ST_RESULT_TIMEOUT from 80000ms to 15000ms
  2. Add polling mechanism: Instead of just waiting for events, poll getExistingTransactionResult every 1-2 seconds
  3. Fix race condition: Ensure event listener is set up before checking existing results
  4. Make configurable: Allow clients to specify timeout values
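Points 1 to 4 could be combined roughly as follows. This is a sketch, not a patch for DAPI. `waitForTransaction`, `lookup`, and `subscribe` are hypothetical stand-ins: `lookup(hash)` is assumed to resolve to a result or `null`, and `subscribe(hash, onResult)` is assumed to return an unsubscribe function.

```javascript
// Hypothetical sketch; `lookup` and `subscribe` are not DAPI's real API.
function waitForTransaction(hash, { lookup, subscribe, pollMs = 1000, timeoutMs = 15000 }) {
  return new Promise((resolve, reject) => {
    let finished = false;
    let pollTimer;
    let timeoutTimer;
    let unsubscribe = () => {};

    // Settle once, then release the timers and the listener.
    const finish = (settle, value) => {
      if (finished) return;
      finished = true;
      clearInterval(pollTimer);
      clearTimeout(timeoutTimer);
      unsubscribe();
      settle(value);
    };

    const poll = () => lookup(hash).then(
      (result) => { if (result !== null) finish(resolve, result); },
      () => {}, // ignore a failed lookup; the next poll retries
    );

    // Configurable hard timeout and periodic lookup (first tick after pollMs).
    timeoutTimer = setTimeout(
      () => finish(reject, new Error(`Timed out waiting for ${hash}`)),
      timeoutMs,
    );
    pollTimer = setInterval(poll, pollMs);

    // Attach the listener before the first lookup so no event slips past.
    unsubscribe = subscribe(hash, (result) => finish(resolve, result));
    poll();
  });
}
```

Because the lookup repeats, a missed event costs at most one poll interval instead of the full timeout.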

Impact

This explains why:

  • ALL state transitions are affected (not just specific types)
  • Behavior is inconsistent and timing-dependent
  • Transactions appear immediately in block explorers but the SDK waits 80 seconds
  • The issue occurs in both wasm-sdk and rs-sdk (both use the same DAPI endpoint)

Investigation conducted via Claude Code

thephez, Sep 04 '25 17:09