tryorama

Flaky dhtSync

Open mattyg opened this issue 9 months ago • 3 comments

In the scaffolding repo's CI we have seen flaky behaviour from dhtSync (or possibly an underlying issue in Holochain).

It happens in this sequence:

  1. alice creates a link
  2. we await dhtSync
  3. bob gets all links, and receives 1
  4. alice deletes the link
  5. we await dhtSync
  6. bob gets all links, and still receives 1 (expected 0)
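For reference, the expected behaviour at step 6: `get_links` should omit a link once a DeleteLink pointing at its CreateLink action is visible to whichever node(s) answer bob's query. The toy model below is plain TypeScript, not Holochain's actual implementation, and every type and function name in it is hypothetical. It shows why bob would still see 1 link if, after `dhtSync` returns, his view contains the CreateLink but not yet the DeleteLink.

```typescript
// Toy model of link visibility; NOT Holochain's implementation.
// All names here are hypothetical and exist only for illustration.
type CreateLinkOp = { kind: "CreateLink"; actionHash: string; base: string; target: string };
type DeleteLinkOp = { kind: "DeleteLink"; actionHash: string; linkAddAddress: string };
type LinkOp = CreateLinkOp | DeleteLinkOp;

// Return the targets of links on `base` that have no matching DeleteLink in `ops`.
function visibleLinks(ops: LinkOp[], base: string): string[] {
  // A DeleteLink references the action hash of the CreateLink it removes.
  const deleted = new Set(
    ops
      .filter((op): op is DeleteLinkOp => op.kind === "DeleteLink")
      .map((op) => op.linkAddAddress),
  );
  return ops
    .filter((op): op is CreateLinkOp => op.kind === "CreateLink" && op.base === base)
    .filter((op) => !deleted.has(op.actionHash))
    .map((op) => op.target);
}

const create: CreateLinkOp = { kind: "CreateLink", actionHash: "create-1", base: "hellos", target: "hello-1" };
const del: DeleteLinkOp = { kind: "DeleteLink", actionHash: "delete-1", linkAddAddress: "create-1" };

// Step 3: bob's view holds only the create, so 1 link is visible.
console.log(visibleLinks([create], "hellos").length); // 1
// Step 6 (expected): bob's view also holds the delete, so 0 links are visible.
console.log(visibleLinks([create, del], "hellos").length); // 0
// Step 6 (observed in CI) looks like the first case: the delete is not yet in bob's view.
```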

Here is an example CI run: https://github.com/holochain/scaffolding/actions/runs/13952691506/job/39058133036

mattyg, Mar 20 '25 19:03

Currently tracking this in the scaffolding repo: https://github.com/holochain/scaffolding/issues/472

c12i, Apr 09 '25 20:04

I've scaffolded an example app and edited its test to this:

import { assert, test } from "vitest";
import { runScenario, dhtSync } from "@holochain/tryorama";
import type { AgentPubKey } from "@holochain/client";

test("send hello and retrieve hellos", async () => {
  await runScenario(async (scenario) => {
    // Construct proper paths for your app.
    // This assumes the app bundle was created by the `hc app pack` command.
    const testAppPath = process.cwd() + "/../workdir/hello-world.happ";

    // Set up the app to be installed
    const appSource = { appBundleSource: { path: testAppPath } };

    // Add 2 players with the test app to the Scenario. The returned players
    // can be destructured.
    const [alice, beto] = await scenario.addPlayersWithApps([
      appSource,
      appSource,
    ]);

    // alice creates a link (hello_world returns the CreateLink action hash)
    const aliceCell = alice.cells[0];
    const resultAlice = await aliceCell.callZome({
      zome_name: "hello_world",
      fn_name: "hello_world",
      payload: "hello world!",
    });
    assert.ok(resultAlice);

    // we await dhtSync
    await dhtSync([alice, beto], aliceCell.cell_id[0]);

    interface HelloOutput {
      message: string;
      author: AgentPubKey;
    }

    // bob gets all links, and receives 1
    const betoCell = beto.cells[0];
    const resultBeto: HelloOutput[] = await betoCell.callZome({
      zome_name: "hello_world",
      fn_name: "get_hellos",
    });
    assert.ok(resultBeto);
    assert.equal(resultBeto.length, 1);

    // alice gets all links, and receives 1
    const allHellosAlice: HelloOutput[] = await aliceCell.callZome({
      zome_name: "hello_world",
      fn_name: "get_hellos",
    });
    assert.equal(allHellosAlice.length, 1);

    // alice deletes the link, passing the CreateLink action hash
    const resultAliceDeletesLink = await aliceCell.callZome({
      zome_name: "hello_world",
      fn_name: "delete_hello_link",
      payload: resultAlice,
    });
    console.log("result alice deletes link", resultAliceDeletesLink);

    // we await dhtSync
    await dhtSync([alice, beto], aliceCell.cell_id[0]);

    // bob gets all links, and should now receive 0
    const resultBetoAllLinks: HelloOutput[] = await betoCell.callZome({
      zome_name: "hello_world",
      fn_name: "get_hellos",
    });
    assert.ok(resultBetoAllLinks);
    assert.equal(resultBetoAllLinks.length, 0);
  });
});

with this modified zome code:

#[hdk_extern]
pub fn hello_world(message: String) -> ExternResult<ActionHash> {
    // commit the Hello message
    let action_hash = create_entry(&EntryTypes::Hello(Hello { message }))?;

    // link it to an anchor for later retrieval
    let path = Path::from("hellos");
    // return the create link action hash
    create_link(
        path.path_entry_hash()?,
        action_hash.clone(),
        LinkTypes::AllHellos,
        (),
    )
}

#[hdk_extern]
pub fn delete_hello_link(link: ActionHash) -> ExternResult<ActionHash> {
    delete_link(link)
}

The test passes 100 times in a row.

I've also written a reproduction in Holochain on the main-0.4 branch:

#[tokio::test(flavor = "multi_thread")]
async fn delete_link_deletes_link() {
    holochain_trace::test_run();

    // Two conductors that discover each other through the rendezvous service.
    let mut conductors = SweetConductorBatch::from_standard_config_rendezvous(2).await;

    let dna_file = SweetDnaFile::unique_from_test_wasms(vec![TestWasm::Link])
        .await
        .0;

    let apps = conductors
        .setup_app("app", &[dna_file.clone()])
        .await
        .unwrap();

    let ((alice,), (bob,)) = apps.into_tuples();

    let alice_pk = alice.cell_id().agent_pubkey().clone();
    let bob_pk = bob.cell_id().agent_pubkey().clone();

    println!("alice: {alice_pk:?}");
    println!("bob: {bob_pk:?}");

    // 1. alice creates a link
    let create_link_hash: ActionHash = conductors[0]
        .call(
            &alice.zome(TestWasm::Link.coordinator_zome_name()),
            "create_link",
            (),
        )
        .await;

    // 2. wait until both cells are consistent
    await_consistency(20, &[alice.clone(), bob.clone()])
        .await
        .unwrap();

    // 3. bob gets all links, and receives 1
    let all_links: Vec<holochain_zome_types::link::Link> = conductors[1]
        .call(
            &bob.zome(TestWasm::Link.coordinator_zome_name()),
            "get_links",
            (),
        )
        .await;

    assert_eq!(all_links.len(), 1);

    // 4. alice deletes the link (the returned hash is not needed afterwards)
    let _delete_link_action_hash: ActionHash = conductors[0]
        .call(
            &alice.zome(TestWasm::Link.coordinator_zome_name()),
            "delete_link",
            create_link_hash.clone(),
        )
        .await;

    // 5. wait until both cells are consistent again
    await_consistency(10, &[alice.clone(), bob.clone()])
        .await
        .unwrap();

    // 6. bob gets all links, and should now receive 0
    let all_links: Vec<holochain_zome_types::link::Link> = conductors[1]
        .call(
            &bob.zome(TestWasm::Link.coordinator_zome_name()),
            "get_links",
            (),
        )
        .await;
    assert_eq!(all_links.len(), 0);
}

This also passes 100 times in a row without error.
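A streak of 100 passes is good evidence, but it cannot rule out a low flake rate. Assuming each run fails independently with probability p, all n runs pass with probability (1 - p)^n, which for p = 1% and n = 100 is about 0.366. A sketch of a harness for counting failures over many runs, plus that calculation, might look like this; `measureFlakiness` and `probAllPass` are hypothetical names, not tryorama or Holochain APIs.

```typescript
// Hypothetical harness: run an async check `runs` times, sequentially,
// and count how many runs throw. Keeps the first error for inspection.
async function measureFlakiness(
  check: () => Promise<void>,
  runs: number,
): Promise<{ runs: number; failures: number; firstError?: unknown }> {
  let failures = 0;
  let firstError: unknown;
  for (let i = 0; i < runs; i++) {
    try {
      await check();
    } catch (err) {
      if (failures === 0) firstError = err;
      failures += 1;
    }
  }
  return { runs, failures, firstError };
}

// Probability that a test with independent per-run failure rate `p`
// passes `n` times in a row.
function probAllPass(p: number, n: number): number {
  return Math.pow(1 - p, n);
}

console.log(probAllPass(0.01, 100).toFixed(3)); // 0.366
console.log(probAllPass(0.05, 100).toFixed(3)); // 0.006
```

So a 1% flake would still produce a clean run of 100 about a third of the time, while a 5% flake almost never would.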

Please take a look at the code and let me know whether this matches the scenario in which the issue occurred. If you have a reproduction of the problem, please paste the code or a link.

jost-s, Apr 14 '25 21:04

Interesting. It looks like we're still seeing some flakiness in the scaffolding CI: https://github.com/holochain/scaffolding/actions/runs/14478984906/job/40662080464

I'll see if I can figure out a reproduction.

mattyg, Apr 17 '25 15:04
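One possible way to narrow this down (a diagnostic idea, not something tried in this thread): after the second `dhtSync`, poll bob's `get_hellos` until it returns 0 hellos or a timeout expires, and record how many attempts that took. If it eventually succeeds, the delete arrives late relative to when `dhtSync` returns; if it never does, the delete is not reaching bob at all. `pollUntil` below is a hypothetical, framework-agnostic helper, demonstrated here against a simulated getter rather than a real cell.

```typescript
// Hypothetical helper: repeatedly call `getValue` until `done(value)` is true
// or `timeoutMs` elapses. Reports the last value, the attempt count, and
// whether the condition was ever satisfied.
async function pollUntil<T>(
  getValue: () => Promise<T>,
  done: (value: T) => boolean,
  timeoutMs = 10_000,
  intervalMs = 200,
): Promise<{ value: T; attempts: number; satisfied: boolean }> {
  const deadline = Date.now() + timeoutMs;
  let attempts = 0;
  for (;;) {
    const value = await getValue();
    attempts += 1;
    if (done(value)) return { value, attempts, satisfied: true };
    if (Date.now() >= deadline) return { value, attempts, satisfied: false };
    // Wait before the next attempt.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Demo with a simulated getter: the "delete" becomes visible on the 4th call.
let calls = 0;
const fakeGetHellos = async (): Promise<string[]> => (++calls <= 3 ? ["hello"] : []);
pollUntil(fakeGetHellos, (hellos) => hellos.length === 0, 5_000, 10).then((r) =>
  console.log(r.satisfied, r.attempts), // true 4
);
```

In the scaffolding test it would wrap the final `get_hellos` call on `betoCell`, with `(hellos) => hellos.length === 0` as the condition; logging `attempts` and `satisfied` from CI runs would then distinguish "late" from "never".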