Flaky dhtSync
In the scaffolding repo's CI, we are seeing flaky behavior from dhtSync (or possibly an underlying issue in Holochain).
It arises in this sequence:
- alice creates a link
- we await dhtSync
- bob gets all links, and receives 1
- alice deletes the link
- we await dhtSync
- bob gets all links, and still receives 1
Here is an example CI run: https://github.com/holochain/scaffolding/actions/runs/13952691506/job/39058133036
Currently tracking this on scaffolding here: https://github.com/holochain/scaffolding/issues/472
I've scaffolded an example app and edited the test to this:
test("send hello and retrieve hellos", async () => {
  await runScenario(async (scenario) => {
    // Construct proper paths for your app.
    // This assumes app bundle created by the `hc app pack` command.
    const testAppPath = process.cwd() + "/../workdir/hello-world.happ";

    // Set up the app to be installed
    const appSource = { appBundleSource: { path: testAppPath } };

    // Add 2 players with the test app to the Scenario. The returned players
    // can be destructured.
    const [alice, beto] = await scenario.addPlayersWithApps([
      appSource,
      appSource,
    ]);

    // alice creates a link
    const aliceCell = alice.cells[0];
    const resultAlice = await aliceCell.callZome({
      zome_name: "hello_world",
      fn_name: "hello_world",
      payload: "hello world!",
    });
    assert.ok(resultAlice);

    // we await dhtSync
    await dhtSync([alice, beto], aliceCell.cell_id[0]);

    interface HelloOutput {
      message: string;
      author: AgentPubKey;
    }

    // bob gets all links, and receives 1
    const betoCell = beto.cells[0];
    const resultBeto: HelloOutput[] = await betoCell.callZome({
      zome_name: "hello_world",
      fn_name: "get_hellos",
    });
    assert.ok(resultBeto);
    assert.equal(resultBeto.length, 1);
    // alice gets all links, and receives 1
    const allHellosAlice: HelloOutput[] = await aliceCell.callZome({
      zome_name: "hello_world",
      fn_name: "get_hellos",
    });

    // alice deletes the link
    const resultAliceDeletesLink = await aliceCell.callZome({
      zome_name: "hello_world",
      fn_name: "delete_hello_link",
      payload: resultAlice,
    });
    console.log("result alice deletes link", resultAliceDeletesLink);

    // we await dhtSync
    await dhtSync([alice, beto], aliceCell.cell_id[0]);

    // bob gets all links, and still receives 1
    const resultBetoAllLinks: HelloOutput[] = await betoCell.callZome({
      zome_name: "hello_world",
      fn_name: "get_hellos",
    });
    assert.ok(resultBetoAllLinks);
    assert.equal(resultBetoAllLinks.length, 0);
  });
});
with the modified zome code:
#[hdk_extern]
pub fn hello_world(message: String) -> ExternResult<ActionHash> {
    // commit the Hello message
    let action_hash = create_entry(&EntryTypes::Hello(Hello { message }))?;

    // link it to an anchor for later retrieval
    let path = Path::from("hellos");

    // return the create link action hash
    create_link(
        path.path_entry_hash()?,
        action_hash.clone(),
        LinkTypes::AllHellos,
        (),
    )
}

#[hdk_extern]
pub fn delete_hello_link(link: ActionHash) -> ExternResult<ActionHash> {
    delete_link(link)
}
The test passes consistently 100 times in a row.
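For anyone who wants to repeat this kind of check, here is a minimal sketch of a repeat-runner that executes an async test body a number of times and records which runs failed. The names `repeatAsync` and `RepeatResult` are hypothetical and not part of tryorama or any test framework, and this is not necessarily how the 100 runs above were produced.

```typescript
// Hypothetical helper (not part of tryorama): run an async body `runs` times
// and collect every failure instead of stopping at the first one.
interface RepeatResult {
  runs: number;
  failures: { run: number; error: unknown }[];
}

async function repeatAsync(
  runs: number,
  body: (run: number) => Promise<void>,
): Promise<RepeatResult> {
  const failures: { run: number; error: unknown }[] = [];
  for (let run = 0; run < runs; run++) {
    try {
      // Runs are sequential so that one run cannot interfere with the next.
      await body(run);
    } catch (error) {
      failures.push({ run, error });
    }
  }
  return { runs, failures };
}
```

The body passed in would be the scenario shown above. A non-empty `failures` list gives a rough flake rate and the run indices worth inspecting.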
I've also written a reproduction in Holochain on the main-0.4 branch:
#[tokio::test(flavor = "multi_thread")]
async fn delete_link_deletes_link() {
    holochain_trace::test_run();

    let mut conductors = SweetConductorBatch::from_standard_config_rendezvous(2).await;
    let dna_file = SweetDnaFile::unique_from_test_wasms(vec![TestWasm::Link])
        .await
        .0;
    let apps = conductors
        .setup_app("app", &[dna_file.clone()])
        .await
        .unwrap();
    let ((alice,), (bob,)) = apps.into_tuples();

    let alice_pk = alice.cell_id().agent_pubkey().clone();
    let bob_pk = bob.cell_id().agent_pubkey().clone();
    println!("@!@!@ alice_pk: {alice_pk:?}");
    println!("@!@!@ bob: {bob_pk:?}");

    let create_link_hash: ActionHash = conductors[0]
        .call(
            &alice.zome(TestWasm::Link.coordinator_zome_name()),
            "create_link",
            (),
        )
        .await;

    await_consistency(20, &[alice.clone(), bob.clone()])
        .await
        .unwrap();

    let all_links: Vec<holochain_zome_types::link::Link> = conductors[1]
        .call(
            &bob.zome(TestWasm::Link.coordinator_zome_name()),
            "get_links",
            (),
        )
        .await;
    assert_eq!(all_links.len(), 1);

    let delete_link_action_hash: ActionHash = conductors[0]
        .call(
            &alice.zome(TestWasm::Link.coordinator_zome_name()),
            "delete_link",
            create_link_hash.clone(),
        )
        .await;

    await_consistency(10, &[alice.clone(), bob.clone()])
        .await
        .unwrap();

    let all_links: Vec<holochain_zome_types::link::Link> = conductors[1]
        .call(
            &bob.zome(TestWasm::Link.coordinator_zome_name()),
            "get_links",
            (),
        )
        .await;
    assert_eq!(all_links.len(), 0);
}
This also passes without error 100 times in a row.
Please take a look at the code and let me know whether it matches the issue you were seeing. If you have a reproduction of the problem, please paste the code or a link.
Interesting. Looks like we're still seeing some flakiness in scaffolding CI: https://github.com/holochain/scaffolding/actions/runs/14478984906/job/40662080464
I'll see if I can figure out a reproduction.
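One thing that might help while hunting for a reproduction is telling apart "the delete arrives late" from "the delete never arrives". Below is a rough sketch of a polling helper for that purpose. `pollUntil` and its parameters are hypothetical names, not a tryorama API, and this is a diagnostic aid only, not a suggested fix for dhtSync.

```typescript
// Hypothetical diagnostic helper: repeatedly read a value until a predicate
// holds or a timeout elapses, and report how many attempts were needed.
interface PollOutcome<T> {
  value: T;           // last value read
  attempts: number;   // number of reads performed
  converged: boolean; // true if the predicate held before the timeout
}

async function pollUntil<T>(
  read: () => Promise<T>,
  isDone: (value: T) => boolean,
  timeoutMs: number,
  intervalMs: number,
): Promise<PollOutcome<T>> {
  const deadline = Date.now() + timeoutMs;
  let attempts = 0;
  for (;;) {
    const value = await read();
    attempts++;
    if (isDone(value)) {
      return { value, attempts, converged: true };
    }
    if (Date.now() >= deadline) {
      return { value, attempts, converged: false };
    }
    // Wait before the next read.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In the scaffolded test, `read` could wrap the `get_hellos` call on `betoCell`, and `isDone` could check for an empty array. If a failing run converges after a few attempts, the delete was only late, which would point at dhtSync returning too early. If it never converges, that would suggest the problem lies elsewhere.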