celestia-node icon indicating copy to clipboard operation
celestia-node copied to clipboard

feat(share/shrex): add context cancellation to shrexeds and shrexnd clients

Open TheRealSibasishBehera opened this issue 6 months ago • 6 comments

This PR implements context cancellation support for shrex clients to fix hanging requests when parent context is canceled.

Closes #3480

Changes

  • shrexeds client: Added readEDSWithContext() method with goroutine+channel pattern for context cancellation
  • shrexnd client: Added readNamespaceDataWithContext() method with similar cancellation support
  • share/eds: Updated ReadShares() to accept context and check for cancellation in reading loop
  • store/file: Added context propagation through ODS and square reading functions
  • tests: Added tests for quick server response cancellation and data reading cancellation cases

TheRealSibasishBehera avatar Jun 13 '25 02:06 TheRealSibasishBehera

Just a heads-up: we’re in the middle of a major refactor of the Shrex client/server (see #4249). Once that lands, most of the current code will be replaced, so we wouldn’t be able to merge this change as-is.

The underlying issue is still relevant and we’d love to revisit it after the refactor. Sorry we didn’t flag this earlier in the issue, and thank you again for your contribution!

walldiss avatar Jun 16 '25 15:06 walldiss

As for solution, if you will be still interested in contributing after the refactoring, please take a look into setting up appropriate stream timeout, that should match parent context timeout or pre-configure value. There should be no need for spawning go-routines and creating channels.

walldiss avatar Jun 16 '25 15:06 walldiss

Thanks for the heads up, @walldiss!

As for solution, if you will be still interested in contributing after the refactoring, please take a look into setting up appropriate stream timeout, that should match parent context timeout or pre-configure value. There should be no need for spawning go-routines and creating channels.

Your suggestion makes sense . I had overlooked how deadlines are applied for streams from contexts. I will be happy to implement the solution using stream timeout once the refactor PR is merged.

Please feel free to close or mark this PR for now.

TheRealSibasishBehera avatar Jun 17 '25 11:06 TheRealSibasishBehera

Streams already have a deadline set from the context! What's missing is handling the cancellation from manual cancel, and there is no way to do it without a goroutine. If we care about microsecond overhead caused by the routine, the original issue should be just closed.

Wondertan avatar Jun 17 '25 11:06 Wondertan

I can see that the read deadline is reset to a server parameter value after write gets completed , due to this the parent context wont be respected

Is it perhaps this part that the issue originally aims to solve rather than the manualcancel call ?

share/shwap/p2p/shrex/shrexeds/client.go

	// read and parse status from peer
	resp := new(shrexpb.Response)
	err = stream.SetReadDeadline(time.Now().Add(c.params.ServerReadTimeout))
	if err != nil {
		log.Debugw("client: failed to set read deadline for reading status", "err", err)
	}
	_, err = serde.Read(stream, resp)

TheRealSibasishBehera avatar Jun 17 '25 11:06 TheRealSibasishBehera

There is c.setStreamDeadlines(ctx, stream) call that sets deadlines on the stream, respecting the context

Wondertan avatar Jun 17 '25 12:06 Wondertan

Hey @TheRealSibasishBehera. We have unified shrexnd and shrexeds components and replaced them with shrex client/server parts that implement generic logic for handling all types of shwap requests. Do you still want to continue working on this issue and prefer rebase on main(fixing all conflicts) or open a new PR?

vgonkivs avatar Oct 06 '25 11:10 vgonkivs

Thanks for the update @vgonkivs I will start working on a fresh PR from the main branch, as it will be easier given the refactor

TheRealSibasishBehera avatar Oct 07 '25 10:10 TheRealSibasishBehera