foundationdb icon indicating copy to clipboard operation
foundationdb copied to clipboard

Go: [BREAKING] explicit destruction for futures and transactions

Open gm42 opened this issue 8 months ago • 55 comments

  • Explicit destruction for futures and transactions.
  • Introduction of context for transactions and futures.
  • Simplification of Future interface and internal value caching

This is a breaking change

This change makes the Go binding stop relying on Go's GC finalizers and instead use explicit Close() calls to release resources, which is more Go-idiomatic; additionally, context.Context is added to function signatures and supported through transaction cancellation.

NOTE: once merged this will be a breaking change for users because memory leaks would start appear on their FoundationDB clients, unless Close() is called for futures, transactions and range iterators.

Code-Reviewer Section

  • [x] The PR has a description, explaining both the problem and the solution.
  • [ ] The description mentions which forms of testing were done and the testing seems reasonable.
  • [x] Every function/class/actor that was touched is reasonably well documented.

I have not yet updated test code as I'd like to discuss this change first.

For Release-Branches

If this PR is made against a release-branch, please also check the following:

  • [ ] This change/bugfix is a cherry-pick from the next younger branch (younger release-branch or main if this is the youngest branch)
  • [ ] There is a good reason why this PR needs to go into a release branch and this reason is documented (either in the description above or in a linked GitHub issue)

gm42 avatar Mar 26 '25 10:03 gm42

Result of foundationdb-pr-clang-ide on Linux CentOS 7

  • Commit ID: 1e768c3f5f7e58c6b565ec5efd43e8ad9d93a34d
  • Duration 0:22:16
  • Result: :white_check_mark: SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Apr 01 '25 11:04 foundationdb-ci

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: 1e768c3f5f7e58c6b565ec5efd43e8ad9d93a34d
  • Duration 0:31:43
  • Result: :x: FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Apr 01 '25 12:04 foundationdb-ci

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 1e768c3f5f7e58c6b565ec5efd43e8ad9d93a34d
  • Duration 0:35:50
  • Result: :x: FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /opt/homebrew/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Apr 01 '25 12:04 foundationdb-ci

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: 1e768c3f5f7e58c6b565ec5efd43e8ad9d93a34d
  • Duration 0:39:30
  • Result: :x: FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Apr 01 '25 12:04 foundationdb-ci

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: 1e768c3f5f7e58c6b565ec5efd43e8ad9d93a34d
  • Duration 0:40:00
  • Result: :x: FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

foundationdb-ci avatar Apr 01 '25 12:04 foundationdb-ci

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 1e768c3f5f7e58c6b565ec5efd43e8ad9d93a34d
  • Duration 0:45:09
  • Result: :x: FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Apr 01 '25 12:04 foundationdb-ci

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 1e768c3f5f7e58c6b565ec5efd43e8ad9d93a34d
  • Duration 1:01:18
  • Result: :x: FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /usr/local/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Apr 01 '25 12:04 foundationdb-ci

@johscheuer I have fixed the build errors in the directory layer; please trigger CI again

gm42 avatar Apr 01 '25 16:04 gm42

Result of foundationdb-pr-clang-ide on Linux CentOS 7

  • Commit ID: b68c700490dd91863ca9359b3dd38fd1217b994c
  • Duration 0:22:28
  • Result: :white_check_mark: SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Apr 02 '25 05:04 foundationdb-ci

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: b68c700490dd91863ca9359b3dd38fd1217b994c
  • Duration 0:40:40
  • Result: :white_check_mark: SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Apr 02 '25 05:04 foundationdb-ci

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: b68c700490dd91863ca9359b3dd38fd1217b994c
  • Duration 0:48:30
  • Result: :white_check_mark: SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Apr 02 '25 05:04 foundationdb-ci

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: b68c700490dd91863ca9359b3dd38fd1217b994c
  • Duration 0:55:18
  • Result: :white_check_mark: SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Apr 02 '25 05:04 foundationdb-ci

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: b68c700490dd91863ca9359b3dd38fd1217b994c
  • Duration 1:00:12
  • Result: :white_check_mark: SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Apr 02 '25 05:04 foundationdb-ci

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: b68c700490dd91863ca9359b3dd38fd1217b994c
  • Duration 1:01:10
  • Result: :white_check_mark: SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

foundationdb-ci avatar Apr 02 '25 05:04 foundationdb-ci

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: b68c700490dd91863ca9359b3dd38fd1217b994c
  • Duration 1:06:03
  • Result: :white_check_mark: SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci avatar Apr 02 '25 05:04 foundationdb-ci

Thanks for triggering the CI @jzhou77.

I guess next step is to discuss the changes

gm42 avatar Apr 04 '25 07:04 gm42

Thanks again for the PR, it's in my queue for next week!

Thanks! It's a big change, looking forward to the discussion!

gm42 avatar Apr 04 '25 16:04 gm42

For the futures I'm not sure if it's a good idea or if we should at least add a helper method to get the value and close the future directly otherwise users will write repeatedly the same code where they get the future, get the value from it and then close the future.

I see what you mean; something like GetAndClose()? I will add it; the downside is that users might do something like this (I consider it an antipattern):

		ret, txClose, err := rt.ReadTransact(context.Background(), func(rtr fdb.ReadTransaction) (interface{}, error) {
			values := make([]byte, 1000)
			for i := 0; i < 1000; i++ {
				key := fdb.Key(fmt.Sprintf("my-key-%d", i))
				values[i] = rtr.GetAndClose(key)
			}
			return values, nil
		})
		if err != nil {
			return nil, err
		}
		txClose()

instead of:

		futures := make([]fdb.FutureByteSlice, 1000)
		ret, txClose, err := rt.ReadTransact(context.Background(), func(rtr fdb.ReadTransaction) (interface{}, error) {
			for i := 0; i < 1000; i++ {
				key := fdb.Key(fmt.Sprintf("my-key-%d", i))
				futures[i] = rtr.Get(key)
			}
			return nil, nil
		})
		if err != nil {
			return nil, err
		}

		// ... do something else while values are being fetched ...

		values := make([]byte, 1000)
		for i:=0;i<1000;i++ {
			values[i], err = futures[i].Get()
			futures[i].Close()
			if err != nil {
				return nil, err
			}
		}
		txClose()

Simply because the latter requires more code/typing.

gm42 avatar Apr 30 '25 13:04 gm42

I also suggest that the Future APIs should be reworked so that MustGet or Get close the future.

These already have inbuilt routines to synchronize the access. (the previous version of this comment was incorrect; I missed sync.Once)

Semisol avatar May 07 '25 16:05 Semisol

@gm42 After looking around the code, I think the following would be best:

  • For parity, all futures' Get (except internal ones) are updated to cache return values.
  • Each future has a new sync.RWMutex guarding f.ptr. f.ptr is set to nil after closure. This is only relevant for internal use in:
    • IsReady: returns true if closed.
    • BlockUntilReady: if nil, skip blocking
    • Get: if cached, the read falls through. otherwise, if nil, returns a future cancelled error.
  • The Destroy and Cancel methods are merged into Cancel.
  • Get will automatically close the future after values are ready.
  • Developers are advised to do Cancel if they determine a future will not be used, or if they are not sure if it will (after any potential users go out of scope).

Semisol avatar May 07 '25 17:05 Semisol

I also suggest that the Future APIs should be reworked so that MustGet or Get close the future.

Yes, I reached the same conclusion last week.

I think the following would be best:

thanks for the feedback @Semisol!

For parity, all futures' Get (except internal ones) are updated to cache return values.

This seems doable, I will try to add it to this PR.

Each future has a new sync.RWMutex guarding f.ptr. f.ptr is set to nil after closure. This is only relevant for internal use

Not sure about this one; will try to find out whether it's the only option or not, as I change more code.

The Destroy and Cancel methods are merged into Cancel.

I think you mean Close and Cancel; however, if you look at the C API there are specific behaviors associated to them. Will eventually remove Destroy if it's not necessary anymore for the user to call it.

  • Get will automatically close the future after values are ready.

Yes, this is doable thanks to the callback. I will instrument it further.

  • Developers are advised to do Cancel if they determine a future will not be used, or if they are not sure if it will (after any potential users go out of scope).

I think the current pattern with Close() matches what we do in Go in other cases (for example, with HTTP response bodies) and I would stick to it. Rather than asking developers to conditionally use Cancel(), it's safer to tell them to always use Close() and we have the method do nothing when there's nothing to do.

Will submit some changes later as I rewrite part of the current ones and add context.Context also to the future get functions

gm42 avatar May 12 '25 09:05 gm42

Will submit some changes later as I rewrite part of the current ones and add context.Context also to the future get functions

I just submitted the first part of this rewrite:

  • [x] use a channel instead of sync.Mutex to wait for callback (will favour integration with context.Context)
  • [x] add context.Context to future Getter functions

What's left to be done:

  • [ ] align futures to internally-cached approach
  • [ ] release future when the callback re-enters
  • [ ] always destroy instead of using fdb_future_release_memory

gm42 avatar May 13 '25 10:05 gm42

I've completed my changes; a couple methods have been removed from the Future interface, I think this way it's easier to use it.

By the way, the C FDB API implementation does indeed treat destroy and cancel as almost the same:

extern "C" DLLEXPORT void fdb_future_cancel(FDBFuture* f) {
	CATCH_AND_DIE(TSAVB(f)->addref(); TSAVB(f)->cancel(););
}

extern "C" DLLEXPORT void fdb_future_destroy(FDBFuture* f) {
	CATCH_AND_DIE(TSAVB(f)->cancel(););
}

gm42 avatar May 13 '25 14:05 gm42

If we want to further simplify this:

  1. the Close() method can be removed
  2. when a future is created, the Get() is called in background (this means the constructor for the future will need a context)

Pros:

  • no need to call Close()

Cons:

  • context in future constructor (although should be already there for the transaction)
  • user loses control on when Go memory is allocated (large values will be fetched and stored even if ultimately unnecessary)

I do not like this direction because it would be wasteful by design; if we really want to pursue it, it would be best to do it only for the scalar types (not the KV or K arrays)

gm42 avatar May 13 '25 14:05 gm42

I have done some of my own changes here, with some opiniation: https://github.com/semisol/foundationdb

I'll look into the changes here and get back with feedback soon.

(I have removed contexts from Transact and ReadTransact as well in my fork. I think the current design of that is quite iffy, abstracting a normal transaction, and a transaction provider that does retries as one.)

Semisol avatar May 13 '25 17:05 Semisol

I think the current design of that is quite iffy, abstracting a normal transaction, and a transaction provider that does retries as one

I noticed that, but rather than an even bigger change I opted for adding context without pulling that apart. There is no performance/correctness concern other than it does not look elegant.

gm42 avatar May 14 '25 07:05 gm42

I have done some of my own changes here, with some opiniation: https://github.com/semisol/foundationdb

I'll look into the changes here and get back with feedback soon.

I checked your branch; aside from undoing cancellation and context I think it introduces transaction leaks (with no finalizer and no closer, the transaction will not be closed). I saw you wanted to counteract that by closing the transaction in the retryable, but not sure that you can retry anything if the transaction has already been closed.

Looking forward to your feedback!

gm42 avatar May 14 '25 07:05 gm42

Let me write what is (at least for me) an interesting problem regarding closing transactions: there are basically two use cases to support:

  1. opening a transaction, creating some futures, resolving them and getting their values before commit or close
  2. opening a transaction, creating some futures, committing the transaction, then resolving futures and getting their value, then closing the transaction.

In (2), closing the transaction causes all futures to be cancelled, thus it can only be done after user is done using the futures. I think (2) is a valid use-case and we should not stop supporting it because it allows to commit a transaction and transfer the resolved futures' values from C memory to Go memory only as needed, after the transaction commit. This allows to complete a commit very quickly and process the gathered results afterwards.

Note: the lifecycle is similar for both transactions that will be committed and transactions that will not be committed: all transactions must be destroyed, even those which are not committed, the only variation is the commit happening before destruction. When doing read-only operations, transactions are always destroyed but never committed.

gm42 avatar May 14 '25 07:05 gm42

I checked your branch; aside from undoing cancellation and context I think it introduces transaction leaks (with no finalizer and no closer, the transaction will not be closed). I saw you wanted to counteract that by closing the transaction in the retryable, but not sure that you can retry anything if the transaction has already been closed.

retryable returns only when it is not possible to retry anymore, so closure in that case is correct. Instead, there should be tr.Reset(), to restart the transaction. I will push a commit with this soon.

(2) opening a transaction, creating some futures, committing the transaction, then resolving futures and getting their value, then closing the transaction.

This should be avoided in most cases, as you cannot retry once you are outside of the transaction, except with application logic. At that point, the user should use the CreateTransaction API and integrate it into their application logic.

I am not against supporting such an API though, but I believe it should be distinct.

Semisol avatar May 15 '25 05:05 Semisol

retryable returns only when it is not possible to retry anymore, so closure in that case is correct.

It is not correct IMO because closing a transaction implicitly cancels all associated futures, which can be used after transaction commit.

This should be avoided in most cases, as you cannot retry once you are outside of the transaction, except with application logic. At that point, the user should use the CreateTransaction API and integrate it into their application logic.

I am not against supporting such an API though, but I believe it should be distinct.

Futures are perfectly valid to be used after transaction has been committed; this is currently supported when using the C binding and when using the Go binding. There is no need to retry because resolving a future doesn't involve any future-specific retry.

gm42 avatar May 15 '25 08:05 gm42