foundationdb-rs icon indicating copy to clipboard operation
foundationdb-rs copied to clipboard

Directory layer

Open PierreZ opened this issue 4 years ago • 19 comments

This is the PR draft to resolves #27 and based on the initial work done by @garrensmith available on #186.

I wrote some basic tests, but here's a TODO list:

  • [x] Add more tests on the Node part,
  • [x] make clippy happy,
  • [x] Add layer verification on the last node before opening,
  • [x] Implement list,
  • [x] Implements move,
  • [x] Implements delete,
  • [x] Add directory to BindingTester
  • [x] ~~Add validation that the binding is compatible with others~~
  • [x] Implement directory partitions
  • [x] check ranges
  • [x] Passing all seeds and steps
  • [x] cleanup code + doc

My rust is feeling a bit rusty, so feel free to comment :smile:

EDIT: I discovered bindingTester while searching for directory validation.

PierreZ avatar Dec 31 '20 12:12 PierreZ

Coverage Status

Coverage increased (+5.4%) to 72.768% when pulling a92fd9c9c92d57970f7a824fd66962f53f23573d on PierreZ:dev/directory into f41fa13eb57aa883944476e29ee81a6708e67b4d on Clikengo:master.

coveralls avatar Dec 31 '20 12:12 coveralls

Codecov Report

Merging #217 (a92fd9c) into master (f41fa13) will increase coverage by 2.95%. The diff coverage is 77.46%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #217      +/-   ##
==========================================
+ Coverage   76.58%   79.53%   +2.95%     
==========================================
  Files          18       24       +6     
  Lines        3041     4672    +1631     
==========================================
+ Hits         2329     3716    +1387     
- Misses        712      956     +244     
Impacted Files Coverage Δ
foundationdb/src/directory/error.rs 0.00% <0.00%> (ø)
foundationdb/src/lib.rs 100.00% <ø> (ø)
foundationdb/src/tuple/hca.rs 66.11% <20.00%> (+35.88%) :arrow_up:
foundationdb/src/directory/directory_partition.rs 38.46% <38.46%> (ø)
foundationdb/src/directory/mod.rs 76.71% <76.71%> (ø)
foundationdb-bindingtester/src/main.rs 85.52% <77.85%> (-3.18%) :arrow_down:
foundationdb/src/directory/node.rs 81.35% <81.35%> (ø)
foundationdb/src/directory/directory_layer.rs 86.11% <86.11%> (ø)
foundationdb/src/directory/directory_subspace.rs 86.71% <86.71%> (ø)
foundationdb/src/api.rs 87.67% <100.00%> (+4.10%) :arrow_up:
... and 21 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update f41fa13...a92fd9c. Read the comment docs.

codecov[bot] avatar Dec 31 '20 12:12 codecov[bot]

Thanks for the early review @Speedy37 :+1:

PierreZ avatar Jan 07 '21 07:01 PierreZ

I'm having difficulties to grasp the idea of the DirectoryPartition, do you have knowledge of it, or should I ask on the forum?

PierreZ avatar Jan 14 '21 21:01 PierreZ

Turns out, my first implementation and mental representation of the Directory was wrong :stuck_out_tongue:

Previous commits were not compatible with other bindings, so everything has been manually checked with the Go binding. Before reimplement delete and move, I will include the binding-checker to validate that everything is working properly.

PierreZ avatar Feb 03 '21 14:02 PierreZ

Great work :)

For the bindingtester, you can add the test to https://github.com/Clikengo/foundationdb-rs/blob/master/scripts/run_bindingtester.sh so the CI runs it.

test-name is directory I think

Speedy37 avatar Feb 03 '21 15:02 Speedy37

Just wanted to share that I'm finally passing a seed :partying_face:

Test with seed 1330929912 and concurrency 1 had 0 incorrect result(s) and 0 error(s) at API version 610
Completed directory test with random seed 1330929912 and 10 operations

TODO list updated

PierreZ avatar Mar 30 '21 07:03 PierreZ

Wow awesome work.


From: Pierre Zemb @.> Sent: Tuesday, March 30, 2021 9:58:50 AM To: Clikengo/foundationdb-rs @.> Cc: garren smith @.>; Mention @.> Subject: Re: [Clikengo/foundationdb-rs] [WIP] directory layer (#217)

Just wanted to share that I'm finally passing a seed 🥳

Test with seed 1330929912 and concurrency 1 had 0 incorrect result(s) and 0 error(s) at API version 610

Completed directory test with random seed 1330929912 and 10 operations

TODO list updated

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/Clikengo/foundationdb-rs/pull/217#issuecomment-810004482, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AABL2ASLWKSI5SNPJPZ2PL3TGGADVANCNFSM4VPL3QPA.

garrensmith avatar Mar 30 '21 08:03 garrensmith

Hey :wave:

Last week, as I was digging into some issues on my implementation, I had the idea to go through Flow´s implementation(to be honest, I was only looking at the Go/Java´s code :tongue:) After discovering that Rust async is closer to Flow than I thought, I decided to erase everything and just backport the Flow´s implementation, which will be more maintainable in my opinion :smile:

However, I stumble across two issues related to async and FdbValuesIter which is not Send. The related code can be found here:

error: future cannot be sent between threads safely
   --> foundationdb/src/directory/directory_layer.rs:665:5
    |
665 |     #[async_recursion]
    |     ^^^^^^^^^^^^^^^^^^ future created by async block is not `Send`
    |
    = help: within `FdbValuesIter`, the trait `std::marker::Send` is not implemented for `Rc<FdbFutureHandle>`
note: future is not `Send` as this value is used across an await
   --> foundationdb/src/directory/directory_layer.rs:688:17
    |
679 |             let range: Arc<FairMutex<FdbValuesIter>> = Arc::new(FairMutex::new(range.into_iter()));
    |                 ----- has type `Arc<parking_lot::lock_api::Mutex<RawFairMutex, FdbValuesIter>>` which is not `Send`
...
688 |                 self.remove_recursive(trx, sub_node).await?;
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ await occurs here, with `range` maybe used later
...
694 |         }
    |         - `range` is later dropped here
    = note: required for the cast to the object type `dyn futures::Future<Output = std::result::Result<(), DirectoryError>> + std::marker::Send`
    = note: this error originates in an attribute macro (in Nightly builds, run with -Z macro-backtrace for more info)
error: future cannot be sent between threads safely
   --> foundationdb/src/directory/directory_layer.rs:665:5
    |
665 |     #[async_recursion]
    |     ^^^^^^^^^^^^^^^^^^ future created by async block is not `Send`
    |
    = help: within `FdbValuesIter`, the trait `std::marker::Send` is not implemented for `*const keyvalue`
note: future is not `Send` as this value is used across an await
   --> foundationdb/src/directory/directory_layer.rs:688:17
    |
679 |             let range: Arc<FairMutex<FdbValuesIter>> = Arc::new(FairMutex::new(range.into_iter()));
    |                 ----- has type `Arc<parking_lot::lock_api::Mutex<RawFairMutex, FdbValuesIter>>` which is not `Send`
...
688 |                 self.remove_recursive(trx, sub_node).await?;
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ await occurs here, with `range` maybe used later
...
694 |         }
    |         - `range` is later dropped here
    = note: required for the cast to the object type `dyn futures::Future<Output = std::result::Result<(), DirectoryError>> + std::marker::Send`
    = note: this error originates in an attribute macro (in Nightly builds, run with -Z macro-backtrace for more info)

FdbValuesIter is defined like this:

/// An iterator of keyvalues owned by a foundationDB future
pub struct FdbValuesIter {
    f: Rc<FdbFutureHandle>,
    keyvalues: *const fdb_sys::FDBKeyValue,
    len: i32,
    pos: i32,
}

and the first two fields are the one not Send able. Could you help me fix it? I tried the Arc<Mutex<>> technique, without luck. Thanks in advance :smiley:

PierreZ avatar Apr 13 '21 19:04 PierreZ

Hi,

Nice work 👏

About the Send issue, I see 2 ways to try to fix it :

  1. Very simple: maybe FdbValuesIter and FdbValue should contains an Arc<FdbFutureHandle>. Then you are allowed to add unsafe impl Send for FdbValuesIter {} and unsafe impl Send for FdbValue {}. The lifetime of *const fdb_sys::FDBKeyValue is the same as the handle.
  2. Return a non Send future: don't use #[async_recursive attribute] and write the workaround yourself (wrapper method that box the future to have a finite stack size).

I would personally go for 1 I think.

Speedy37 avatar Apr 13 '21 20:04 Speedy37

Hi,

Nice work clap

About the Send issue, I see 2 ways to try to fix it :

1. Very simple: maybe `FdbValuesIter` and `FdbValue` should contains an `Arc<FdbFutureHandle>`. Then you are allowed to add `unsafe impl Send for FdbValuesIter {}` and `unsafe impl Send for FdbValue {}`. The lifetime of `*const fdb_sys::FDBKeyValue` is the same as the handle.

2. Return a non Send future: don't use `#[async_recursive attribute]` and write the workaround yourself (wrapper method that box the future to have a finite stack size).

I would personally go for 1 I think.

Thanks a lot @Speedy37, option 1 worked perfectly :rocket:

PierreZ avatar Apr 14 '21 08:04 PierreZ

I finished my first iteration of backporting Flow´s code, now I´m working on making tests passes :smile:

PierreZ avatar Apr 16 '21 14:04 PierreZ

Most of my tests are passing on most seeds, except when I'm using the DirectoryPartitions. I'm struggling to find the problem here for now, keep digging

PierreZ avatar May 08 '21 13:05 PierreZ

Still digging :disappointed: I am experiencing quite some difficulties fixing the last bugs, as fixing one seed is creating new bugs in seeds that used to pass :tongue:

Most of my mistakes seems to be related to:

  • layer (sometimes I cannot find the layer under the right key). I tried removing every clear from my implementation, without luck
  • DirectoryPartition (exists and list are failing most of the time), due to path's manipulation between the directory Partition/Subspace/Layer. I'm backporting this class from Flow/C++ but I must admit my C++ is even more rusty than my rust :smiley: I'm not sure if the DirectoryPartition is used a lot, but we cannot disable it from the bindingtester.

I could use some help :innocent: If someone is willing to help me, I've marked some faulty seeds here.

PierreZ avatar Jun 06 '21 17:06 PierreZ

Unblocked, thanks to @Geal :tada:

Got a dedicated computer searching for faulty seed with 10k ops. I will fix them and cleanup the code :smile:

PierreZ avatar Jun 21 '21 07:06 PierreZ

The bindingtester have been running for 24h+ with 10k ops without finding a faulty seed :smile:

I squashed the commits, ready to be reviewed :tada:

As mentionned before, I tried to be as close as possible from the Flow code, but it can be changed.

PierreZ avatar Jun 24 '21 11:06 PierreZ

I was looking for this functionality today. I see that was a large amount of work. Has been in review for nearly a year. What would be next step for the merge?

pierrebelzile avatar Jun 06 '22 19:06 pierrebelzile

I was looking for this functionality today. I see that was a large amount of work. Has been in review for nearly a year. What would be next step for the merge?

Hi @pierrebelzile :wave:

Please note that the development of the crate has moved to the dedicated Github org. The fork is including the directory feature, and several others. Support for FDB 7.1 is also on its way.

More info can be found here on why we had to hard fork can be found here.

PierreZ avatar Jun 07 '22 06:06 PierreZ

Thanks. Can't believe I missed that. Great news and thanks for the work! :)

pierrebelzile avatar Jun 07 '22 14:06 pierrebelzile