kvrocks icon indicating copy to clipboard operation
kvrocks copied to clipboard

Support DISK Command

Open tanruixiang opened this issue 3 years ago • 12 comments

It closes #874

tanruixiang avatar Sep 17 '22 11:09 tanruixiang

@PragmaTwice Hi. Do you have any more suggestions on reducing redundant code?

tanruixiang avatar Sep 17 '22 15:09 tanruixiang

@PragmaTwice @git-hulk @torwig I noticed a problem. In the following tests. When we write a value with only one bit(like 'a'). Occasionally, the size of the key we get is 0. It occurs frequently when the value is very small. I guess the reason for this problem is that sync in writeoption defaults to false. Or because the GetApproximateSizes can only yield approximate values, which may be ignored for small values.

  std::vector<int> value_size{1, 1024};
  for(auto &p : value_size){
    EXPECT_TRUE(string->Set(key_, std::string(p, 'a')).ok());
    std::string got_value;
    EXPECT_TRUE(string->Get(key_,  &got_value).ok());
    EXPECT_EQ(got_value, std::string(p, 'a'));
    std::string value;
    string->Get(key_, &value);
    uint64_t result = 0;
    EXPECT_TRUE(disk->GetKeySize(key_, kRedisString, &result).ok());
    EXPECT_GE(result, p);
  }

tanruixiang avatar Sep 17 '22 16:09 tanruixiang

@PragmaTwice @git-hulk @torwig I noticed a problem. In the following tests. When we write a value with only one bit(like 'a'). Occasionally, the size of the key we get is 0. It occurs frequently when the value is very small. I guess the reason for this problem is that sync in writeoption defaults to false. Or because the GetApproximateSizes can only yield approximate values, which may be ignored for small values.

  std::vector<int> value_size{1, 1024};
  for(auto &p : value_size){
    EXPECT_TRUE(string->Set(key_, std::string(p, 'a')).ok());
    std::string got_value;
    EXPECT_TRUE(string->Get(key_,  &got_value).ok());
    EXPECT_EQ(got_value, std::string(p, 'a'));
    std::string value;
    string->Get(key_, &value);
    uint64_t result = 0;
    EXPECT_TRUE(disk->GetKeySize(key_, kRedisString, &result).ok());
    EXPECT_GE(result, p);
  }

@git-hulk Hi. Do you have time to look at this question? Because the approximate usage space is obtained, it may encounter a situation of 0 during testing. Increasing the sleep time also doesn't completely solve the problem. It seems that the sync method is not supported either.

tanruixiang avatar Sep 18 '22 13:09 tanruixiang

@PragmaTwice @git-hulk @torwig I noticed a problem. In the following tests. When we write a value with only one bit(like 'a'). Occasionally, the size of the key we get is 0. It occurs frequently when the value is very small. I guess the reason for this problem is that sync in writeoption defaults to false. Or because the GetApproximateSizes can only yield approximate values, which may be ignored for small values.

  std::vector<int> value_size{1, 1024};
  for(auto &p : value_size){
    EXPECT_TRUE(string->Set(key_, std::string(p, 'a')).ok());
    std::string got_value;
    EXPECT_TRUE(string->Get(key_,  &got_value).ok());
    EXPECT_EQ(got_value, std::string(p, 'a'));
    std::string value;
    string->Get(key_, &value);
    uint64_t result = 0;
    EXPECT_TRUE(disk->GetKeySize(key_, kRedisString, &result).ok());
    EXPECT_GE(result, p);
  }

@git-hulk Hi. Do you have time to look at this question? Because the approximate usage space is obtained, it may encounter a situation of 0 during testing. Increasing the sleep time also doesn't completely solve the problem. It seems that the sync method is not supported either.

Did you enable the include_memtabtles in SizeApproximationOptions?

git-hulk avatar Sep 18 '22 14:09 git-hulk

@PragmaTwice @git-hulk @torwig I noticed a problem. In the following tests. When we write a value with only one bit(like 'a'). Occasionally, the size of the key we get is 0. It occurs frequently when the value is very small. I guess the reason for this problem is that sync in writeoption defaults to false. Or because the GetApproximateSizes can only yield approximate values, which may be ignored for small values.

  std::vector<int> value_size{1, 1024};
  for(auto &p : value_size){
    EXPECT_TRUE(string->Set(key_, std::string(p, 'a')).ok());
    std::string got_value;
    EXPECT_TRUE(string->Get(key_,  &got_value).ok());
    EXPECT_EQ(got_value, std::string(p, 'a'));
    std::string value;
    string->Get(key_, &value);
    uint64_t result = 0;
    EXPECT_TRUE(disk->GetKeySize(key_, kRedisString, &result).ok());
    EXPECT_GE(result, p);
  }

@git-hulk Hi. Do you have time to look at this question? Because the approximate usage space is obtained, it may encounter a situation of 0 during testing. Increasing the sleep time also doesn't completely solve the problem. It seems that the sync method is not supported either.

Did you enable the include_memtabtles in SizeApproximationOptions?

Of course yes.

    this->option_.include_memtabtles = true;
    this->option_.include_files = true;

tanruixiang avatar Sep 18 '22 14:09 tanruixiang

Is it possible that it is a bug of rocksdb? I can get the value of the corresponding key, but the size occupied by the key is 0. And this problem occurs occasionally.

tanruixiang avatar Sep 18 '22 14:09 tanruixiang

@PragmaTwice @git-hulk @torwig I noticed a problem. In the following tests. When we write a value with only one bit(like 'a'). Occasionally, the size of the key we get is 0. It occurs frequently when the value is very small. I guess the reason for this problem is that sync in writeoption defaults to false. Or because the GetApproximateSizes can only yield approximate values, which may be ignored for small values.

  std::vector<int> value_size{1, 1024};
  for(auto &p : value_size){
    EXPECT_TRUE(string->Set(key_, std::string(p, 'a')).ok());
    std::string got_value;
    EXPECT_TRUE(string->Get(key_,  &got_value).ok());
    EXPECT_EQ(got_value, std::string(p, 'a'));
    std::string value;
    string->Get(key_, &value);
    uint64_t result = 0;
    EXPECT_TRUE(disk->GetKeySize(key_, kRedisString, &result).ok());
    EXPECT_GE(result, p);
  }

@git-hulk Hi. Do you have time to look at this question? Because the approximate usage space is obtained, it may encounter a situation of 0 during testing. Increasing the sleep time also doesn't completely solve the problem. It seems that the sync method is not supported either.

Did you enable the include_memtabtles in SizeApproximationOptions?

Of course yes.

    this->option_.include_memtabtles = true;
    this->option_.include_files = true;

OK, I need to take some time to inspect the reason. But I think it's ok to skip this corner case first since we should NOT care if the usage is small enough.

git-hulk avatar Sep 18 '22 14:09 git-hulk

@PragmaTwice @git-hulk @torwig I noticed a problem. In the following tests. When we write a value with only one bit(like 'a'). Occasionally, the size of the key we get is 0. It occurs frequently when the value is very small. I guess the reason for this problem is that sync in writeoption defaults to false. Or because the GetApproximateSizes can only yield approximate values, which may be ignored for small values.

  std::vector<int> value_size{1, 1024};
  for(auto &p : value_size){
    EXPECT_TRUE(string->Set(key_, std::string(p, 'a')).ok());
    std::string got_value;
    EXPECT_TRUE(string->Get(key_,  &got_value).ok());
    EXPECT_EQ(got_value, std::string(p, 'a'));
    std::string value;
    string->Get(key_, &value);
    uint64_t result = 0;
    EXPECT_TRUE(disk->GetKeySize(key_, kRedisString, &result).ok());
    EXPECT_GE(result, p);
  }

@git-hulk Hi. Do you have time to look at this question? Because the approximate usage space is obtained, it may encounter a situation of 0 during testing. Increasing the sleep time also doesn't completely solve the problem. It seems that the sync method is not supported either.

Did you enable the include_memtabtles in SizeApproximationOptions?

Of course yes.

    this->option_.include_memtabtles = true;
    this->option_.include_files = true;

OK, I need to take some time to inspect the reason. But I think it's ok to skip this corner case first since we should NOT care if the usage is small enough.

It's not only small cases where problems can arise. Sometimes a larger value also has a situation where the result is 0. For example, the failure of this CI has a relatively large value but get key size is zero. In this case ,we Lpush 1000 number value but get 0 size.big value but get size 0 case

tanruixiang avatar Sep 18 '22 14:09 tanruixiang

It's not only small cases where problems can arise. Sometimes a larger value also has a situation where the result is 0. For example, the failure of this CI has a relatively large value but get key size is zero. In this case ,we Lpush 1000 number value but get 0 size.big value but get size 0 case

OK, got it. I need to read this PR first and try to reproduce it on my side.

git-hulk avatar Sep 18 '22 15:09 git-hulk

OK, got it. I need to read this PR first and try to reproduce it on my side.

@git-hulk I used the parameters configured for GetApproximateSizes testing in rocksdb. However, there is still a probability of zero results. Do you have any suggestions?

tanruixiang avatar Sep 21 '22 15:09 tanruixiang

OK, got it. I need to read this PR first and try to reproduce it on my side.

@git-hulk I used the parameters configured for GetApproximateSizes testing in rocksdb. However, there is still a probability of zero results. Do you have any suggestions?

Sorry for missing this comment, I have no enough time to inspect this case yet.

git-hulk avatar Sep 22 '22 12:09 git-hulk

Sorry for missing this comment, I have no enough time to inspect this case yet.

@git-hulk Do not worry. I've been looking at this question. I found a strange phenomenon, if you get the corresponding key directly, you can find the value, but calling GetApproximateSizes will get 0. Maybe I'm almost sure what the reason is, please give me a little more time.(In general it can indeed be a problem with rocksdb.)

tanruixiang avatar Sep 23 '22 15:09 tanruixiang

@git-hulk I think I already know why. According to this, to prevent the situation of 0 size during the test, it is necessary to save the data in different data blocks of sst as much as possible. This requires us to enter a lot of data when testing.

tanruixiang avatar Sep 27 '22 04:09 tanruixiang

@git-hulk I think I already know why. According to this, to prevent the situation of 0 size during the test, it is necessary to save the data in different data blocks of sst as much as possible. This requires us to enter a lot of data when testing.

Nice!

git-hulk avatar Sep 27 '22 05:09 git-hulk

@tanruixiang Hi, I've done some tests locally. For example, if I set x to a string of length 50 and then use disk command - I get something about 80 as a result. Then I restart kvrocks and repeat disk command. And this time the result is something about 1500. Is it some kind of approximation by the rocksdb?

torwig avatar Oct 02 '22 18:10 torwig

@tanruixiang Hi, I've done some tests locally. For example, if I set x to a string of length 50 and then use disk command - I get something about 80 as a result. Then I restart kvrocks and repeat disk command. And this time the result is something about 1500. Is it some kind of approximation by the rocksdb?

Hi. I think it's as expected. It should be the reason for the statistical granularity of rocksdb. You can see in there.

tanruixiang avatar Oct 03 '22 01:10 tanruixiang

@git-hulk @torwig @caipengbo Hi. Do you have time to review?

tanruixiang avatar Oct 07 '22 13:10 tanruixiang

@git-hulk @torwig @caipengbo Hi. Do you have time to review?

Sure, I have read through the implementation, and overall is good to me. I'm still wondering if we have a more general way to reduce duplicate codes, I will approve this PR soon if can't figure out a better way.

git-hulk avatar Oct 07 '22 14:10 git-hulk

Sure, I have read through the implementation, and overall is good to me. I'm still wondering if we have a more general way to reduce duplicate codes, I will approve this PR soon if can't figure out a better way.

Do you have any ideas? It seems like it should be impossible to reduce each type of custom operation.

tanruixiang avatar Oct 08 '22 09:10 tanruixiang

Sure, I have read through the implementation, and overall is good to me. I'm still wondering if we have a more general way to reduce duplicate codes, I will approve this PR soon if can't figure out a better way.

Do you have any ideas? It seems like it should be impossible to reduce each type of custom operation.

The main issue is GetMetadata requires passing the type, so another way is to allow skipping type check in GetMetadata, but the current implementation is also fine to me. To see others have thoughts on this cc @ShooterIT @PragmaTwice @caipengbo @torwig

git-hulk avatar Oct 08 '22 09:10 git-hulk

@PragmaTwice @caipengbo @torwig Do you have any suggestion on this PR? Or I will merge it if we have no further comment.

git-hulk avatar Oct 11 '22 06:10 git-hulk

LGTM

caipengbo avatar Oct 11 '22 06:10 caipengbo

Thanks for @tanruixiang contribution and patient, also thanks to everyone who reviewed this PR. Merging...

git-hulk avatar Oct 11 '22 06:10 git-hulk

We also need to update our website :) @git-hulk

caipengbo avatar Oct 11 '22 06:10 caipengbo

Yes, @caipengbo you can help to update as well when you're free. https://github.com/apache/incubator-kvrocks-website/blob/main/docs/01-supported-commands.md

git-hulk avatar Oct 11 '22 06:10 git-hulk

Yes, @caipengbo you can help to update as well when you're free. https://github.com/apache/incubator-kvrocks-website/blob/main/docs/01-supported-commands.md

If I update the repository, will the site automatically update?

caipengbo avatar Oct 11 '22 07:10 caipengbo

We also need to update our website :) @git-hulk

Do you want to do it? If you don't have time, I can update the website later today.

tanruixiang avatar Oct 11 '22 07:10 tanruixiang

Yes, GitHub actions will deploy the update automatically.

git-hulk avatar Oct 11 '22 07:10 git-hulk

We also need to update our website :) @git-hulk

Do you want to do it? If you don't have time, I can update the website later today.

Cool, thanks. @tanruixiang

git-hulk avatar Oct 11 '22 07:10 git-hulk

Yes, you can do it :) @tanruixiang

caipengbo avatar Oct 11 '22 07:10 caipengbo