
Add pebblekv module

Open caesarjuly opened this issue 2 years ago • 16 comments

Issue: #16957. This finishes the first step of #16957; there is no real usage yet. I mainly referred to the previous PR and the API of Pebble (a brief sketch of the kv wrapper follows the checklist below).

  • [x] Add PebbleDB to the kv module and write basic tests
  • [x] Add PebbleDB to mq module with the required tests
  • [x] Add PebbleDB to configs and other modules
  • [ ] Do an integration test and verify the performance
  • [ ] Remove RocksDB from the dependency
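
For reference, a minimal sketch of what the first checklist item could look like on top of the cockroachdb/pebble API. This is an assumption-laden illustration, not the actual PR code: the PebbleKV type and its Save/Load methods are stand-ins for milvus' kv interface.

```go
// Hypothetical sketch only -- not the actual PR code. It assumes
// github.com/cockroachdb/pebble as the engine; the PebbleKV type and
// its methods are illustrative stand-ins for the milvus kv interface.
package pebblekv

import "github.com/cockroachdb/pebble"

type PebbleKV struct {
	db *pebble.DB
}

// Open creates or opens a Pebble database under the given directory.
func Open(path string) (*PebbleKV, error) {
	db, err := pebble.Open(path, &pebble.Options{})
	if err != nil {
		return nil, err
	}
	return &PebbleKV{db: db}, nil
}

// Save writes a key/value pair and syncs it to disk.
func (kv *PebbleKV) Save(key, value string) error {
	return kv.db.Set([]byte(key), []byte(value), pebble.Sync)
}

// Load returns the value stored under key. Pebble's Get hands back a
// closer that must be released once the value has been copied.
func (kv *PebbleKV) Load(key string) (string, error) {
	val, closer, err := kv.db.Get([]byte(key))
	if err != nil {
		return "", err
	}
	defer closer.Close()
	return string(val), nil
}

// Close releases the underlying database handle.
func (kv *PebbleKV) Close() error {
	return kv.db.Close()
}
```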

Questions that need to be confirmed:

  • Pebble now only supports Go >= 1.19
    • but the Go version in our default dev container is 1.18.3 (atomic.Uint32 was only added to sync/atomic in Go 1.19). This leads to an error: /go/pkg/mod/github.com/cockroachdb/[email protected]/vfs/disk_full.go:100:20: undefined: atomic.Uint32 note: module requires Go 1.19
    • I tried manually upgrading the Go version to 1.20.4 in the container, and everything worked well.
    • Conclusion: we will wait for the Go version upgrade as discussed here
  • Some functions in rocksdbkv are never used, such as the xxxBytes and xxxRemoveWithPrefix variants
    • In the previous PR, all of them were removed
    • In my PR, I only keep the xxxRemoveWithPrefix functions (a possible Pebble-based implementation is sketched below)
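
For reference, a hedged sketch of how a Pebble-backed RemoveWithPrefix could be expressed as a range deletion. The keyUpperBound helper is hypothetical; it follows the prefix-upper-bound idiom from Pebble's documentation.

```go
// Hypothetical sketch -- RemoveWithPrefix expressed as a Pebble range
// deletion. keyUpperBound is an illustrative helper that computes the
// smallest key strictly greater than every key sharing the prefix.
func (kv *PebbleKV) RemoveWithPrefix(prefix string) error {
	start := []byte(prefix)
	end := keyUpperBound(start)
	// DeleteRange removes all keys in [start, end).
	return kv.db.DeleteRange(start, end, pebble.Sync)
}

func keyUpperBound(b []byte) []byte {
	end := make([]byte, len(b))
	copy(end, b)
	for i := len(end) - 1; i >= 0; i-- {
		if end[i] < 0xff {
			end[i]++
			return end[:i+1]
		}
	}
	// Prefix is all 0xff bytes, so there is no finite upper bound; a
	// real implementation would need to handle this case explicitly.
	return nil
}
```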

caesarjuly avatar May 18 '23 21:05 caesarjuly

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: caesarjuly To complete the pull request process, please assign xiaofan-luan after the PR has been reviewed. You can assign the PR to them by writing /assign @xiaofan-luan in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

sre-ci-robot avatar May 18 '23 21:05 sre-ci-robot

@caesarjuly Please associate the related issue to the body of your Pull Request. (eg. "issue: #")

mergify[bot] avatar May 18 '23 21:05 mergify[bot]

@caesarjuly ut workflow job failed, comment rerun ut can trigger the job again.

mergify[bot] avatar May 18 '23 21:05 mergify[bot]

We plan to try upgrading Go to version 1.20 after the release of 2.3.0.

jiaoew1991 avatar May 22 '23 06:05 jiaoew1991

> We plan to try upgrading Go to version 1.20 after the release of 2.3.0.

What's the time plan? Should I wait until your release to continue the work? Or maybe I can try to submit my PR to your 2.3.0 branch?

caesarjuly avatar May 22 '23 16:05 caesarjuly

> > We plan to try upgrading Go to version 1.20 after the release of 2.3.0.
>
> What's the time plan? Should I wait until your release to continue the work? Or maybe I can try to submit my PR to your 2.3.0 branch?

It is estimated that version 2.3 will be released early next month, and the master branch will serve as the branch for version 2.4.0 at that time. It is expected to upgrade to Go 1.20 by late June.

jiaoew1991 avatar May 23 '23 02:05 jiaoew1991

> > > We plan to try upgrading Go to version 1.20 after the release of 2.3.0.
> >
> > What's the time plan? Should I wait until your release to continue the work? Or maybe I can try to submit my PR to your 2.3.0 branch?
>
> It is estimated that version 2.3 will be released early next month, and the master branch will serve as the branch for version 2.4.0 at that time. It is expected to upgrade to Go 1.20 by late June.

I see. Let's do this in parallel; I will continue the work on the migration. I will create a local image with Go 1.20 and complete the remaining code on it. Instead of my initial proposal of splitting the code into several small PRs, let me put the changes into several commits first. I hope this keeps things clear as well.

caesarjuly avatar May 23 '23 04:05 caesarjuly

@caesarjuly, please make sure the PR only has one commit; check https://github.com/milvus-io/milvus/blob/master/CODE_REVIEW.md for more details.

mergify[bot] avatar May 26 '23 03:05 mergify[bot]

@caesarjuly Thanks for your contribution. Please submit with DCO, see the contributing guide https://github.com/milvus-io/milvus/blob/master/CONTRIBUTING.md#developer-certificate-of-origin-dco.

mergify[bot] avatar Jun 07 '23 03:06 mergify[bot]

@jiaoew1991 Hi, I have finished most of the code. Currently, I keep the rocksmq and pebblemq code side by side, and all the unit tests pass. But when I started the integration test in Docker, I hit an error related to kubeconfig loading. I'm using the cmds from here. Here is the error:

utils/util_k8s.py:102: in get_pod_list
    init_k8s_client_config()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

    def init_k8s_client_config():
        """
        init kubernetes client config
        """
        try:
            in_cluster = os.getenv(in_cluster_env, default='False')
            # log.debug(f"env variable IN_CLUSTER: {in_cluster}")
            if in_cluster.lower() == 'true':
                config.load_incluster_config()
            else:
                config.load_kube_config()
        except Exception as e:
>           raise Exception(e)
E           Exception: Invalid kube-config file. No configuration found.

Could you help me solve this? We are getting close to finishing this task. 🎯

caesarjuly avatar Jun 07 '23 03:06 caesarjuly

> @jiaoew1991 Hi, I have finished most of the code. Currently, I keep the rocksmq and pebblemq code side by side, and all the unit tests pass. But when I started the integration test in Docker, I hit an error related to kubeconfig loading. I'm using the cmds from here. Here is the error:
>
> utils/util_k8s.py:102: in get_pod_list
>     init_k8s_client_config()
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>
>     def init_k8s_client_config():
>         """
>         init kubernetes client config
>         """
>         try:
>             in_cluster = os.getenv(in_cluster_env, default='False')
>             # log.debug(f"env variable IN_CLUSTER: {in_cluster}")
>             if in_cluster.lower() == 'true':
>                 config.load_incluster_config()
>             else:
>                 config.load_kube_config()
>         except Exception as e:
> >           raise Exception(e)
> E           Exception: Invalid kube-config file. No configuration found.
>
> Could you help me solve this? We are getting close to finishing this task. 🎯

If you want to run e2e tests locally, I suggest using the docker-compose mode to start Milvus. Using k8s requires deploying a local k8s cluster, which consumes more resources.

Based on your error message, it seems the issue is caused by the missing k8s configuration 😅

jiaoew1991 avatar Jun 08 '23 01:06 jiaoew1991

Thanks @jiaoew1991. I'm following the instructions here. I use these cmds to start up Milvus:

cd deployments/docker/dev
docker-compose up -d
cd ../../../
build/builder.sh /bin/bash -c "export ROCKSMQ_PATH='/tmp/milvus/rdb_data' && ./scripts/start_standalone.sh && cat"

And these cmds to run the tests:

MILVUS_SERVICE_IP=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' $(docker-compose ps -q builder))
cd tests/docker
docker-compose run --rm pytest /bin/bash -c "pytest --host ${MILVUS_SERVICE_IP}"

Looks like these cmds are already using the docker-compose mode? Are there any other cmds I can use?

By the way, the error only happened in the pytest docker container when I started running the E2E test cases, not in the Milvus docker container. Here is the complete error message:

bulk_insert/test_bulk_insert.py:59: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
utils/util_k8s.py:188: in get_milvus_instance_name
    ip_name_pairs = get_pod_ip_name_pairs(namespace, "app.kubernetes.io/name=milvus")
utils/util_k8s.py:126: in get_pod_ip_name_pairs
    items = get_pod_list(namespace, label_selector)
utils/util_k8s.py:102: in get_pod_list
    init_k8s_client_config()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

    def init_k8s_client_config():
        """
        init kubernetes client config
        """
        try:
            in_cluster = os.getenv(in_cluster_env, default='False')
            # log.debug(f"env variable IN_CLUSTER: {in_cluster}")
            if in_cluster.lower() == 'true':
                config.load_incluster_config()
            else:
                config.load_kube_config()
        except Exception as e:
>           raise Exception(e)
E           Exception: Invalid kube-config file. No configuration found.

I dove a little deeper. The test_bulk_xxx and test_chaos_xxx test cases call the get_milvus_instance_name function, which in turn invokes init_k8s_client_config, and that function requires a k8s config file.

caesarjuly avatar Jun 08 '23 02:06 caesarjuly

> Thanks @jiaoew1991. I'm following the instructions here. I use these cmds to start up Milvus:
>
> cd deployments/docker/dev
> docker-compose up -d
> cd ../../../
> build/builder.sh /bin/bash -c "export ROCKSMQ_PATH='/tmp/milvus/rdb_data' && ./scripts/start_standalone.sh && cat"
>
> And these cmds to run the tests:
>
> MILVUS_SERVICE_IP=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' $(docker-compose ps -q builder))
> cd tests/docker
> docker-compose run --rm pytest /bin/bash -c "pytest --host ${MILVUS_SERVICE_IP}"
>
> Looks like these cmds are already using the docker-compose mode? Are there any other cmds I can use?
>
> By the way, the error only happened in the pytest docker container when I started running the E2E test cases, not in the Milvus docker container. Here is the complete error message:
>
> bulk_insert/test_bulk_insert.py:59:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> utils/util_k8s.py:188: in get_milvus_instance_name
>     ip_name_pairs = get_pod_ip_name_pairs(namespace, "app.kubernetes.io/name=milvus")
> utils/util_k8s.py:126: in get_pod_ip_name_pairs
>     items = get_pod_list(namespace, label_selector)
> utils/util_k8s.py:102: in get_pod_list
>     init_k8s_client_config()
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>
>     def init_k8s_client_config():
>         """
>         init kubernetes client config
>         """
>         try:
>             in_cluster = os.getenv(in_cluster_env, default='False')
>             # log.debug(f"env variable IN_CLUSTER: {in_cluster}")
>             if in_cluster.lower() == 'true':
>                 config.load_incluster_config()
>             else:
>                 config.load_kube_config()
>         except Exception as e:
> >           raise Exception(e)
> E           Exception: Invalid kube-config file. No configuration found.
>
> I dove a little deeper. The test_bulk_xxx and test_chaos_xxx test cases call the get_milvus_instance_name function, which in turn invokes init_k8s_client_config, and that function requires a k8s config file.

I got it. The command to start Milvus is fine; it's just that the document is outdated. If you want to run e2e tests locally, use the following command instead. The original command runs all test cases, but for submitting a PR, only the L0 and L1 regression tests are required:

docker-compose run --rm pytest /bin/bash -c "pytest --host ${MILVUS_SERVICE_IP} tests/python_client/testcases/ --tags L0 L1"

jiaoew1991 avatar Jun 08 '23 06:06 jiaoew1991

However, you only need to update the PR to trigger the e2e tasks automatically in CI. There is no need to run them locally unless debugging or performance testing is required 😄

jiaoew1991 avatar Jun 08 '23 06:06 jiaoew1991

@caesarjuly E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

mergify[bot] avatar Jun 08 '23 16:06 mergify[bot]

@caesarjuly ut workflow job failed, comment rerun ut can trigger the job again.

mergify[bot] avatar Jun 08 '23 16:06 mergify[bot]

@caesarjuly ut workflow job failed, comment rerun ut can trigger the job again.

mergify[bot] avatar Jun 08 '23 20:06 mergify[bot]

@caesarjuly ut workflow job failed, comment rerun ut can trigger the job again.

mergify[bot] avatar Jun 08 '23 21:06 mergify[bot]

@caesarjuly ut workflow job failed, comment rerun ut can trigger the job again.

mergify[bot] avatar Jun 08 '23 23:06 mergify[bot]

@caesarjuly ut workflow job failed, comment rerun ut can trigger the job again.

mergify[bot] avatar Jun 08 '23 23:06 mergify[bot]

@caesarjuly ut workflow job failed, comment rerun ut can trigger the job again.

mergify[bot] avatar Jun 09 '23 00:06 mergify[bot]

@caesarjuly E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

mergify[bot] avatar Jun 09 '23 00:06 mergify[bot]

@caesarjuly ut workflow job failed, comment rerun ut can trigger the job again.

mergify[bot] avatar Jun 09 '23 00:06 mergify[bot]

@caesarjuly ut workflow job failed, comment rerun ut can trigger the job again.

mergify[bot] avatar Jun 09 '23 01:06 mergify[bot]

@caesarjuly E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

mergify[bot] avatar Jun 09 '23 01:06 mergify[bot]

@caesarjuly ut workflow job failed, comment rerun ut can trigger the job again.

mergify[bot] avatar Jun 09 '23 01:06 mergify[bot]

> However, you only need to update the PR to trigger the e2e tasks automatically in CI. There is no need to run them locally unless debugging or performance testing is required 😄

Thanks, that does work. I noticed that after I rebased the latest code from the master branch, the default Golang version in the dev docker image is already 1.20.4. But in CI, Golang is still 1.18, right? Some of the dependencies Pebble needs still don't exist there.

I also found two very tricky issues while doing local testing; both are related to our global singleton mq server objects.

  1. In the mqwrapper directory, pmq_client_test and pmq_msgstream_test both initialize and close the global pebblemq server object. Sometimes they conflict with each other and trigger a nil-reference panic when closing the global server object. The error reliably occurs when the two files sit next to each other in the folder, and it can sort of be worked around by moving them apart: I renamed pebblemq_msgstream_test.go to pmq_msgstream_test.go so the two files are no longer adjacent, and the error disappeared. I also tried running the tests sequentially by adding the -p 1 flag to go test, but it didn't help. (See the guarded-shutdown sketch after the log below.)
  2. Actually, the original rocksdb-based rmq module has the same issue: if we put the corresponding files next to each other, we get the same error (I already verified this).
  3. Another error (just a FATAL log, not a panic) comes from the nmq server module, which I didn't change at all in this PR, when I run its tests; the error is listed below. However, this one can be solved by adding the -p 1 flag (so go test builds and runs one package's tests at a time) and running: source scripts/setenv.sh && go test -v ./pkg/mq/msgstream/... -count 1 -p 1
[2023/06/09 01:14:53.815 +00:00] [FATAL] [nmq/nmq_server.go:58] ["nmq is not ready"] [stack="github.com/milvus-io/milvus/pkg/mq/msgstream/mqwrapper/nmq.MustInitNatsMQ.func1\n\t/go/src/github.com/milvus-io/milvus/pkg/mq/msgstream/mqwrapper/nmq/nmq_server.go:58\nsync.(*Once).doSlow\n\t/usr/local/go/src/sync/once.go:74\nsync.(*Once).Do\n\t/usr/local/go/src/sync/once.go:65\ngithub.com/milvus-io/milvus/pkg/mq/msgstream/mqwrapper/nmq.MustInitNatsMQ\n\t/go/src/github.com/milvus-io/milvus/pkg/mq/msgstream/mqwrapper/nmq/nmq_server.go:45\ngithub.com/milvus-io/milvus/pkg/mq/msgstream.TestNmq\n\t/go/src/github.com/milvus-io/milvus/pkg/mq/msgstream/factory_test.go:42\ntesting.tRunner\n\t/usr/local/go/src/testing/testing.go:1576"]
FAIL    github.com/milvus-io/milvus/pkg/mq/msgstream    4.046s
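
For reference, the nil-reference panic in issue 1 is characteristic of two test files racing to initialize and close one package-level server. A minimal guarded-shutdown sketch (the pmqserver package and every name in it are hypothetical, not the actual pebblemq code):

```go
// Hypothetical sketch -- not the actual pebblemq code. A package-level
// server guarded by a mutex, so concurrent InitServer/CloseServer calls
// from different test files cannot double-close or close a nil server.
package pmqserver

import "sync"

type Server struct{}

func (s *Server) shutdown() {
	// release files, goroutines, etc.
}

var (
	mu     sync.Mutex
	server *Server
)

// InitServer starts the singleton if it is not already running.
func InitServer() {
	mu.Lock()
	defer mu.Unlock()
	if server == nil {
		server = &Server{}
	}
}

// CloseServer tolerates repeated and concurrent calls: the nil check
// under the lock is what prevents the nil-reference panic.
func CloseServer() {
	mu.Lock()
	defer mu.Unlock()
	if server != nil {
		server.shutdown()
		server = nil
	}
}
```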

Do you have any idea about this? I hope I didn't introduce any bugs here. 🤔

caesarjuly avatar Jun 09 '23 01:06 caesarjuly

@caesarjuly E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

mergify[bot] avatar Jun 09 '23 01:06 mergify[bot]

@caesarjuly ut workflow job failed, comment rerun ut can trigger the job again.

mergify[bot] avatar Jun 09 '23 02:06 mergify[bot]

> > However, you only need to update the PR to trigger the e2e tasks automatically in CI. There is no need to run them locally unless debugging or performance testing is required 😄
>
> Thanks, that does work. I noticed that after I rebased the latest code from the master branch, the default Golang version in the dev docker image is already 1.20.4. But in CI, Golang is still 1.18, right? Some of the dependencies Pebble needs still don't exist there.
>
> I also found two very tricky issues while doing local testing; both are related to our global singleton mq server objects.
>
>   1. In the mqwrapper directory, pmq_client_test and pmq_msgstream_test both initialize and close the global pebblemq server object. Sometimes they conflict with each other and trigger a nil-reference panic when closing the global server object. The error reliably occurs when the two files sit next to each other in the folder, and it can sort of be worked around by moving them apart: I renamed pebblemq_msgstream_test.go to pmq_msgstream_test.go so the two files are no longer adjacent, and the error disappeared. I also tried running the tests sequentially by adding the -p 1 flag to go test, but it didn't help.
>   2. Actually, the original rocksdb-based rmq module has the same issue: if we put the corresponding files next to each other, we get the same error (I already verified this).
>   3. Another error (just a FATAL log, not a panic) comes from the nmq server module, which I didn't change at all in this PR, when I run its tests; the error is listed below. However, this one can be solved by adding the -p 1 flag and running: source scripts/setenv.sh && go test -v ./pkg/mq/msgstream/... -count 1 -p 1
>
> [2023/06/09 01:14:53.815 +00:00] [FATAL] [nmq/nmq_server.go:58] ["nmq is not ready"] [stack="github.com/milvus-io/milvus/pkg/mq/msgstream/mqwrapper/nmq.MustInitNatsMQ.func1\n\t/go/src/github.com/milvus-io/milvus/pkg/mq/msgstream/mqwrapper/nmq/nmq_server.go:58\nsync.(*Once).doSlow\n\t/usr/local/go/src/sync/once.go:74\nsync.(*Once).Do\n\t/usr/local/go/src/sync/once.go:65\ngithub.com/milvus-io/milvus/pkg/mq/msgstream/mqwrapper/nmq.MustInitNatsMQ\n\t/go/src/github.com/milvus-io/milvus/pkg/mq/msgstream/mqwrapper/nmq/nmq_server.go:45\ngithub.com/milvus-io/milvus/pkg/mq/msgstream.TestNmq\n\t/go/src/github.com/milvus-io/milvus/pkg/mq/msgstream/factory_test.go:42\ntesting.tRunner\n\t/usr/local/go/src/testing/testing.go:1576"]
> FAIL    github.com/milvus-io/milvus/pkg/mq/msgstream    4.046s
>
> Do you have any idea about this? I hope I didn't introduce any bugs here. 🤔

For issues 1 and 2, would you be able to submit a pull request to fix them? It seems the test cases were not well designed and resource management was not handled properly. If it's not convenient for you, please open an issue so we can handle it.

Regarding issue 3, do you have any suggestions @chyezh ?
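
One conventional pattern for the test-design fix suggested above is to own the shared server's lifecycle in a per-package TestMain, so no individual test file ever initializes or closes it. A rough sketch, reusing the hypothetical pmqserver API from the earlier sketch (the import path is illustrative):

```go
// Hypothetical sketch: TestMain gives the package exactly one init and
// one close of the shared mq server, so test files cannot race on it.
package pmqserver_test

import (
	"os"
	"testing"

	pmqserver "example.com/milvus/pmqserver" // hypothetical import path
)

func TestMain(m *testing.M) {
	pmqserver.InitServer() // single setup for every test in the package
	code := m.Run()        // run all tests in this package
	pmqserver.CloseServer() // single teardown after the last test
	os.Exit(code)
}
```

With this shape, per-file init/close calls disappear, and neither file adjacency nor test ordering matters anymore.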

jiaoew1991 avatar Jun 12 '23 02:06 jiaoew1991