Add pebblekv module
Issue: #16957. This finishes the first step of #16957; there is no real usage yet. I mainly referred to the previous PR and to the API of Pebble.
- [x] Add PebbleDB to the kv module and write basic tests (see the sketch after this list)
- [x] Add PebbleDB to mq module with the required tests
- [x] Add PebbleDB to configs and other modules
- [ ] Do an integration test and verify the performance
- [ ] Remove RocksDB from the dependency
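As a rough illustration of the first checklist item, a pebble-backed KV wrapper built on the public Pebble API could look something like the following. This is only a hedged sketch: `PebbleKV`, `OpenPebbleKV`, `Save`, and `Load` are illustrative names and may not match the code actually added in this PR.

```go
package pebblekv

import (
	"github.com/cockroachdb/pebble"
)

// PebbleKV is a hypothetical minimal wrapper around a pebble.DB; the real
// type added in this PR may carry more state (paths, options, metrics, ...).
type PebbleKV struct {
	db *pebble.DB
}

// OpenPebbleKV opens (or creates) a pebble database at the given directory.
func OpenPebbleKV(path string) (*PebbleKV, error) {
	db, err := pebble.Open(path, &pebble.Options{})
	if err != nil {
		return nil, err
	}
	return &PebbleKV{db: db}, nil
}

// Save writes one key/value pair with a synchronous (durable) write.
func (kv *PebbleKV) Save(key, value string) error {
	return kv.db.Set([]byte(key), []byte(value), pebble.Sync)
}

// Load reads the value for a key; Pebble hands back a closer that must be
// closed, so the value is copied into a string before returning.
func (kv *PebbleKV) Load(key string) (string, error) {
	val, closer, err := kv.db.Get([]byte(key))
	if err != nil {
		return "", err
	}
	defer closer.Close()
	return string(val), nil
}

// Close releases the underlying pebble database.
func (kv *PebbleKV) Close() error {
	return kv.db.Close()
}
```

One detail worth noting: Pebble's `Get` returns the value together with an `io.Closer` that must be closed, so the sketch copies the value into a string before the closer is released.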
Questions that need to be confirmed:
- Pebble now only supports Go >= 1.19, but the Go version in our default dev container is 1.18.3. This leads to an error:
  `/go/pkg/mod/github.com/cockroachdb/pebble@<version>/vfs/disk_full.go:100:20: undefined: atomic.Uint32  note: module requires Go 1.19`
  I tried manually upgrading the Go version to 1.20.4 in the container, and everything worked well.
  - Conclusion: we will wait for the Go version upgrade as discussed here.
- There are some functions that are never used in rocksdbkv, like the `xxxBytes` and `xxxRemoveWithPrefix` ones.
  - In the previous PR, all of them were removed.
  - In my PR, I only keep the `xxxRemoveWithPrefix` functions (a rough sketch follows below).
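Since only the `xxxRemoveWithPrefix` functions are kept, here is a hedged sketch of how such a prefix removal can be expressed on Pebble with a range deletion. It reuses the hypothetical `PebbleKV` wrapper from the earlier sketch; the actual functions in this PR may be implemented differently (for example by iterating and deleting keys one by one).

```go
package pebblekv

import (
	"bytes"
	"errors"

	"github.com/cockroachdb/pebble"
)

// prefixEnd returns the smallest key greater than every key carrying the given
// prefix, or nil if no such key exists (the prefix is empty or all 0xff bytes).
func prefixEnd(prefix []byte) []byte {
	end := bytes.Clone(prefix)
	for i := len(end) - 1; i >= 0; i-- {
		if end[i] != 0xff {
			end[i]++
			return end[:i+1]
		}
	}
	return nil
}

// RemoveWithPrefix deletes every key starting with prefix in one range deletion.
// Sketch only: the awkward corner cases are simply rejected here.
func (kv *PebbleKV) RemoveWithPrefix(prefix string) error {
	start := []byte(prefix)
	end := prefixEnd(start)
	if end == nil {
		return errors.New("unsupported prefix in this sketch")
	}
	return kv.db.DeleteRange(start, end, pebble.Sync)
}
```

`DeleteRange` writes a single range tombstone rather than touching every key, so a prefix removal stays cheap even when the prefix covers many keys, which is one argument for keeping the `xxxRemoveWithPrefix` helpers.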
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: caesarjuly
To complete the pull request process, please assign xiaofan-luan after the PR has been reviewed.
You can assign the PR to them by writing /assign @xiaofan-luan in a comment when ready.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
@caesarjuly Please associate the related issue to the body of your Pull Request. (e.g. "issue: #...")
@caesarjuly ut workflow job failed, comment rerun ut can trigger the job again.
We plan to try upgrading Go to version 1.20 after the release of 2.3.0.
What's the time plan? Should I wait until your release to continue the work? Or maybe I can try to submit my PR to your 2.3.0 branch?
It is estimated that version 2.3 will be released early next month, and the master branch will then serve as the branch for version 2.4.0. We expect to upgrade to Go 1.20 by late June.
I see. Let's do this in parallel, and I will continue the work on the migration. I will create a local image with Go 1.20 and complete the remaining code on it. Instead of my initial proposal of splitting the code into several small PRs, let me put it into several commits first. I hope this is also clear.
@caesarjuly, please make sure the PR has only one commit; check https://github.com/milvus-io/milvus/blob/master/CODE_REVIEW.md for more details.
@caesarjuly Thanks for your contribution. Please submit with DCO, see the contributing guide https://github.com/milvus-io/milvus/blob/master/CONTRIBUTING.md#developer-certificate-of-origin-dco.
@jiaoew1991 Hi, I have finished most of the code. Currently, I keep the rocksmq and pebblemq code at the same time, and all the unit tests pass. But when I start to do the integration test in Docker, there is an error related to kubeconfig loading. I'm using the commands from here. Here is the error:
utils/util_k8s.py:102: in get_pod_list
init_k8s_client_config()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
def init_k8s_client_config():
"""
init kubernetes client config
"""
try:
in_cluster = os.getenv(in_cluster_env, default='False')
# log.debug(f"env variable IN_CLUSTER: {in_cluster}")
if in_cluster.lower() == 'true':
config.load_incluster_config()
else:
config.load_kube_config()
except Exception as e:
> raise Exception(e)
E Exception: Invalid kube-config file. No configuration found.
Could you help me solve this? We are getting close to finishing this task.
If you want to run e2e tests locally, I suggest using the docker-compose mode to start Milvus. Using k8s requires deploying a local k8s cluster which consumes more resources.
Based on your error message, it seems that the issue is caused by the lack of k8s configuration.
Thanks @jiaoew1991. I'm following the instructions here. I use these commands to start up Milvus:
cd deployments/docker/dev
docker-compose up -d
cd ../../../
build/builder.sh /bin/bash -c "export ROCKSMQ_PATH='/tmp/milvus/rdb_data' && ./scripts/start_standalone.sh && cat"
And these commands to run the tests:
MILVUS_SERVICE_IP=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' $(docker-compose ps -q builder))
cd tests/docker
docker-compose run --rm pytest /bin/bash -c "pytest --host ${MILVUS_SERVICE_IP}"
It looks like these commands are already using the docker-compose mode? Are there any other commands I can use?
By the way, the error only happened in the pytest docker container when I started running the E2E test cases, not in the Milvus docker container. Here is the complete error message:
bulk_insert/test_bulk_insert.py:59:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
utils/util_k8s.py:188: in get_milvus_instance_name
ip_name_pairs = get_pod_ip_name_pairs(namespace, "app.kubernetes.io/name=milvus")
utils/util_k8s.py:126: in get_pod_ip_name_pairs
items = get_pod_list(namespace, label_selector)
utils/util_k8s.py:102: in get_pod_list
init_k8s_client_config()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
def init_k8s_client_config():
"""
init kubernetes client config
"""
try:
in_cluster = os.getenv(in_cluster_env, default='False')
# log.debug(f"env variable IN_CLUSTER: {in_cluster}")
if in_cluster.lower() == 'true':
config.load_incluster_config()
else:
config.load_kube_config()
except Exception as e:
> raise Exception(e)
E Exception: Invalid kube-config file. No configuration found.
I dug a little bit further. The test_bulk_xxx and test_chaos_xxx test cases call the get_milvus_instance_name function, which invokes the init_k8s_client_config function, which in turn requires a k8s config file.
I got it. The command to start Milvus is fine; it's just that the documentation is outdated. If you want to run e2e tests locally, use the following command instead. The original command runs all test cases, but for submitting a PR, only the regression tests at the L0 and L1 levels are required:
docker-compose run --rm pytest /bin/bash -c "pytest --host ${MILVUS_SERVICE_IP} tests/python_client/testcases/ --tags L0 L1 "
However, you only need to update the PR to trigger the automatic running of e2e tasks in CI for testing. There is no need to run it locally unless debugging or performance testing is required.
@caesarjuly E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.
@caesarjuly ut workflow job failed, comment rerun ut can trigger the job again.
Thanks, this does work. I noticed that after I rebased the latest code from the master branch, the default Go version for the dev docker image is already 1.20.4. But in CI, Go is still 1.18, right? Some dependencies that Pebble needs still do not exist there.
I also found 2 very tricky issues while doing local testing. They are both related to our global singleton mq object.
- In the mqwrapper directory, pmq_client_test and pmq_msgstream_test both initialize and close the global pebblemq server object. Sometimes they conflict with each other and trigger a nil reference panic when closing the global server object. The error definitely occurs if the two files are close to each other in the folder, and it can be kind of worked around by putting them far away from each other: I renamed pebblemq_msgstream_test.go to pmq_msgstream_test.go so that the two files are not together, and the error disappeared. I also tried to run the tests in sequence by adding the `-p 1` param to `go test`, but it didn't help (see the sketch at the end of this comment).
- Actually, the original rocksdb-based rmq module has the same issue. If we put the corresponding files close to each other, we get the same error (I already verified this).
- Another error (just a FATAL log, not a panic) is related to the nmq server module, which I didn't change at all in this PR. If I run the tests under it, there is an error like the one listed below. However, this issue can be solved by adding the `-p 1` param and running the test command `source scripts/setenv.sh && go test -v ./pkg/mq/msgstream/... -count 1 -p 1`:
[2023/06/09 01:14:53.815 +00:00] [FATAL] [nmq/nmq_server.go:58] ["nmq is not ready"] [stack="github.com/milvus-io/milvus/pkg/mq/msgstream/mqwrapper/nmq.MustInitNatsMQ.func1\n\t/go/src/github.com/milvus-io/milvus/pkg/mq/msgstream/mqwrapper/nmq/nmq_server.go:58\nsync.(*Once).doSlow\n\t/usr/local/go/src/sync/once.go:74\nsync.(*Once).Do\n\t/usr/local/go/src/sync/once.go:65\ngithub.com/milvus-io/milvus/pkg/mq/msgstream/mqwrapper/nmq.MustInitNatsMQ\n\t/go/src/github.com/milvus-io/milvus/pkg/mq/msgstream/mqwrapper/nmq/nmq_server.go:45\ngithub.com/milvus-io/milvus/pkg/mq/msgstream.TestNmq\n\t/go/src/github.com/milvus-io/milvus/pkg/mq/msgstream/factory_test.go:42\ntesting.tRunner\n\t/usr/local/go/src/testing/testing.go:1576"]
FAIL github.com/milvus-io/milvus/pkg/mq/msgstream 4.046s
Do you have any idea about this? I hope I didn't introduce any bug here.
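To make issue 1 concrete, here is a minimal, made-up reproduction of the pattern described above; none of these names are the actual pebblemq code, they only stand in for a package-level singleton that one test file can close while another still uses it.

```go
package mqsingleton

import "sync"

// server stands in for the global pebblemq/rocksmq server object shared by
// the test files in one package.
type server struct {
	store map[string][]byte
}

var (
	mu        sync.Mutex
	globalSrv *server // package-level singleton, initialized and closed by tests
)

// InitServer creates the singleton if it does not already exist.
func InitServer() {
	mu.Lock()
	defer mu.Unlock()
	if globalSrv == nil {
		globalSrv = &server{store: map[string][]byte{}}
	}
}

// CloseServer tears the singleton down and leaves the pointer nil.
func CloseServer() {
	mu.Lock()
	defer mu.Unlock()
	globalSrv = nil
}

// Produce uses the singleton without checking whether another test has
// already closed it; if CloseServer ran first, this dereferences nil and
// panics -- the same class of failure as the nil reference panic above.
func Produce(key string, value []byte) {
	globalSrv.store[key] = value
}
```

If two test files each pair InitServer with CloseServer, whichever CloseServer runs first leaves the other file's remaining calls pointing at nil.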
For issues 1 and 2, would you be able to submit a pull request to fix them? It seems that the test case design was not good and the resource management was not done properly. If it's not convenient for you, please open an issue so we can handle it.
Regarding issue 3, do you have any suggestions, @chyezh?
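For issues 1 and 2, one possible direction (a hedged suggestion only, reusing the hypothetical InitServer/CloseServer names from the sketch above, not the actual milvus helpers) is to move ownership of the shared server into a single TestMain per package, so that no individual test file initializes or closes it:

```go
package mqsingleton

import (
	"os"
	"testing"
)

// TestMain owns the lifetime of the shared server for the whole test package:
// it is initialized once before any test runs and closed once after every test
// has finished, so no individual test file sets it up or tears it down.
func TestMain(m *testing.M) {
	InitServer()
	code := m.Run()
	CloseServer()
	os.Exit(code)
}
```

With this layout the relative position of the test files in the folder no longer matters, because setup and teardown happen exactly once per package run.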