
How do I update the searcher after rebuilding the index with bs on a local single machine?

Open supercocoa7654 opened this issue 1 year ago • 5 comments

How do I update the searcher after rebuilding the index with bs on a local single machine? Do I have to kill the previous process? I found a file called local_search_update.py, but running it fails:

bash-4.4# /usr/bin/python /ha3_install/usr/local/lib/python/site-packages/ha_tools/local_search_update.py --disableTurbojet -i ./runtimedata/ -c /home/admin/havenask/example/cases/normal/config/online_config -b /ha3_install -p 12000 /home/admin/havenask/example/cases/normal/workdir/local_search_12000 /home/admin/havenask/example/cases/normal/workdir/local_search_12000/ports 12345 57377 12345
get port listArray failed
get port listArray failed

supercocoa7654, Jun 25 '23 02:06

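There was indeed a problem: the update script had not been adapted as the versions iterated. It has now been fixed. Please pull the code again, refer to the newly added update demo in example, and try recreating the container with the new image. Image address: registry.cn-hangzhou.aliyuncs.com/havenask/ha3_runtime:1.0.0-beta1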

dyuyang, Jun 29 '23 06:06

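I don't quite follow. Could the documentation describe in detail how to add, delete, update, and query documents? This is what I am doing now: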

# Build the index
python build_demo_data.py /ha3_install normal

# Start the service
python start_demo_searcher.py /ha3_install normal 12345

Then I manually edit the test.data file, adding CMDs such as update_field and delete, and then:

# Rebuild the index
python build_demo_data.py /ha3_install normal

# Update the service
python update_demo_searcher.py /ha3_install normal cases/normal/config/online_config

Then it fails with:

[Errno 2] No such file or directory: '/havenask/example/cases/normal/workdir/qrs_port'
get port listArray failed
get port listArray failed
bash-4.4# find . -name qrs_port
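
The first error line shows the update script looking for a `qrs_port` file under the workdir, and the empty `find` output suggests no such file exists. Below is a minimal pre-flight check, assuming only the path shown in the error message. The helper name `missing_files` is hypothetical and not part of havenask:

```python
import os

def missing_files(workdir, names=("qrs_port",)):
    """Return the expected file names that are absent from workdir."""
    return [n for n in names if not os.path.isfile(os.path.join(workdir, n))]

if __name__ == "__main__":
    # Path copied from the error message above.
    workdir = "/havenask/example/cases/normal/workdir"
    absent = missing_files(workdir)
    if absent:
        print("missing in %s: %s" % (workdir, ", ".join(absent)))
    else:
        print("all expected files present")
```

Running this before update_demo_searcher.py would at least separate "the searcher was never started from this workdir" from other failures.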

My questions are:

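  1. Is this procedure correct? Is the second index build an incremental build? From what I can see, it simply overwrote the previous one.
  2. I only changed the data, not the configuration. How do I push the new index to the service? Running update_demo_searcher.py currently fails. @dyuyang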
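
For reference, the manual edit to test.data described above (appending `update_field` and `delete` commands) can be sketched as follows. This assumes the ha3 document format terminates each `key=value` line with `\x1f\n` and each document with `\x1e\n`, and that `id` is the primary key. Neither assumption is confirmed in this thread, and `to_ha3_doc` is a hypothetical helper:

```python
FIELD_SEP = "\x1f\n"  # assumed field terminator (often displayed as ^_)
DOC_SEP = "\x1e\n"    # assumed document terminator (often displayed as ^^)

def to_ha3_doc(cmd, fields):
    """Serialize one command and its fields into an ha3-style record."""
    lines = ["CMD=%s" % cmd] + ["%s=%s" % (k, v) for k, v in fields.items()]
    return "".join(line + FIELD_SEP for line in lines) + DOC_SEP

docs = [
    to_ha3_doc("update_field", {"id": 1, "hits": 100}),  # change hits of doc 1
    to_ha3_doc("delete", {"id": 2}),                     # remove doc 2
]
payload = "".join(docs)  # text that would be appended to test.data
```

The field names `id` and `hits` are taken from the query output later in this thread; the values are made up for illustration.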

supercocoa7654, Jul 19 '23 03:07

Every build is a full build. If you need real-time updates, you can connect to Kafka; see the Kafka example.

xuxijie, Jul 20 '23 01:07

Batch incremental updates are not supported yet.

xuxijie, Jul 20 '23 01:07

Using the normal data in example.

step1: Full index build + start the service

mkdir -p  /havenask/example/cases/normal/workdir
cd /havenask/example/cases/normal/workdir && /usr/bin/python /ha3_install/usr/local/bin/bs startjob -c /havenask/example/cases/normal/config/offline_config -n in0 -j local -m full -d /havenask/example/data/test.data -w /havenask/example/cases/normal/workdir -i ./runtimedata -p 1 --documentformat=ha3


mkdir -p  /havenask/example/cases/normal/search_workdir
cd /havenask/example/cases/normal/search_workdir && /usr/bin/python /ha3_install/usr/local/lib/python/site-packages/havenask_tools/local_search_starter.py --disableTurbojet -i /havenask/example/cases/normal/workdir/runtimedata/ -c /havenask/example/cases/normal/config/online_config -p 30468,30480 -b /ha3_install --qrsHttpArpcBindPort 12345

step2: Query the data: works as expected

/usr/bin/python /havenask/example/common/curl_http.py 12345 "select * from in0 &&kvpair=databaseName:in0;timeout:2000"
curl 'http://127.0.0.1:12345/sql?select%20%2A%20from%20in0%20%26%26kvpair%3DdatabaseName%3Ain0%3Btimeout%3A2000' 
USE_TIME: 1043.598ms, ROW_COUNT: 5, HAS_SOFT_FAILURE: 0, COVERAGE: 1
LEADER_INFO: in0:0;

------------------------------- TABLE INFO ---------------------------
total:5, rows:[0, 5), cols:4
          title (multi_char) |                 id (uint32) |               hits (uint32) |         createtime (uint64) |
                        null |                           1 |                           1 |                  1628584172 |
                        null |                           2 |                          32 |                  1631262572 |
                        null |                           3 |                          49 |                           0 |
                        null |                           5 |                          63 |                  1618043372 |
                        null |                           4 |                          75 |                  1623313772 |

------------------------------- PLAN INFO ---------------------------
SQL QUERY:
select * from in0 &&kvpair=databaseName:in0;timeout:2000
IQUAN_RESULT:
{"error_code":0,"error_message":"","result":{"rel_plan_version":"","rel_plan":[],"exec_params":{}}}

------------------------------- TRACE INFO ---------------------------
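
The `curl` line echoed above shows the SQL string percent-encoded in full and appended to the `/sql` endpoint. The sketch below reproduces that encoding with the standard library so the same request can be built from other tooling. It is not the actual implementation of curl_http.py, and `build_sql_url` is a hypothetical name:

```python
from urllib.parse import quote

def build_sql_url(port, query, host="127.0.0.1"):
    """Percent-encode the whole query and append it to the /sql endpoint."""
    return "http://%s:%d/sql?%s" % (host, port, quote(query, safe=""))

query = "select * from in0 &&kvpair=databaseName:in0;timeout:2000"
print(build_sql_url(12345, query))
```

Passing `safe=""` matters here: with the default, `/` is left unescaped, which is harmless for this query but would differ for queries containing paths.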

step3: Without changing the data, run another full index build in a different directory, then update the index; the update reports success

mkdir -p  /havenask/example/cases/normal/workdir2
cd /havenask/example/cases/normal/workdir2 && /usr/bin/python /ha3_install/usr/local/bin/bs startjob -c /havenask/example/cases/normal/config/offline_config -n in0 -j local -m full -d /havenask/example/data/test.data -w /havenask/example/cases/normal/workdir2 -i ./runtimedata -p 1 --documentformat=ha3


cd  /havenask/example/cases/normal/search_workdir && /usr/bin/python /ha3_install/usr/local/lib/python/site-packages/havenask_tools/local_search_update.py --disableTurbojet -i /havenask/example/cases/normal/workdir2/runtimedata -c /havenask/example/cases/normal/config/online_config -p 30468,30480 -b /ha3_install --qrsHttpArpcBindPort 12345

request signature {"table_info": {"in0": {"0": {"table_mode": 0, "config_path": "/havenask/example/cases/normal/config/online_config/table/0", "table_type": 3, "index_root": "/havenask/example/cases/normal/search_workdir/local_search_12000/in0_p0_r0/runtimedata", "total_partition_count": 1, "partitions": {"0_65535": {"inc_version": 1}}}}}, "biz_info": {"default": {"config_path": "/havenask/example/cases/normal/config/online_config/bizs/default/0"}}, "service_info": {"zone_name": "in0", "version": 0, "part_count": 1, "part_id": 0}, "clean_disk": false}
response signature {"biz_info": {"default": {"config_path": "/havenask/example/cases/normal/config/online_config/bizs/default/0"}}, "target_version": 1651870394, "table_info": {"in0": {"0": {"table_mode": 0, "config_path": "/havenask/example/cases/normal/config/online_config/table/0", "table_type": 3, "index_root": "/havenask/example/cases/normal/search_workdir/local_search_12000/in0_p0_r0/runtimedata", "total_partition_count": 1, "partitions": {"0_65535": {"inc_version": 1}}}}}, "clean_disk": false, "service_info": {"zone_name": "in0", "version": 0, "part_count": 1, "part_id": 0}, "catalog_address": "192.168.65.4:51471"}
searcher [in0_0] is ready for search, topo [tcp:192.168.65.4:58601#in0.default:1:0:648843349:100:-1894657943:41401:false:41401|in0.default.search:1:0:648843349:100:-1894657943:41401:false:41401|in0.default_agg:1:0:648843349:100:-1894657943:41401:false:41401|in0.default_agg.search:1:0:648843349:100:-1894657943:41401:false:41401|in0.default_sql:1:0:-2:100:-1894657943:41401:false:41401|in0.para_search_2:1:0:648843349:100:-1894657943:41401:false:41401|in0.para_search_2.search:1:0:648843349:100:-1894657943:41401:false:41401|in0.para_search_4:1:0:648843349:100:-1894657943:41401:false:41401|in0.para_search_4.search:1:0:648843349:100:-1894657943:41401:false:41401|]
all searcher is ready.
qrs is ready for search, http port 12345, arpc port 51471
update success
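
The request and response signatures logged above are plain JSON, so the fields most relevant to an index update (`index_root` and `inc_version`) can be pulled out and compared programmatically. In the log, both signatures show an `index_root` under search_workdir/local_search_12000 and `inc_version` 1; whether that is related to the failure in step4 is not established in this thread. A minimal sketch, where `table_targets` is a hypothetical helper and the meaning of the inner key `"0"` is not stated in the source:

```python
import json

def table_targets(signature):
    """Yield (table, key, index_root, partitions) from a signature dict."""
    for table, entries in signature.get("table_info", {}).items():
        for key, info in entries.items():
            yield table, key, info["index_root"], info["partitions"]

# Trimmed copy of the request signature logged above.
request = json.loads("""
{"table_info": {"in0": {"0": {
    "index_root": "/havenask/example/cases/normal/search_workdir/local_search_12000/in0_p0_r0/runtimedata",
    "partitions": {"0_65535": {"inc_version": 1}}}}}}
""")

for table, key, index_root, partitions in table_targets(request):
    print(table, key, index_root, partitions)
```

Running the same helper on the response signature and comparing the tuples makes it easy to see whether the searcher accepted the target it was sent.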

step4: Query the data: fails with an error

/usr/bin/python /havenask/example/common/curl_http.py 12345 "select * from in0 &&kvpair=databaseName:in0;timeout:2000"
curl 'http://127.0.0.1:12345/sql?select%20%2A%20from%20in0%20%26%26kvpair%3DdatabaseName%3Ain0%3Btimeout%3A2000' 
USE_TIME: 18.946ms, ROW_COUNT: 0, HAS_SOFT_FAILURE: 0, COVERAGE: 1
run graph failed. result has error [EC_ABORT], error msg [get biz full part count failed [in0.default_sql]]

------------------------------- PLAN INFO ---------------------------
SQL QUERY:
select * from in0 &&kvpair=databaseName:in0;timeout:2000
IQUAN_RESULT:
{"error_code":0,"error_message":"","result":{"rel_plan_version":"","rel_plan":[],"exec_params":{}}}

------------------------------- TRACE INFO ---------------------------
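
Note that the update script printed "update success" even though the next query returned `ROW_COUNT: 0` with an `EC_ABORT` error. A small smoke check on the query output can catch this automatically. The sketch below parses only the summary line format visible in this thread; `parse_summary` and `query_ok` are hypothetical helpers, not havenask APIs:

```python
import re

SUMMARY_RE = re.compile(r"(\w+): ([\d.]+)(?:ms)?")

def parse_summary(line):
    """Turn 'USE_TIME: 18.946ms, ROW_COUNT: 0, ...' into a dict of floats."""
    return {key: float(value) for key, value in SUMMARY_RE.findall(line)}

def query_ok(output, min_rows=1):
    """True if the summary reports enough rows and no error line is present."""
    summary = {}
    for line in output.splitlines():
        if line.startswith("USE_TIME"):
            summary = parse_summary(line)
            break
    return summary.get("ROW_COUNT", 0) >= min_rows and "result has error" not in output
```

Feeding it the step2 output returns `True`, while the step4 output returns `False`, so it can serve as a post-update gate in a local script.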

So updating the index still does not work. Is something wrong somewhere? @xuxijie @dyuyang

supercocoa7654, Aug 01 '23 08:08