geocoding
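Higher-level region information appearing again later in an address hurts normalization
If I remove the higher-level information that reappears later in the address, the similarity goes up a lot. Could you look into handling cases like this?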
String t1 = "海南省海口市灵山镇海榆大道4号绿地城.润园海口市灵山西片去旧改项目A-32地块11#楼(栋)2(单元)2(层)203(号)";
String t2 = "海南省海口市灵山镇海榆大道4号绿地城.润园11#楼2单元203";
Result:
海南省海口市灵山镇海榆大道4号绿地城.润园海口市灵山西片去旧改项目A-32地块11#楼(栋)2(单元)2(层)203(号)
addr1 >>>> Address(
provinceId=460000000000, province=海南省,
cityId=460100000000, city=海口市,
districtId=460108000000, district=美兰区,
streetId=460108101000, street=灵山镇,
townId=460108101000, town=灵山镇,
villageId=null, village=null,
road=null,
roadNum=null,
buildingNum=A-32,
text=西片去旧改项目地块11#楼22203栋单元层号
)
>>>>>>>>>>>>>>>>>
海南省海口市灵山镇海榆大道4号绿地城.润园11#楼2单元203
addr2 >>>> Address(
provinceId=460000000000, province=海南省,
cityId=460100000000, city=海口市,
districtId=460108000000, district=美兰区,
streetId=460108101000, street=灵山镇,
townId=460108101000, town=灵山镇,
villageId=null, village=null,
road=海榆大道,
roadNum=4号,
buildingNum=11#楼2单元203,
text=绿地城润园
)
Loading extension dictionary: dic/region.dic
Loading extension dictionary: dic/community.dic
Loading extension stop-word dictionary: dic/stop.dic
Similarity analysis >>>>>>>>> MatchedResult(
doc1=Document(terms=[Term(灵山镇), Term(A), Term(32), Term(西片), Term(去), Term(旧), Term(改), Term(项目), Term(地块), Term(11#), Term(楼), Term(22203), Term(栋), Term(单元), Term(层), Term(号)], town=Term(灵山镇), village=null, road=null, roadNum=null, roadNumValue=0),
doc2=Document(terms=[Term(灵山镇), Term(海榆大道), Term(4号), Term(11), Term(2), Term(203), Term(绿地城), Term(润园)], town=Term(灵山镇), village=null, road=Term(海榆大道), roadNum=Term(4号), roadNumValue=4),
terms=[io.patamon.geocoding.similarity.MatchedTerm@2cfb4a64],
similarity=0.4886777774252209
)
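A side effect is visible in the addr1 output above: the bracketed unit labels were split from their numbers, so `text` ends in `11#楼22203栋单元层号` (the numbers 2, 2 and 203 run together and the labels trail behind). One possible pre-processing step is to unwrap those brackets before parsing. The sketch below is hypothetical (the class `UnitLabelNormalizer` and its patterns are not part of io.patamon.geocoding), and whether the parser then extracts `buildingNum` correctly has not been tested here.

```java
import java.util.regex.Pattern;

// Hypothetical pre-processing helper; not part of io.patamon.geocoding.
public class UnitLabelNormalizer {
    // "楼(栋)" states the building marker twice, so the bracketed 栋 is dropped.
    // Both ASCII and full-width parentheses are accepted.
    private static final Pattern REDUNDANT_BUILDING = Pattern.compile("楼[(（]栋[)）]");

    // Any other bracketed unit label is unwrapped so it stays next to its number,
    // e.g. "2(单元)" becomes "2单元".
    private static final Pattern BRACKETED_UNIT = Pattern.compile("[(（](栋|单元|层|号)[)）]");

    public static String normalize(String address) {
        String s = REDUNDANT_BUILDING.matcher(address).replaceAll("楼");
        return BRACKETED_UNIT.matcher(s).replaceAll("$1");
    }

    public static void main(String[] args) {
        String t1 = "海南省海口市灵山镇海榆大道4号绿地城.润园海口市灵山西片去旧改项目A-32地块11#楼(栋)2(单元)2(层)203(号)";
        // Prints the address with the unit part rewritten as "11#楼2单元2层203号".
        System.out.println(normalize(t1));
    }
}
```

Addresses without bracketed labels, such as t2, pass through unchanged.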
With the second 海口市 removed:
String t1 = "海南省海口市灵山镇海榆大道4号绿地城.润园灵山西片去旧改项目A-32地块11#楼(栋)2(单元)2(层)203(号)";
String t2 = "海南省海口市灵山镇海榆大道4号绿地城.润园11#楼2单元203";
Result:
海南省海口市灵山镇海榆大道4号绿地城.润园灵山西片去旧改项目A-32地块11#楼(栋)2(单元)2(层)203(号)
addr1 >>>> Address(
provinceId=460000000000, province=海南省,
cityId=460100000000, city=海口市,
districtId=460108000000, district=美兰区,
streetId=460108101000, street=灵山镇,
townId=460108101000, town=灵山镇,
villageId=null, village=null,
road=海榆大道,
roadNum=4号,
buildingNum=A-32,
text=绿地城润园灵山西片去旧改项目地块11#楼22203栋单元层号
)
>>>>>>>>>>>>>>>>>
海南省海口市灵山镇海榆大道4号绿地城.润园11#楼2单元203
addr2 >>>> Address(
provinceId=460000000000, province=海南省,
cityId=460100000000, city=海口市,
districtId=460108000000, district=美兰区,
streetId=460108101000, street=灵山镇,
townId=460108101000, town=灵山镇,
villageId=null, village=null,
road=海榆大道,
roadNum=4号,
buildingNum=11#楼2单元203,
text=绿地城润园
)
Loading extension dictionary: dic/region.dic
Loading extension dictionary: dic/community.dic
Loading extension stop-word dictionary: dic/stop.dic
Similarity analysis >>>>>>>>> MatchedResult(
doc1=Document(terms=[Term(灵山镇), Term(海榆大道), Term(4号), Term(A), Term(32), Term(绿地城), Term(润园), Term(灵山), Term(西片), Term(去), Term(旧), Term(改), Term(项目), Term(地块), Term(11#), Term(楼), Term(22203), Term(栋), Term(单元), Term(层), Term(号)], town=Term(灵山镇), village=null, road=Term(海榆大道), roadNum=Term(4号), roadNumValue=4),
doc2=Document(terms=[Term(灵山镇), Term(海榆大道), Term(4号), Term(11), Term(2), Term(203), Term(绿地城), Term(润园)], town=Term(灵山镇), village=null, road=Term(海榆大道), roadNum=Term(4号), roadNumValue=4),
terms=[io.patamon.geocoding.similarity.MatchedTerm@4b6995df, io.patamon.geocoding.similarity.MatchedTerm@2fc14f68, io.patamon.geocoding.similarity.MatchedTerm@591f989e, io.patamon.geocoding.similarity.MatchedTerm@66048bfd, io.patamon.geocoding.similarity.MatchedTerm@61443d8f],
similarity=0.7152705001057788
)
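The two MatchedResult logs list one MatchedTerm in the first run and five in the second. Intersecting the logged term lists shows where that difference comes from: in the first run the road, road number and community (海榆大道, 4号, 绿地城, 润园) never made it into doc1. This is only an exact-string intersection for illustration; it does not reproduce the library's similarity score, and `SharedTerms` is a hypothetical helper.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SharedTerms {
    // Exact-string intersection of two term lists, keeping the order of the first list.
    public static Set<String> shared(List<String> a, List<String> b) {
        Set<String> result = new LinkedHashSet<>(a);
        result.retainAll(new HashSet<>(b));
        return result;
    }

    public static void main(String[] args) {
        // Term lists copied from the MatchedResult logs above.
        List<String> doc2 = List.of("灵山镇", "海榆大道", "4号", "11", "2", "203", "绿地城", "润园");
        List<String> run1Doc1 = List.of("灵山镇", "A", "32", "西片", "去", "旧", "改", "项目", "地块",
                "11#", "楼", "22203", "栋", "单元", "层", "号");
        List<String> run2Doc1 = List.of("灵山镇", "海榆大道", "4号", "A", "32", "绿地城", "润园", "灵山",
                "西片", "去", "旧", "改", "项目", "地块", "11#", "楼", "22203", "栋", "单元", "层", "号");
        System.out.println(shared(run1Doc1, doc2)); // [灵山镇]
        System.out.println(shared(run2Doc1, doc2)); // [灵山镇, 海榆大道, 4号, 绿地城, 润园]
    }
}
```

Note that `11#` and `11` do not match as plain strings, and neither does the merged `22203`, so the building/unit part contributes nothing in either run.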
Wow, that's more complicated than I expected. It's probably the second 海口市 throwing it off. 😂
Yes. As shown above, once I removed the second one, the similarity went up considerably.
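The manual fix above (deleting the repeated 海口市) could be automated as a pre-processing step: for each region name, keep its first occurrence and delete any later ones. A minimal sketch, assuming the names to strip are the province and city printed in the parsed Address above; `RepeatedRegionStripper` is a hypothetical helper, not part of io.patamon.geocoding.

```java
import java.util.List;

// Hypothetical helper; not part of io.patamon.geocoding.
public class RepeatedRegionStripper {
    // For each region name, keep its first occurrence and delete every later occurrence.
    public static String stripRepeated(String address, List<String> regionNames) {
        String result = address;
        for (String name : regionNames) {
            int first = result.indexOf(name);
            if (first < 0) {
                continue; // name does not occur at all
            }
            int end = first + name.length();
            // Plain substring replace: it also hits longer words that merely contain the name.
            result = result.substring(0, end) + result.substring(end).replace(name, "");
        }
        return result;
    }

    public static void main(String[] args) {
        String t1 = "海南省海口市灵山镇海榆大道4号绿地城.润园海口市灵山西片去旧改项目A-32地块11#楼(栋)2(单元)2(层)203(号)";
        // Province and city names as parsed for addr1 in the logs above.
        System.out.println(stripRepeated(t1, List.of("海南省", "海口市")));
    }
}
```

Applied to the first t1, this yields exactly the second t1. Because it is a plain substring match, a later word such as 海口市场 would also lose its 海口市, so a real fix inside the parser would need to check that the later occurrence is actually a standalone region name.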
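OK, I'll see whether I can optimize this when I have time. By the way, have you generated addresses from the national standard (GB) data? Could you contribute them? I'm not sure how accurate they are, though. 😏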
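I don't think the national standard data is much more precise; in my tests the results were about the same.
@Borber OK. My main concern is that some addresses are newly added and some old ones have changed, so we would probably need to compare them against the addresses already in the project's database to see the differences.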
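A comparison like the one described above could start as a diff of two region tables keyed by region code (the 12-digit codes such as 460108101000 in the logs). A minimal sketch, assuming both the GB data and the existing database can be loaded as maps from code to name; the loading step, the class `RegionDiff` and the sample data are all hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical comparison of two region tables keyed by region code.
public class RegionDiff {
    public record Diff(Set<String> added, Set<String> removed, Map<String, String> renamed) {}

    public static Diff compare(Map<String, String> existing, Map<String, String> incoming) {
        // Codes only in the new data: newly added regions.
        Set<String> added = new TreeSet<>(incoming.keySet());
        added.removeAll(existing.keySet());
        // Codes only in the old data: regions that were merged away or abolished.
        Set<String> removed = new TreeSet<>(existing.keySet());
        removed.removeAll(incoming.keySet());
        // Codes present in both whose names differ, recorded as "old -> new".
        Map<String, String> renamed = new TreeMap<>();
        for (Map.Entry<String, String> e : existing.entrySet()) {
            String newName = incoming.get(e.getKey());
            if (newName != null && !newName.equals(e.getValue())) {
                renamed.put(e.getKey(), e.getValue() + " -> " + newName);
            }
        }
        return new Diff(added, removed, renamed);
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration only.
        Map<String, String> existing = Map.of("001", "Town A", "002", "Town B", "003", "Town C");
        Map<String, String> incoming = Map.of("001", "Town A", "002", "Town B2", "004", "Town D");
        System.out.println(compare(existing, incoming));
    }
}
```

A region whose code changed but whose name stayed the same shows up here as one removal plus one addition, so a real check would probably also match entries by name and parent region.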