geocoding icon indicating copy to clipboard operation
geocoding copied to clipboard

关于地址后期出现高级信息对标准化的影响

Open Borber opened this issue 2 years ago • 5 comments

去除后期出现的更高级的信息. 会大幅提升相似度, 作者大大能优化一些这种情况吗?

String t1 = "海南省海口市灵山镇海榆大道4号绿地城.润园海口市灵山西片去旧改项目A-32地块11#楼(栋)2(单元)2(层)203(号)";
String t2 = "海南省海口市灵山镇海榆大道4号绿地城.润园11#楼2单元203";

结果:

海南省海口市灵山镇海榆大道4号绿地城.润园海口市灵山西片去旧改项目A-32地块11#楼(栋)2(单元)2(层)203(号)
addr1 >>>> Address(
	provinceId=460000000000, province=海南省, 
	cityId=460100000000, city=海口市, 
	districtId=460108000000, district=美兰区, 
	streetId=460108101000, street=灵山镇, 
	townId=460108101000, town=灵山镇, 
	villageId=null, village=null, 
	road=null, 
	roadNum=null, 
	buildingNum=A-32, 
	text=西片去旧改项目地块11#楼22203栋单元层号
)
>>>>>>>>>>>>>>>>>
海南省海口市灵山镇海榆大道4号绿地城.润园11#楼2单元203
addr2 >>>> Address(
	provinceId=460000000000, province=海南省, 
	cityId=460100000000, city=海口市, 
	districtId=460108000000, district=美兰区, 
	streetId=460108101000, street=灵山镇, 
	townId=460108101000, town=灵山镇, 
	villageId=null, village=null, 
	road=海榆大道, 
	roadNum=4号, 
	buildingNum=11#楼2单元203, 
	text=绿地城润园
)
加载扩展词典:dic/region.dic
加载扩展词典:dic/community.dic
加载扩展停止词典:dic/stop.dic
相似度结果分析 >>>>>>>>> MatchedResult(
	doc1=Document(terms=[Term(灵山镇), Term(A), Term(32), Term(西片), Term(去), Term(旧), Term(改), Term(项目), Term(地块), Term(11#), Term(楼), Term(22203), Term(栋), Term(单元), Term(层), Term(号)], town=Term(灵山镇), village=null, road=null, roadNum=null, roadNumValue=0), 
	doc2=Document(terms=[Term(灵山镇), Term(海榆大道), Term(4号), Term(11), Term(2), Term(203), Term(绿地城), Term(润园)], town=Term(灵山镇), village=null, road=Term(海榆大道), roadNum=Term(4号), roadNumValue=4), 
	terms=[io.patamon.geocoding.similarity.MatchedTerm@2cfb4a64], 
	similarity=0.4886777774252209
)

去除第二个海口市

String t1 = "海南省海口市灵山镇海榆大道4号绿地城.润园灵山西片去旧改项目A-32地块11#楼(栋)2(单元)2(层)203(号)";
String t2 = "海南省海口市灵山镇海榆大道4号绿地城.润园11#楼2单元203";

结果

海南省海口市灵山镇海榆大道4号绿地城.润园灵山西片去旧改项目A-32地块11#楼(栋)2(单元)2(层)203(号)
addr1 >>>> Address(
	provinceId=460000000000, province=海南省, 
	cityId=460100000000, city=海口市, 
	districtId=460108000000, district=美兰区, 
	streetId=460108101000, street=灵山镇, 
	townId=460108101000, town=灵山镇, 
	villageId=null, village=null, 
	road=海榆大道, 
	roadNum=4号, 
	buildingNum=A-32, 
	text=绿地城润园灵山西片去旧改项目地块11#楼22203栋单元层号
)
>>>>>>>>>>>>>>>>>
海南省海口市灵山镇海榆大道4号绿地城.润园11#楼2单元203
addr2 >>>> Address(
	provinceId=460000000000, province=海南省, 
	cityId=460100000000, city=海口市, 
	districtId=460108000000, district=美兰区, 
	streetId=460108101000, street=灵山镇, 
	townId=460108101000, town=灵山镇, 
	villageId=null, village=null, 
	road=海榆大道, 
	roadNum=4号, 
	buildingNum=11#楼2单元203, 
	text=绿地城润园
)
加载扩展词典:dic/region.dic
加载扩展词典:dic/community.dic
加载扩展停止词典:dic/stop.dic
相似度结果分析 >>>>>>>>> MatchedResult(
	doc1=Document(terms=[Term(灵山镇), Term(海榆大道), Term(4号), Term(A), Term(32), Term(绿地城), Term(润园), Term(灵山), Term(西片), Term(去), Term(旧), Term(改), Term(项目), Term(地块), Term(11#), Term(楼), Term(22203), Term(栋), Term(单元), Term(层), Term(号)], town=Term(灵山镇), village=null, road=Term(海榆大道), roadNum=Term(4号), roadNumValue=4), 
	doc2=Document(terms=[Term(灵山镇), Term(海榆大道), Term(4号), Term(11), Term(2), Term(203), Term(绿地城), Term(润园)], town=Term(灵山镇), village=null, road=Term(海榆大道), roadNum=Term(4号), roadNumValue=4), 
	terms=[io.patamon.geocoding.similarity.MatchedTerm@4b6995df, io.patamon.geocoding.similarity.MatchedTerm@2fc14f68, io.patamon.geocoding.similarity.MatchedTerm@591f989e, io.patamon.geocoding.similarity.MatchedTerm@66048bfd, io.patamon.geocoding.similarity.MatchedTerm@61443d8f], 
	similarity=0.7152705001057788
)

Borber avatar Jan 06 '22 03:01 Borber

卧槽这么复杂呢,应该是被第二个 海口市 干扰了。😂

IceMimosa avatar Jan 06 '22 03:01 IceMimosa

卧槽这么复杂呢,应该是被第二个 海口市 干扰了。😂

是的呀, 我上面去除了第二个相似度就比较高了

Borber avatar Jan 06 '22 03:01 Borber

好的,我有空看下能否优化。话说你那边是不是生成了国标的地址,能贡献进来不,不知道准确率如何。😏

IceMimosa avatar Jan 06 '22 03:01 IceMimosa

好的,我有空看下能否优化。话说你那边是不是生成了国标的地址,能贡献进来不,不知道准确率如何。😏

国标感觉也没精确多少, 我测试起来感觉差不多

Borber avatar Jan 06 '22 03:01 Borber

@Borber 好的,主要是担心有些新增地址和旧地址变更,可能需要拿库里面的地址一起做个对比才能看出来。

IceMimosa avatar Jan 06 '22 03:01 IceMimosa