
Unnecessary diff?

Open yangtze-shi opened this issue 5 months ago • 4 comments

Thank you for your contribution. This is important work and certainly deserves funding.

I have a question regarding some of the diff rows. For example, in 1995-1996.diff, rows 159-178:

-421000 荆沙市>.,荆门市
-421002 沙市区>.
-421003 荆州区>.
-421004 江陵区>.
-421022 公安县>.
-421023 监利县>.
-421025 京山县>#
-421081 石首市>.
-421082 钟祥市>#
-421083 洪湖市>.
-421087 松滋市>.
+421000 荆州市<.
+421002 沙市区<.
+421003 荆州区<.
+421004 江陵区<.
+421022 公安县<.
+421023 监利县<.
+421081 石首市<.
+421083 洪湖市<.
+421087 松滋市<.

It is not entirely clear to me how these diffs (for example, -421022 公安县>. vs. +421022 公安县<.) are generated, since they do not appear to reflect any tangible difference based on the corresponding source data. Would you happen to have any insights or hypotheses about what might be driving these differences?

yangtze-shi · Jun 06 '25 03:06
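One quick way to check the observation above is to parse the quoted rows and list the codes that are removed and re-added under exactly the same name. This is a minimal sketch. The row layout (sign, six-digit code, a space, the name, then an annotation such as `>.`, `<.`, `>#` or `>.,荆门市`) is inferred from the excerpt. Because the thread does not explain what the annotations mean, the code keeps them as opaque strings. The function names are hypothetical.

```python
import re
from collections import defaultdict

# The 20 rows quoted above (1995-1996.diff, rows 159-178).
ROWS = """\
-421000 荆沙市>.,荆门市
-421002 沙市区>.
-421003 荆州区>.
-421004 江陵区>.
-421022 公安县>.
-421023 监利县>.
-421025 京山县>#
-421081 石首市>.
-421082 钟祥市>#
-421083 洪湖市>.
-421087 松滋市>.
+421000 荆州市<.
+421002 沙市区<.
+421003 荆州区<.
+421004 江陵区<.
+421022 公安县<.
+421023 监利县<.
+421081 石首市<.
+421083 洪湖市<.
+421087 松滋市<.
""".splitlines()

# Assumed layout: sign, six-digit code, a space, the name, then an
# annotation starting with "<" or ">" that is kept uninterpreted.
ROW_RE = re.compile(r"^([+-])(\d{6}) (\S+?)([<>].*)$")


def parse_row(row: str) -> tuple[str, str, str, str]:
    """Split a diff row into (sign, code, name, annotation)."""
    match = ROW_RE.match(row)
    if match is None:
        raise ValueError(f"unrecognized diff row: {row!r}")
    return match.groups()


def readded_unchanged(rows: list[str]) -> list[str]:
    """Codes that are removed and re-added with exactly the same name."""
    names = defaultdict(dict)  # code -> {"-": old name, "+": new name}
    for row in rows:
        sign, code, name, _annotation = parse_row(row)
        names[code][sign] = name
    return sorted(
        code for code, by_sign in names.items()
        if by_sign.get("-") is not None and by_sign.get("-") == by_sign.get("+")
    )


print(readded_unchanged(ROWS))
```

Eight codes come back with an unchanged name. Only 421000 is actually renamed, while 421025 and 421082 appear only as removals.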

These redundant diff rows were introduced in commit 69e125e589c40224defe368d82ffc0325da4b01a. I added them deliberately at the time, but in hindsight they don't seem strictly necessary. They do reflect a change in the name of their parent entry, which is captured, for example, in the CSV output:

421022,湖北省,荆州市,公安县,县级,启用,1996,,
421022,湖北省,荆沙市,公安县,县级,变更,1994,1996,421022

But it would work just as well without these rows, since they can be inferred. Have they become a hindrance in your use case?

yescallop · Jun 06 '25 05:06

I remember some corner cases, such as 410611 being 焦作市郊区 in 1981 but 鹤壁市郊区 in 1982 (see #4), which resulted from the codes being rearranged during that period. This means that in the original data from the Ministry of Civil Affairs (民政部), two identical codes with identical names do not necessarily refer to the same division; you also have to check whether their parents' names are identical. If the parents' names differ, the codes may refer to the same division (e.g., 421022 as both 荆州市公安县 and 荆沙市公安县) or they may not (e.g., 410611 as both 焦作市郊区 and 鹤壁市郊区). It now seems consistent to me to include the diff rows in both cases, but you might see it differently. What do you think?

yescallop · Jun 06 '25 16:06
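The decision rule described in the comment above can be sketched as follows. The sketch assumes each entry is represented by its code plus a hypothetical `path` of names (province, prefecture, own name). It shows that names alone cannot separate the 421022 case from the 410611 case. Both end up in the same bucket and need an external check, such as the code rearrangement noted in #4.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """A code as it appears in one year's data (hypothetical representation)."""
    code: str
    path: tuple[str, ...]  # e.g. ("湖北省", "荆州市", "公安县")


def compare(old: Entry, new: Entry) -> str:
    """Classify two entries from different years by code and full name."""
    if old.code != new.code:
        return "different code"
    if old.path == new.path:
        return "same full name"
    if old.path[-1] == new.path[-1]:
        # Same code and own name, different parent: this may be the same
        # division (421022) or a different one (410611).
        return "parent changed, check manually"
    return "own name changed"


gong_an = compare(Entry("421022", ("湖北省", "荆沙市", "公安县")),
                  Entry("421022", ("湖北省", "荆州市", "公安县")))
jiao_qu = compare(Entry("410611", ("河南省", "焦作市", "郊区")),
                  Entry("410611", ("河南省", "鹤壁市", "郊区")))
print(gong_an, "|", jiao_qu)
```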

Hah! That totally makes sense. So a diff row indicates that either a) the name or code of the corresponding area has changed, or b) it hasn't, but a parent area has changed, which must be a name change rather than a code change, since otherwise it would show up as a code change. Am I getting it right?

yangtze-shi · Jun 06 '25 16:06

Yes, you're right. Another way to look at it is to consider an area's full name (e.g., 湖北省荆州市公安县). A diff row suggests that either the full name or the code of the corresponding area has changed. I'll leave this as an open question for the moment.

yescallop · Jun 09 '25 11:06
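The full-name view can be sketched as a snapshot diff that emits a row whenever a code's full name differs between two years. The sketch assumes the usual six-digit layout, in which a code's province is its first two digits followed by `0000` and its prefecture is its first four digits followed by `00`. The helper names and the output format (full names, no annotations, rows grouped per code) are hypothetical and do not reproduce the repository's actual `.diff` format.

```python
def full_names(snapshot: dict[str, str]) -> dict[str, str]:
    """Map each code to its full name, e.g. "湖北省荆州市公安县".

    Assumes province = first 2 digits + "0000" and
    prefecture = first 4 digits + "00".
    """
    result = {}
    for code, name in snapshot.items():
        parts = []
        for ancestor in (code[:2] + "0000", code[:4] + "00"):
            # Skip the code itself (provinces and prefectures map onto
            # themselves) and any ancestor missing from this year's data.
            if ancestor != code and ancestor in snapshot:
                parts.append(snapshot[ancestor])
        parts.append(name)
        result[code] = "".join(parts)
    return result


def diff(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Emit -/+ rows for every code whose full name changed, appeared or vanished."""
    old_full, new_full = full_names(old), full_names(new)
    rows = []
    for code in sorted(old_full.keys() | new_full.keys()):
        before, after = old_full.get(code), new_full.get(code)
        if before == after:
            continue
        if before is not None:
            rows.append(f"-{code} {before}")
        if after is not None:
            rows.append(f"+{code} {after}")
    return rows


# A tiny excerpt: only the prefecture is renamed, yet 421022 gets rows too.
y1995 = {"420000": "湖北省", "421000": "荆沙市", "421022": "公安县"}
y1996 = {"420000": "湖北省", "421000": "荆州市", "421022": "公安县"}
for row in diff(y1995, y1996):
    print(row)
```

In this sketch, renaming 421000 alone is enough to produce a -/+ pair for 421022, which is the kind of row discussed in this issue.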

I see! Thank you for the detailed explanation. I think it makes sense to keep the diffs. Sorry for my earlier confusion.

yangtze-shi · Jun 17 '25 23:06