MDMHPCoreData
Duplicate insert
I am getting duplicate inserts when I have a data set like this:
{ "guid":"a"},
{ "guid":"c"},
Followed by an import of a data set like this:
{"guid":"a"},
{"guid":"b"},
{"guid":"c"},
After the first data set is inserted (just a and c), we try to import a new JSON set with a, b, c:
1. Try to fetch objects a, b, c from the managedObjectContext
2. Fetch returns objects a, c
3. Compare a == a → update a
4. Compare b == c → insert b
5. Compare c == b → insert c

Object c is then duplicated.
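The failure mode above can be reproduced with a small sketch. This is not the actual MDMHPCoreData source; it is a hypothetical Python model of a find-or-create walk that advances the fetched-objects iterator on every pass, even when the GUIDs don't match (the function name `lockstep_import` and both parameters are made up for illustration):

```python
def lockstep_import(existing_guids, import_guids):
    """Walk the sorted import set against the sorted fetched set in lockstep."""
    existing = iter(sorted(existing_guids))
    current = next(existing, None)
    updated, inserted = [], []
    for guid in sorted(import_guids):
        if guid == current:
            updated.append(guid)   # found an existing record: update it
        else:
            inserted.append(guid)  # no match: insert a new record
        current = next(existing, None)  # bug: advances even on a mismatch
    return updated, inserted

# Existing store holds {a, c}; the new import is [a, b, c].
print(lockstep_import(["a", "c"], ["a", "b", "c"]))
# → (['a'], ['b', 'c'])  -- c is inserted again even though it already exists
```

Once the mismatch at b consumes the fetched c, every later comparison is off by one, so c never matches and gets re-inserted.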
Updated with the actual UFO JSON.

First, import this in ufo.json:
[
{"description": "Man repts. witnessing \"flash, followed by a classic UFO, w/ a tailfin at back.\" Red color on top half of tailfin. Became triangular.", "reported_at": "19951009", "shape": "", "location": "Iowa City, IA", "duration": "", "sighted_at": "19951009", "guid": "a"},
{"description": "Telephoned Report:CA woman visiting daughter witness discs and triangular ships over Squaxin Island in Puget Sound. Dramatic. Written report, with illustrations, submitted to NUFORC.", "reported_at": "19950103", "shape": "", "location": "Shelton, WA", "duration": "", "sighted_at": "19950101", "guid": "c"}
]
Stop the app, then update ufo.json to this and import:
[
{"description": "Man repts. witnessing \"flash, followed by a classic UFO, w/ a tailfin at back.\" Red color on top half of tailfin. Became triangular.", "reported_at": "19951009", "shape": "", "location": "Iowa City, IA", "duration": "", "sighted_at": "19951009", "guid": "a"},
{"description": "Man on Hwy 43 SW of Milwaukee sees large, bright blue light streak by his car, descend, turn, cross road ahead, strobe. Bizarre!", "reported_at": "19951011", "shape": "", "location": "Milwaukee, WI", "duration": "2 min.", "sighted_at": "19951010", "guid": "b"},
{"description": "Telephoned Report:CA woman visiting daughter witness discs and triangular ships over Squaxin Island in Puget Sound. Dramatic. Written report, with illustrations, submitted to NUFORC.", "reported_at": "19950103", "shape": "", "location": "Shelton, WA", "duration": "", "sighted_at": "19950101", "guid": "c"}
]
@HarrisonJackson I don't understand the problem. Are you saying the import code is broken because it allows duplicates? Pull requests are welcome.
Yes, that's what I am saying. The intention of the find-or-create method seems to be to find an existing record so that a duplicate is not inserted, but as it is written there are cases where a duplicate can be created.
I was actually reporting the issue because I hoped you had an idea of how to fix it, haha. The fastest, and definitely dirtiest, fix is to change MDM_BATCH_SIZE_IMPORT from 5000 to 1. Obviously that forfeits many of the other optimizations. After fiddling with it a bit, I found that an import which took ~2 seconds now takes ~5 seconds. Not ideal, but in my case it was better than allowing duplicates. If I find a cleaner way I will submit a PR.
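For comparison, a cleaner fix than shrinking the batch size may be to advance the fetched-objects iterator only when the GUIDs actually match, so a missing record in the store never throws the walk off by one. Again, this is a hedged Python sketch of the idea, not the library's Objective-C code; the function name `find_or_create` and its parameters are illustrative:

```python
def find_or_create(existing_guids, import_guids):
    """Match sorted import GUIDs against sorted fetched GUIDs,
    advancing the fetched iterator only on a match."""
    existing = iter(sorted(existing_guids))
    current = next(existing, None)
    updated, inserted = [], []
    for guid in sorted(import_guids):
        if guid == current:
            updated.append(guid)             # existing record: update in place
            current = next(existing, None)   # advance ONLY when matched
        else:
            inserted.append(guid)            # truly new record: insert
    return updated, inserted

# Existing store holds {a, c}; the new import is [a, b, c].
print(find_or_create(["a", "c"], ["a", "b", "c"]))
# → (['a', 'c'], ['b'])  -- c is updated, not duplicated
```

Because the walk stays aligned regardless of how many records fall in a batch, this approach should keep the large MDM_BATCH_SIZE_IMPORT and its performance benefits intact.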
The project and associated write up are excellent - thanks for putting them out there.