scrapy-djangoitem
Should be possible to update an existing model instance
I'm scraping a listing of events and storing them in an Event model. This model inherits from model_utils.models.TimeStampedModel, which provides auto-populated created and modified fields that I find useful.
Unfortunately, there's no way for me to reuse an existing, populated instance of Event in my pipeline without manually copying each field in the DjangoItem to my model, so created is being updated every time.
There should be a way to do this, perhaps by allowing .instance to be set to my model instance and then moving the attribute setting into .save().
FWIW, my pipeline looks like this:
class EventDjangoPipeline(object):
    def process_item(self, item, spider):
        try:
            event = models.Event.objects.get(url=item["url"])
            event_id = event.id
            event = item.save(commit=False)
            event.id = event_id
        except models.Event.DoesNotExist:
            pass
        item.save()
        return item
Ideally, I'd like to be able to do something like:
class EventDjangoPipeline(object):
    def process_item(self, item, spider):
        try:
            event = models.Event.objects.get(url=item["url"])
            item.instance = event
        except models.Event.DoesNotExist:
            pass
        item.save()
        return item
The workaround is to set created in the pipeline:
class EventDjangoPipeline(object):
    def process_item(self, item, spider):
        try:
            event = models.Event.objects.get(url=item["url"])
            item["created"] = event.created
            event_id = event.id
            event = item.save(commit=False)
            event.id = event_id
        except models.Event.DoesNotExist:
            pass
        item.save()
        return item
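For comparison, the "manually copying each field" approach can be wrapped in a small generic helper so it doesn't have to be written per model. This is only a rough sketch, not part of scrapy-djangoitem: apply_item_to_instance and its skip parameter are hypothetical names, and plain stand-in objects replace the Django model and the DjangoItem so the snippet runs on its own.

```python
from types import SimpleNamespace


def apply_item_to_instance(instance, item, skip=("id", "created")):
    """Copy scraped values onto an existing model instance.

    `item` is assumed to be dict-like (hypothetical helper, not a
    scrapy-djangoitem API). Fields named in `skip` are left untouched.
    """
    for field, value in dict(item).items():
        if field in skip:
            continue
        setattr(instance, field, value)
    return instance


# Stand-ins so the sketch runs without Django or Scrapy installed.
existing = SimpleNamespace(id=7, url="http://example.com/e/1",
                           title="Old title", created="2014-01-01")
scraped = {"url": "http://example.com/e/1", "title": "New title"}

updated = apply_item_to_instance(existing, scraped)
```

In a pipeline this would presumably be called on the event returned by models.Event.objects.get(...), followed by event.save(), with item.save() kept for the DoesNotExist case. It leaves created alone, but it is still a workaround rather than the settable .instance asked for above.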