django-auditlog
django-auditlog copied to clipboard
v1.0a1 serializes additional_data log entry field as string
After upgrading Django from 2.2 to 3.1.1 and django-auditlog from 0.4.7 to v1.0a1 I now have to json.loads(log_entry["additional_data"])
to get the same result as log_entry["additional_data"]
before. Is this an intentional change?
Also, django 3.1 has a built-in JSONField which we'd probably want to use when available.
I created a PR https://github.com/jazzband/django-auditlog/pull/271 which replaces JSONField
with TextField
while retaining original functionality. It passes all tox tests, try it out.
That would be very, very bad for the project I'm on. We already have many millions of audit log entries. Why not leave it as a JSONField
and deserialize it as before?
I don't know what changed in Django 3.1, but it fails tests with JSONField, replacing it with TextField solves the problem.
We already have many millions of audit log entries.
This won't destroy existing data, there're two migrations: first alters field type, second renames it to additional_data_json
. In case you're not using PostgreSQL there even won't be type conversion on the column:
https://github.com/adamchainz/django-jsonfield/blob/a4a5f339df0da850a1087ae7b3cd20a19ef8c6b8/jsonfield/fields.py#L71-L81
And you can still get log_entry["additional_data"]
, so I think no changes will be required in code.
However if you're tied to the field name of additional_data
somewhere outside Django this will cause trouble.
JSONField is built into Django 3, you should be able to use that directly now.
We already had to loads(log_entry["additional_data"]
when upgrading to 1.0a1, are you saying that's meant to stay that way? Why change it? Is additional_data
meant to be freeform text rather than JSON?
The comment about millions of records wasn't about losing them, but rather the migration taking possibly a very long time on a production system. I still don't quite understand why this shouldn't just stay a JSONField, simplifying the code, avoiding any JSON encoding/decoding incompatibilities, and saving time when upgrading.
We already had to
loads(log_entry["additional_data"]
when upgrading to 1.0a1
Nope, it's exactly what your issue about, log_entry.additional_data
is Python dict.
This two methods ensure that it'll be valid:
@property
def additional_data(self):
data = None
try:
data = json.loads(self.additional_data_json)
except:
pass
return data
@additional_data.setter
def additional_data(self, var):
self.additional_data_json = json.dumps(var)
shouldn't just stay a JSONField
changes
field stores valid JSON to, but it's TextField, dunno why, so I decided to turn additional_data
to TextField too.
rather the migration taking possibly a very long time on a production system
If field is only renamed it won't take much time :)
I'm going to second not changing this. If anything, I'd rather see changes
made in to a JSONField, though I would also be concerned about migrating that data.
Perhaps modifications could be made to support your use case without removing the power of a JSONField?
I've left a few comments in #271 since I think it's a good compromise as well, in short: JSONField was always not really a JSONField but a glorified TextField with automatic JSON serialization/deserialization. Since django-jsonfield is not in maintenance mode and the author has recommended to move away from it, I think we get the same functionality by just treating the additional_data
as the dumb text field it always has been and move the serialization/deserializaation into django-auditlog.
Some time in the future we can either migrate to django-jsonfield-backport (a backport of Django 3.1's proper JSONField) if we really see the need (e.g. smaller storage size, faster queries) which would imply a one-time data migration (which is always a risk). Or -- if enough time has passed and Django 3.1 is the community default for most projects -- just migrate to Django's JSONField in the same way.
In the meantime, I think #271 is a good step in between.
What is the status on this and https://github.com/jazzband/django-auditlog/pull/271? I looked through the issues and this appears to me to be the only blocker for releasing a stable 1.0 version on pypi? While this issue is important, so is keeping up to date with the mainstream Django version and the latest stable release of django-auditlog
supports neither 3.0 nor 3.1. It would be great to resolve this quickly and to get django-auditlog
compatible with mainstream Django 3.1. I'd be happy to contribute a patch if you point me in the right direction and there is a chance for it to get merged.
IMO it's unavoidable to move away from django-jsonfield
(as that one won't get 3.1 compatibility) and, as stated above, there are two options:
- leave the
additional_data
format unchanged on db level, by switching fromjsonfield.JSONField
tomodels.TextField
and pull (de-)serialization into django auditlog. Admittedly a slight degradation, but does the job. - migrate to django-jsonfield-backport and the new JSONField for 3.1+. This has the downside of requiring a migration on
LogEntry
, just for keeping the non-essentialadditional_data
field compatible.
Either route is fine, but since additional_data
is not a central feature of auditlog, it seems sensible to first change it to a TextField
with manual json handling, just to kill the django-jsonfield
dependency and to get a released django-auditlog
that is compatible with both mainstream (3.1) and LTS Django (2.2).
In a later release, the LogEntry
model can be migrated to benefit from the new models.JSONField
on all relevant fields (changes
, additional_data
,..), along with dropping Django 2.2./3.0 support for that new release.
@jezdez @chrisittner are you guys aware that for Postgres db django-jsonfield
creates jsonb
field ? https://github.com/adamchainz/django-jsonfield/blame/master/jsonfield/fields.py#L75
i.e. no matter if we change additional_data
to TextField
or to django JSONField
, some databases will be affected and db migration will be needed anyway.
To me switching to JSONField
looks like a natural choice going forward. Most databases and Django now support JSONField
out of the box and it is handy for querying data inside of it. I understand the concern of having to do potentially lengthy migration, though. I am wondering if it would make sense to avoid it. For example, we could try to make a migration with 2 actions:
- rename column
additional_data
=>additional_data_old
- create a new nullable JSON column named
additional_data
additional_data_old
would be only in migration, not in LogEntry
model. I assume Django allows to craft such migration. This way no data migration will be performed, yet all old data would still be available, and people could decide by themselves if they want to copy data from additional_data_old
to additional_data
column, or just drop additional_data_old
. Or they can just stick to using older version of django-auditlog
. Seems like a good compromise to move the project forward. If you guys ok with the idea, I am happy to prepare a PR.