django-auditlog icon indicating copy to clipboard operation
django-auditlog copied to clipboard

v1.0a1 serializes additional_data log entry field as string

Open l0b0 opened this issue 3 years ago • 9 comments

After upgrading Django from 2.2 to 3.1.1 and django-auditlog from 0.4.7 to v1.0a1 I now have to json.loads(log_entry["additional_data"]) to get the same result as log_entry["additional_data"] before. Is this an intentional change?

Also, django 3.1 has a built-in JSONField which we'd probably want to use when available.

l0b0 avatar Sep 23 '20 04:09 l0b0

I created a PR https://github.com/jazzband/django-auditlog/pull/271 which replaces JSONField with TextField while retaining original functionality. It passes all tox tests, try it out.

Real-Gecko avatar Sep 28 '20 04:09 Real-Gecko

That would be very, very bad for the project I'm on. We already have many millions of audit log entries. Why not leave it as a JSONField and deserialize it as before?

l0b0 avatar Sep 28 '20 06:09 l0b0

I don't know what changed in Django 3.1, but it fails tests with JSONField, replacing it with TextField solves the problem.

We already have many millions of audit log entries.

This won't destroy existing data, there're two migrations: first alters field type, second renames it to additional_data_json. In case you're not using PostgreSQL there even won't be type conversion on the column: https://github.com/adamchainz/django-jsonfield/blob/a4a5f339df0da850a1087ae7b3cd20a19ef8c6b8/jsonfield/fields.py#L71-L81 And you can still get log_entry["additional_data"], so I think no changes will be required in code. However if you're tied to the field name of additional_data somewhere outside Django this will cause trouble.

Real-Gecko avatar Sep 28 '20 06:09 Real-Gecko

JSONField is built into Django 3, you should be able to use that directly now.

We already had to loads(log_entry["additional_data"] when upgrading to 1.0a1, are you saying that's meant to stay that way? Why change it? Is additional_data meant to be freeform text rather than JSON?

The comment about millions of records wasn't about losing them, but rather the migration taking possibly a very long time on a production system. I still don't quite understand why this shouldn't just stay a JSONField, simplifying the code, avoiding any JSON encoding/decoding incompatibilities, and saving time when upgrading.

l0b0 avatar Sep 28 '20 07:09 l0b0

We already had to loads(log_entry["additional_data"] when upgrading to 1.0a1

Nope, it's exactly what your issue about, log_entry.additional_data is Python dict.

This two methods ensure that it'll be valid:


    @property
    def additional_data(self):
        data = None
        try:
            data = json.loads(self.additional_data_json)
        except:
            pass
        return data

    @additional_data.setter
    def additional_data(self, var):
        self.additional_data_json = json.dumps(var)

shouldn't just stay a JSONField

changes field stores valid JSON to, but it's TextField, dunno why, so I decided to turn additional_data to TextField too.

rather the migration taking possibly a very long time on a production system

If field is only renamed it won't take much time :)

Real-Gecko avatar Sep 28 '20 07:09 Real-Gecko

I'm going to second not changing this. If anything, I'd rather see changes made in to a JSONField, though I would also be concerned about migrating that data.

Perhaps modifications could be made to support your use case without removing the power of a JSONField?

ndwhelan avatar Oct 13 '20 02:10 ndwhelan

I've left a few comments in #271 since I think it's a good compromise as well, in short: JSONField was always not really a JSONField but a glorified TextField with automatic JSON serialization/deserialization. Since django-jsonfield is not in maintenance mode and the author has recommended to move away from it, I think we get the same functionality by just treating the additional_data as the dumb text field it always has been and move the serialization/deserializaation into django-auditlog.

Some time in the future we can either migrate to django-jsonfield-backport (a backport of Django 3.1's proper JSONField) if we really see the need (e.g. smaller storage size, faster queries) which would imply a one-time data migration (which is always a risk). Or -- if enough time has passed and Django 3.1 is the community default for most projects -- just migrate to Django's JSONField in the same way.

In the meantime, I think #271 is a good step in between.

jezdez avatar Nov 25 '20 22:11 jezdez

What is the status on this and https://github.com/jazzband/django-auditlog/pull/271? I looked through the issues and this appears to me to be the only blocker for releasing a stable 1.0 version on pypi? While this issue is important, so is keeping up to date with the mainstream Django version and the latest stable release of django-auditlog supports neither 3.0 nor 3.1. It would be great to resolve this quickly and to get django-auditlog compatible with mainstream Django 3.1. I'd be happy to contribute a patch if you point me in the right direction and there is a chance for it to get merged.

IMO it's unavoidable to move away from django-jsonfield (as that one won't get 3.1 compatibility) and, as stated above, there are two options:

  • leave the additional_data format unchanged on db level, by switching from jsonfield.JSONField to models.TextField and pull (de-)serialization into django auditlog. Admittedly a slight degradation, but does the job.
  • migrate to django-jsonfield-backport and the new JSONField for 3.1+. This has the downside of requiring a migration on LogEntry, just for keeping the non-essential additional_data field compatible.

Either route is fine, but since additional_data is not a central feature of auditlog, it seems sensible to first change it to a TextField with manual json handling, just to kill the django-jsonfield dependency and to get a released django-auditlog that is compatible with both mainstream (3.1) and LTS Django (2.2).

In a later release, the LogEntry model can be migrated to benefit from the new models.JSONField on all relevant fields (changes, additional_data,..), along with dropping Django 2.2./3.0 support for that new release.

chrisittner avatar Dec 24 '20 13:12 chrisittner

@jezdez @chrisittner are you guys aware that for Postgres db django-jsonfield creates jsonb field ? https://github.com/adamchainz/django-jsonfield/blame/master/jsonfield/fields.py#L75 i.e. no matter if we change additional_data to TextField or to django JSONField, some databases will be affected and db migration will be needed anyway.

To me switching to JSONField looks like a natural choice going forward. Most databases and Django now support JSONField out of the box and it is handy for querying data inside of it. I understand the concern of having to do potentially lengthy migration, though. I am wondering if it would make sense to avoid it. For example, we could try to make a migration with 2 actions:

  • rename column additional_data => additional_data_old
  • create a new nullable JSON column named additional_data

additional_data_old would be only in migration, not in LogEntry model. I assume Django allows to craft such migration. This way no data migration will be performed, yet all old data would still be available, and people could decide by themselves if they want to copy data from additional_data_old to additional_data column, or just drop additional_data_old. Or they can just stick to using older version of django-auditlog. Seems like a good compromise to move the project forward. If you guys ok with the idea, I am happy to prepare a PR.

eprikazc avatar Nov 13 '21 05:11 eprikazc