What's the best way to keep track of the changes made to a document?

Open josegl opened this issue 3 years ago • 11 comments

Mongoose version: 6.2.2

Hi. What would be the best option to keep track of the changes that have been made to a document? I'm trying to keep an array of changes like this:

[
  {path: '/my/path/', old_value: 'previous_value_for this path'}
]

I want to achieve this in the most transparent way for our development teams. This means that I want to record changes no matter how a document is updated. So if myDocument is {prop: 2}, we may update myDocument in any of these ways:

myDocument.prop = 1;
myDocument.set('prop', 1);

Or applying a JSON Patch:

const patch = [{op: 'replace', path: '/prop', value: 1}];
fastJsonPatch.applyPatch(myDocument, patch, false, true, true); //This updates the document instead of creating a new one

In every case, the goal is to end up with a _changes array that, for any of the examples above, looks like this after the changes are applied:

[{path: '/prop', old_value: 2}]

In order to achieve this feature I've created a Mongoose plugin:

const util = require('util');
const jsonpatch = require('fast-json-patch')

const proxy_handler = {
  // Intercepts calls to set / $set / markModified; arglist[0] is assumed to be a dotted path string.
  apply: function (target, this_arg, arglist){
    const path = '/' + arglist[0].split('.').filter(p => p).join('/');
    const old_value = this_arg.get(arglist[0]);
    const change = {path, old_value};
    if(this_arg._changes){
      // Skip the change if it is already recorded (the same change can show up again because markModified is recursive).
      if(!this_arg._changes.some(change => change.path === path && util.isDeepStrictEqual(old_value, change.old_value))){
        // Insert at the beginning so that reverting the changes does not require reversing the array.
        this_arg._changes.unshift(change);
      }
    }else{
      this_arg._changes = [change];
    }
    // Forward the call and return its result so that chaining on set() keeps working.
    return target.apply(this_arg, arglist);
  }
};

const changesTracker = schema => {
  schema.post('init', function(doc){
    const $setProxy = new Proxy(doc.$set, proxy_handler);
    const setProxy = new Proxy(doc.set, proxy_handler);
    const markModifiedProxy = new Proxy(doc.markModified, proxy_handler);
    doc.$set = $setProxy;
    doc.set = setProxy;
    doc.markModified = markModifiedProxy;
  });

  schema.pre('save', function(next){
    if(this.isNew){ // Same setup as in the post-init middleware, because the init middleware does not run when a document is created in code
      const $setProxy = new Proxy(this.$set, proxy_handler);
      const setProxy = new Proxy(this.set, proxy_handler);
      const markModifiedProxy = new Proxy(this.markModified, proxy_handler);
      this.$set = $setProxy;
      this.set = setProxy;
      this.markModified = markModifiedProxy;
      this._changes = [{op: 'replace', path: '', old_value: undefined}]
    }
    next();
  });

  schema.pre('remove', function(next){
    this._changes = [{op: 'replace', path: '', old_value: this}]
    next();
  });
}
module.exports = changesTracker;

This works fine for almost every change, except when you update a single value in a deeply nested array. Example: we have this document:

{
  a: 1,
  b: {
    b1: [1,2]
  }
}

Then we apply this patch:

const patch = [{op: 'replace', path: '/b/b1/1', value: 3}];
fastJsonPatch.applyPatch(myDocument, patch, false, true, true);

And at this point I'm a little stuck and any insight would be highly appreciated. Thank you.

josegl avatar Apr 07 '22 15:04 josegl

What will you do when you have made so many changes that you hit the 16 MB BSON document limit? Does it have to be in the same document, or can you store the deltas in another document? Or maybe duplicate the document, make the changes, increase the version of the document, and disable the old copy. Or, or, or.

For the path thing, maybe use the npm package mpath and the MongoDB-specific dot notation instead of the RFC 6902 notation?
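
If both notations have to coexist, converting between them is cheap. The following is a minimal sketch, assuming paths are plain dotted strings whose keys contain no dots. dotToPointer and pointerToDot are hypothetical helper names, not part of mpath, Mongoose, or fast-json-patch. RFC 6902 paths are JSON Pointers (RFC 6901), in which ~ and / inside a key are escaped as ~0 and ~1:

```javascript
// Hypothetical helpers for illustration only.

// 'notification.notify_to.1' -> '/notification/notify_to/1'
function dotToPointer(dotPath) {
  if (dotPath === '') return ''; // the empty pointer refers to the whole document
  return '/' + dotPath
    .split('.')
    // Escape '~' first, then '/', as RFC 6901 requires.
    .map(segment => segment.replace(/~/g, '~0').replace(/\//g, '~1'))
    .join('/');
}

// '/notification/notify_to/1' -> 'notification.notify_to.1'
function pointerToDot(pointer) {
  if (pointer === '') return '';
  return pointer
    .slice(1)
    .split('/')
    // Unescape '~1' first, then '~0' (the reverse order of escaping).
    .map(token => token.replace(/~1/g, '/').replace(/~0/g, '~'))
    .join('.');
}

console.log(dotToPointer('notification.notify_to.1'));
console.log(pointerToDot('/b/b1/1'));
```

Keys that themselves contain a dot cannot be represented in dot notation, so this conversion is only safe for schemas without such keys.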

Uzlopak avatar Apr 07 '22 15:04 Uzlopak

Hi @Uzlopak. Actually, I hadn't thought about reaching the BSON document limit, because I assumed the limit was a MongoDB thing and would not matter as long as the deltas live only at run time: the deltas array is not going to be stored in MongoDB.

About using another document to store the deltas: I don't see how that could work, since the purpose of this plugin is to make the deltas array available to other middleware through this._changes, using the auxiliary methods this.pathHasChanged(path) and this.getPreviousValue(path), which I have implemented but left out here for clarity.
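
The two helpers were not published in this thread, so the following is only a guess at their shape: a minimal sketch written as standalone functions that take the newest-first changes array explicitly. The matching rules (ancestor and descendant paths count as changed, previous value is taken from the oldest exact match) are assumptions, not the author's implementation.

```javascript
// Sketch only: the real helpers were not shown in the thread.

// True if a change was recorded at `path` itself, at one of its ancestors,
// or at one of its descendants.
function pathHasChanged(changes, path) {
  return changes.some(({ path: changed }) =>
    changed === path ||
    path.startsWith(changed + '/') ||  // an ancestor of `path` was replaced
    changed.startsWith(path + '/'));   // something below `path` changed
}

// Returns the value `path` held before its first recorded change,
// or undefined if no change was recorded at exactly that path.
function getPreviousValue(changes, path) {
  // The array is newest first, so the original value is the LAST exact match.
  for (let i = changes.length - 1; i >= 0; i--) {
    if (changes[i].path === path) return changes[i].old_value;
  }
  return undefined;
}

const changes = [
  { path: '/notification/notify_to/1', old_value: 'b' },
  { path: '/description', old_value: 'test' },
];

console.log(pathHasChanged(changes, '/notification'));
console.log(getPreviousValue(changes, '/description'));
```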

I haven't used the MongoDB-specific notation because I'm trying to get the best performance for our systems, and we work with RFC 6902 notation. If the changed paths already use that notation, we don't have to iterate over the array again to convert from one notation to the other. That is also why I insert the changes at the beginning of the array and not at the end: this way, if I need to revert the changes, I don't have to reverse the array before applying the deltas.
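
To illustrate why the newest-first order helps, here is a minimal sketch on a plain object rather than a Mongoose document. setByPointer and revert are hypothetical helpers; the setter ignores ~ escaping and does not handle the empty pointer.

```javascript
// Minimal JSON Pointer setter for plain objects and arrays.
// Assumptions: no '~' escaping, no '-' token, pointer is not ''.
function setByPointer(root, pointer, value) {
  const tokens = pointer.split('/').slice(1);
  const last = tokens.pop();
  const parent = tokens.reduce((node, token) => node[token], root);
  parent[last] = value;
}

// Undo every recorded change. Because the newest change comes first,
// walking the array front to back restores values in reverse
// chronological order, so no reversal is needed.
function revert(doc, changes) {
  for (const { path, old_value } of changes) {
    setByPointer(doc, path, old_value);
  }
  return doc;
}

const doc = { a: 1, b: { b1: [1, 2] } };
const changes = [];

// Record two successive changes to the same path, newest first.
changes.unshift({ path: '/b/b1/1', old_value: doc.b.b1[1] });
doc.b.b1[1] = 3;
changes.unshift({ path: '/b/b1/1', old_value: doc.b.b1[1] });
doc.b.b1[1] = 4;

revert(doc, changes);
console.log(JSON.stringify(doc));
```

If the array were stored oldest first, the same loop would end on the intermediate value 3 instead of the original value 2.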

Thank you very much.

josegl avatar Apr 08 '22 06:04 josegl

Ok understood.

What is the specific issue you are facing? Is it that you don't know how to patch the single value in the array?

Uzlopak avatar Apr 08 '22 08:04 Uzlopak

Sorry for taking so much time to answer here. I wanted to prepare good tests in order to share the best info.

We start from this schema:

const changesTracker = require('./index'); 

const notificationSchema = mongoose.Schema({
  notify_at: Date,
  notify_to: [{type: mongoose.Schema.Types.ObjectId}],
  done_by: {type: mongoose.Schema.Types.ObjectId},
});

const taskSchema = new mongoose.Schema({
  notification: notificationSchema,
  created_by: {type: mongoose.Schema.Types.ObjectId},
  description: String,
});

taskSchema.plugin(changesTracker);

Then we create a new task:

  await mongoose.connect(MONGO_URL);
  const taskModel = mongoose.model('Task', taskSchema);
  const task = taskModel({
    notification: {
      notify_at: new Date(),
      notify_to: new Array(3).fill(0).map(() => mongoose.Types.ObjectId()),
      done_by: mongoose.Types.ObjectId(),
    },
    created_by: mongoose.Types.ObjectId(),
    description: 'test',
  });
  await task.save();
  const saved_task = await mongoose.models.Task.findById(task._id);

In my case I have this saved task:

{
    "notification": {
        "notify_at": "2022-04-11T10:00:12.134Z",
        "notify_to": [
            "6253fc2ce643c3a09841873a",
            "6253fc2ce643c3a09841873b",
            "6253fc2ce643c3a09841873c"
        ],
        "done_by": "6253fc2ce643c3a09841873d",
        "_id": "6253fc2ce643c3a098418740"
    },
    "created_by": "6253fc2ce643c3a09841873e",
    "description": "test",
    "_id": "6253fc2ce643c3a09841873f",
    "__v": 0
}

Now I want to update the second item of the notify_to array. If I do it this way:

saved_task.set("notification.notify_to.1", mongoose.Types.ObjectId())

Then in the saved_task._changes array I have this:

[
    {
        "path": "/notification/notify_to/1",
        "old_value": "6253fd0ee643c3a098418744"
    },
    {
        "path": "/notification/notify_to/1",
        "old_value": "6253fc2ce643c3a09841873b"
    },
]

The desired status would be just:

[
    {
        "path": "/notification/notify_to/1",
        "old_value": "6253fc2ce643c3a09841873b"
    },
]

At least the actual previous value appears in the deltas array. But if I do the same operation this way:

saved_task.notification.notify_to[1] = mongoose.Types.ObjectId();

the deltas array contains only this change:

[
    {
        "path": "/notification/notify_to/1",
        "old_value": "6253fd0ee643c3a098418744"
    },
]

And the actual former value does not appear in the deltas.

However, for simpler paths like the description:

saved_task.set('description', 'new_description');

The deltas array is:

[
    {
        "path": "/description",
        "old_value": "test"
    }
]

And if I update the path this way:

saved_task.description = 'new_description';

The result is:

[
    {
        "path": "/description",
        "old_value": "test"
    }
]

For simple nested data this works as well:

saved_task.set('notification.notify_at', new Date());

The result is

[
    {
        "path": "/notification/notify_at",
        "old_value": "2022-04-11T10:00:12.134Z"
    },
]

And if I update the path this way:

saved_task.notification.notify_at = new Date();

The result is

[
    {
        "path": "/notification/notify_at",
        "old_value": "2022-04-11T10:00:12.134Z"
    },
]

So I am wondering whether, when a single item of a nested array is updated like this:

saved_task.notification.notify_to[1] = mongoose.Types.ObjectId();

Mongoose internally calls other methods that I could proxy as well in order to keep track of these changes. Ideally, there would be exactly one recorded change per path when a single change happens to that path.
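
For reference, this is how index assignments can be intercepted in plain JavaScript with a Proxy set trap, keeping one entry per path. It is a sketch on an ordinary array, not a description of how Mongoose tracks arrays internally, and trackArray is a hypothetical name.

```javascript
// Plain-JavaScript illustration; NOT Mongoose internals.
function trackArray(array, basePath, changes) {
  return new Proxy(array, {
    set(target, key, value, receiver) {
      // Only record writes to numeric indexes (not `length` or symbols).
      if (typeof key === 'string' && /^\d+$/.test(key)) {
        const path = `${basePath}/${key}`;
        // One entry per path: the first recorded old value is the original one.
        if (!changes.some(change => change.path === path)) {
          changes.unshift({ path, old_value: target[key] });
        }
      }
      return Reflect.set(target, key, value, receiver);
    },
  });
}

const changes = [];
const notifyTo = trackArray(['a', 'b', 'c'], '/notification/notify_to', changes);

notifyTo[1] = 'x';
notifyTo[1] = 'y'; // a second write to the same path adds no new entry

console.log(changes);
```

Deduplicating by path alone, rather than by path and old value, is what keeps the intermediate value out of the array.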

Thank you very much.

josegl avatar Apr 11 '22 10:04 josegl

@josegl here's a simpler version of the change tracker plugin that you can try. The benefit of the below approach is that it doesn't rely on patching Mongoose internals. What do you think of the below approach?

const mongoose = require('mongoose');

const changesTracker = schema => {
  schema.add({ _changes: [{ _id: false, path: String, oldValue: 'Mixed' }] });

  schema.post('init', function(doc){
    this.$locals.initialState = new doc.constructor(doc);
  });

  schema.pre('save', function(next){
    if (this.isNew) { return next(); }
    for (const path of this.directModifiedPaths()) {
      this._changes.push({ path, oldValue: this.$locals.initialState.get(path) });
    }
    next();
  });

  schema.post('save', function() {
    this.$locals.initialState = new this.constructor(this);
  });
};  

const schema = mongoose.Schema({ name: String, fullName: { first: String, last: String } });
schema.plugin(changesTracker);

run().catch(err => console.log(err));

async function run() {
  await mongoose.connect('mongodb://localhost:27017/test');
  await mongoose.connection.dropDatabase();
  const Test = mongoose.model('Test', schema);

  await Test.create({
    name: 'Jean-Luc Picard',
    fullName: { first: 'Jean-Luc', last: 'Picard' }
  });

  const doc = await Test.findOne();

  doc.name = 'Test 2';
  await doc.save();
  // [ { path: 'name', oldValue: 'Jean-Luc Picard' } ]
  console.log(doc._changes);

  doc.fullName.first = 'Test';
  await doc.save();
  // [
  //   { path: 'name', oldValue: 'Jean-Luc Picard' },
  //   { path: 'fullName.first', oldValue: 'Jean-Luc' }
  // ]
  console.log(doc._changes);
}

vkarpov15 avatar Apr 29 '22 01:04 vkarpov15

@vkarpov15 this is indeed a much simpler version. However, I can see some problems:

  1. Adding _changes to the schema would cause the changes to be stored in the database. This is not what we initially intended with this plugin. Our objective is to use those changes to run some generic post('save') middleware after the document is updated, which will depend on these changes. Whether to store the changes could be an option of the tracker plugin. Would storing the changes in a virtual be a good fit when we don't want to save them to the database?

  2. Wouldn't this.$locals.initialState = new doc.constructor(doc); duplicate the document inside the model? Before working to search a solution to store the changes, we did something similar:

model._prev_doc = this.toObject();

And then we calculated the differences. However, we found that the CPU and memory consumption of this approach was very high. CPU consumption was high because we had to clone the document, and memory consumption was high because we ended up with an old copy of the document inside the model. For big documents this was a very noticeable increase.

This is why we started to work in the direction of finding an optimized solution where we wouldn't need to duplicate the document to use less memory, and to avoid the process of cloning the document to use less cpu time.
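
For contrast, the snapshot-and-diff approach described above looks roughly like this on plain JSON data. It is a sketch, not the code we actually used: diff is a hypothetical helper, and the JSON round trip stands in for toObject().

```javascript
// Walk both values and record the pointer of every difference.
// Assumption: plain JSON data only (no Dates, ObjectIds, or class instances).
function diff(before, after, basePath = '', out = []) {
  const bothContainers =
    before !== null && after !== null &&
    typeof before === 'object' && typeof after === 'object' &&
    Array.isArray(before) === Array.isArray(after);
  if (!bothContainers) {
    if (!Object.is(before, after)) out.push({ path: basePath, old_value: before });
    return out;
  }
  const keys = new Set([...Object.keys(before), ...Object.keys(after)]);
  for (const key of keys) {
    diff(before[key], after[key], `${basePath}/${key}`, out);
  }
  return out;
}

const doc = { a: 1, b: { b1: [1, 2] } };

// The full copy is where the extra memory and CPU time go.
const snapshot = JSON.parse(JSON.stringify(doc));

doc.b.b1[1] = 3;

const changes = diff(snapshot, doc);
console.log(changes);
```

Both the copy and the walk scale with the size of the whole document, whereas recording a delta at write time scales only with the number of writes.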

josegl avatar Apr 29 '22 10:04 josegl

  1. A virtual is fine. You can also use the $locals property. Either works for data you don't want to store in the database.
  2. Yes, the new doc.constructor(doc) creates a deep copy of the original document. The reason why I did it this way was to make it easier to access dotted paths like fullName.first. However, you are right about memory usage and performance: I think your approach would be considerably faster.

We hope to change that in the future: we're hard at work on improving the performance of cloning and serializing documents with issues like #10541. But your approach works too, so feel free to keep using it, and please let us know if you need anything else :+1:

vkarpov15 avatar Apr 29 '22 12:04 vkarpov15

This issue is stale because it has been open 14 days with no activity. Remove stale label or comment or this will be closed in 5 days

github-actions[bot] avatar May 17 '22 00:05 github-actions[bot]

This issue was closed because it has been inactive for 19 days since being marked as stale.

github-actions[bot] avatar May 23 '22 00:05 github-actions[bot]

I have published an RC version of the plugin here: https://github.com/Walcu-Engineering/mongoose-track-changes

josegl avatar Jun 27 '22 07:06 josegl

This should be in Mongoose itself, because in fact it does not track all changes

AlexRMU avatar Oct 11 '22 09:10 AlexRMU