
collection.save() Callback Not Working in Electron

Open xhocquet opened this issue 9 years ago • 10 comments

Hey there! Love the package; I've been using it to build an Electron application. Please look at the following code:

const fs = require("fs"); // needed for readdir/lstat below
const pathTools = require("path"); // Node's built-in "path" module
const ForerunnerDB = require("forerunnerdb");
const fdb = new ForerunnerDB();
const db = fdb.db("DB");
db.persist.dataDir('/data');
let songCollection = db.collection("songs");

function generateLibrary(path) {
  if (!fs.existsSync(path)) {
    console.log("Directory '", path ,"' does not exist.");
    return;
  }

  fs.readdir(path, function(err, results) {
    for (let item of results) {
      let curFilePath = pathTools.join(path, item);
      fs.lstat(curFilePath, function(err, stats) {
        if (stats.isDirectory()) {
          // Pass a callback so the save completes before recursing into the directory
          saveDB(function () { generateLibrary(curFilePath); });
        } else if (pathTools.extname(curFilePath) === ".mp3") {
          console.log("Adding: ", curFilePath);
          songCollection.insert({path: curFilePath});
        }
      });
    }
  });
}

function saveDB(callback) {
  console.log("Saving");
  songCollection.save(function (err) {
    console.log("Ping")
    if (!err) {
      console.log("Saved song collection.");
      if (callback) { callback(); }
    } else {
      console.log("Saving failed, this shouldn't happen.");
      if (callback) { callback(); }
    }
  });
}

generateLibrary("D:/Music");

I am trying to scan my music folder and save the paths of all the MP3s. As you can see, I'm using a callback so that the save completes before recursively searching subfolders. This seemed like a reasonable incremental approach. However, nothing saves! I hit the "Saving" log point in saveDB(), but it never hits "Ping". Am I doing something wrong?

xhocquet avatar Feb 14 '16 15:02 xhocquet

Hmm, I don't suppose you're able to step into the code and follow execution, are you? I've not played around with Electron, so I don't know what it is like to debug.

I would suggest taking this back to basics and creating a script that does nothing other than try to save a collection and load it back.

I would be very interested to know what is causing this, as in pure Node.js this works without issue (we have unit-test coverage of this type of persistent-storage usage for Node.js).

Irrelon avatar Feb 15 '16 14:02 Irrelon

I will get some more details. I had managed to save when supplying only one artist folder and saving at the end. It's something to do with asynchronous behavior, as I'm trying to save in between operations. Maybe the for (let item of results) { loop is continuing while the DB is trying to save?

xhocquet avatar Feb 15 '16 14:02 xhocquet

Hmm yeah. You definitely want to wait for a previous save call to complete before asking for another one.
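
For example, something along these lines would serialise the calls (queueSave and the saving/savePending flags here are just a sketch, not part of ForerunnerDB's API):

// Sketch of serialising save calls: only start the next save once the
// previous callback has fired, and remember at most one pending request.
// songCollection is the collection from the snippet above.
let saving = false;
let savePending = false;

function queueSave() {
  if (saving) {
    savePending = true; // a save is in flight; run another when it finishes
    return;
  }
  saving = true;
  songCollection.save(function (err) {
    if (err) { console.log("Save failed:", err); }
    saving = false;
    if (savePending) {
      savePending = false;
      queueSave(); // run the save that was requested mid-flight
    }
  });
}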

Irrelon avatar Feb 15 '16 14:02 Irrelon

But why save constantly like that? Can't you save once you've got all the items in the collection?

Irrelon avatar Feb 15 '16 14:02 Irrelon

My thinking was that I would rather save periodically than all at once, since the load from saving 50K+ records could block the application for a bit. As you can see, it is a separate function anyway and was going to be used as a final save in other scenarios; this is just the initial population of the list.

xhocquet avatar Feb 15 '16 14:02 xhocquet

Also, when I tried saving all at once, the application ended up blocking/crashing when it got to a large number of items.

xhocquet avatar Feb 15 '16 14:02 xhocquet

Ahh, OK, I see where you're coming from. The issue will be that it won't save only the changed data; it will attempt to save all of the data every time you call save. The persistence module is quite naive at the moment: it just dumps everything into a JSON file when you call save.

Two things need to change to make this better for you:

  1. We need to save changes rather than the entire data on a call to save()
  2. We need to detect large amounts of data, split them into chunks, and save/append the chunks with a setTimeout between each so that the main thread gets some breathing room every so often and nothing blocks.

Number 2 is relatively simple to implement. I'll try and get it into an update quickly for you.
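
To give you an idea, no. 2 would look roughly like this (saveInChunks is a sketch of the approach, not the actual persistence code):

// Rough sketch of chunked saving: write the data in slices and yield to
// the event loop between slices via setTimeout so the main thread can breathe.
const fs = require("fs");

function saveInChunks(items, filePath, chunkSize, done) {
  let index = 0;

  fs.writeFile(filePath, "", function (err) { // start with an empty file
    if (err) { return done(err); }
    writeNextChunk();
  });

  function writeNextChunk() {
    const chunk = items.slice(index, index + chunkSize);
    fs.appendFile(filePath, chunk.map(function (item) {
      return JSON.stringify(item);
    }).join("\n") + "\n", function (err) {
      if (err) { return done(err); }
      index += chunkSize;
      if (index < items.length) {
        setTimeout(writeNextChunk, 0); // breathing room for the main thread
      } else {
        done(null);
      }
    });
  }
}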

As a simple fix for now, you could try chunking the data yourself by splitting the collections by file name, e.g. all songs that start with "A" go in collection('songs_a'), and so on. That will cut down the amount of data you are saving per collection and significantly reduce thread blocking.
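
Something like this would do it (collectionFor is just an illustrative helper):

// Sketch of the workaround: route each song into a collection keyed by the
// first letter of its file name, e.g. songs_a, songs_b, ...
const pathTools = require("path");

function collectionFor(db, filePath) {
  const first = pathTools.basename(filePath).charAt(0).toLowerCase();
  const suffix = /[a-z]/.test(first) ? first : "other"; // digits etc. share one bucket
  return db.collection("songs_" + suffix);
}

// In the directory walk, instead of songCollection.insert(...):
// collectionFor(db, curFilePath).insert({ path: curFilePath });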

I am going to make sure that no. 2 is done as a priority fix for you so you don't have to stop development. Should be able to get that in tonight.

Irrelon avatar Feb 15 '16 14:02 Irrelon

Thanks so much! I will do some research into JSON diffing and see if I can send you a PR based on my own application.

xhocquet avatar Feb 15 '16 15:02 xhocquet

Cool :)

I think the performance issue with saving 50k+ records won't be the actual writing of data to disk, but more likely the part that turns the JavaScript objects stored in the collection's _data property (an array) into JSON. I've actually been working on some performance improvements there and managed to achieve a 5x speed increase on JSON.stringify compared to what we had before. It's in the edge branch at the moment, though, so not ready for prime time :)

I think that performance enhancement, along with chunking the data on save, will produce a significantly better saving experience. Knowing what has changed between the last save and the current state could be used to speed up saving as well, but that requires a lot more work; it's not low-hanging fruit like chunking the data is.
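
For a rough idea of the change-tracking side (dirtyIds, markDirty and persistChanged are purely illustrative; ForerunnerDB doesn't expose anything like this yet):

// Sketch of "save only what changed": remember the ids of documents
// touched since the last successful save and persist just those.
const dirtyIds = new Set();

function markDirty(doc) {
  dirtyIds.add(doc._id);
}

function saveChanged(collection, persistChanged, done) {
  const changed = collection.find().filter(function (doc) {
    return dirtyIds.has(doc._id);
  });
  persistChanged(changed, function (err) {
    if (!err) { dirtyIds.clear(); } // everything persisted; reset tracking
    done(err);
  });
}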

Irrelon avatar Feb 15 '16 15:02 Irrelon

For instance, when you do a save we could store one record per line in the JSON file, so that we could identify the exact data to create, update, or delete in the file rather than overwriting the entire file with a new version just for a single change. That alone would make a massive difference in performance... a literally monumental speed increase.
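
In other words, something like newline-delimited JSON (the helpers below are a sketch of the idea, not the persistence module's actual API):

// One record per line: a single change only appends or touches one line
// instead of rewriting the whole file.
const fs = require("fs");

function appendRecord(filePath, doc, done) {
  fs.appendFile(filePath, JSON.stringify(doc) + "\n", done);
}

function loadRecords(filePath) {
  return fs.readFileSync(filePath, "utf8")
    .split("\n")
    .filter(Boolean) // drop the trailing empty line
    .map(function (line) { return JSON.parse(line); });
}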

Just an FYI: going forward, the idea is to implement a BSON-like file format that would allow us to modify individual records, and individual pieces of data within a record, because we would know where every bit of data is stored at the binary level. This is on the roadmap but quite far away at present.

Irrelon avatar Feb 15 '16 16:02 Irrelon