Meteor-CollectionFS

Get file's binary data from GridFS

Open lukejagodzinski opened this issue 11 years ago • 32 comments

Hi, how can I get the binary data of a given file from GridFS? I need to send it to another server using an HTTP.post() request.

lukejagodzinski avatar Oct 21 '14 11:10 lukejagodzinski

I have a similar problem that I've been racking my brain over: getting the dataUri from a file stored using the FileSystem store. I need to read the file from the FileSystem, encrypt it (with something that doesn't support streams), and then write it to S3 using the cfs-s3 store.

Any help would be appreciated.

icellan avatar Oct 21 '14 13:10 icellan

FS.Collection uses cfs:dataman internally to read data and transform it from one format into another, but I don't know if that's the right track. There are also the createReadStream and createWriteStream functions. I was looking for some kind of object that can attach to a ReadStream and read the binary data out of it, but without success. The problem is still unresolved.

lukejagodzinski avatar Oct 21 '14 13:10 lukejagodzinski

You might want to look through the cfs:file tests which do some binary manipulation. Also, fileObj.createReadStream(storeName) gets you a stream from the store (see API), and you can use the data-man package to convert that read stream to some other binary format. There are also quite a few node packages on npm that deal with stream/buffer/binary conversion.

If you figure out anything, please post some code for the benefit of others.

aldeed avatar Oct 21 '14 14:10 aldeed

@aldeed But how do I convert a ReadStream to a Buffer using the data-man package? There is no code doing that in the cfs:file tests. They only get data from a temp file, which doesn't apply when you are using GridFS.

lukejagodzinski avatar Oct 21 '14 15:10 lukejagodzinski

I could only get a Buffer using this code:

var dataMan = new DataMan('http://localhost:3000' + thumbnail.url(), thumbnail.type());
dataMan.getBuffer(function (err, buffer) {
  console.log(buffer);
});

(You need to install cfs:data-man before using this code.)

But it's one of the worst solutions I could think of:

  • It makes an HTTP request even though I have direct access to GridFS or the local FileSystem,
  • It's asynchronous and useless in Meteor methods (of course I could wrap it using Meteor.wrapAsync, but that's not the point),
  • The HTTP request needs the full URL (not just the URI) of the file, so I have to make it configurable depending on which server I'm running on (development/production).

lukejagodzinski avatar Oct 21 '14 15:10 lukejagodzinski

@jagi, see the dataman api here. Callbacks are optional on the server. Take a look at the data-man tests, too. More of the conversion tests are there.

I didn't test, but I think this will work:

var buffer = new DataMan(thumbnail.createReadStream(storeName), thumbnail.type({store: storeName})).getBuffer();

aldeed avatar Oct 21 '14 16:10 aldeed

@aldeed When I use GridFS with the above code I get the error "Error: DataMan constructor received data that it doesn't support". When I use FileSystem I don't get any error, but nothing happens. Any idea how to make it work?

And you're right, new DataMan().getBuffer() works synchronously. Thanks :).

I have one more question. Does the HTTP module in FS support sending files over HTTP, like the standard Node.js request package does with FormData?

lukejagodzinski avatar Oct 21 '14 16:10 lukejagodzinski

Just a note: nope, we haven't added FormData, but it's actually not too hard to add, so it could be added at some point.

raix avatar Oct 21 '14 17:10 raix

OK, nice. For now I'm using my own implementation to make the HTTP request synchronously, but it's not perfect.

lukejagodzinski avatar Oct 21 '14 17:10 lukejagodzinski

DataMan apparently doesn't detect the ReadStream as a Stream.Readable, so I tried a little workaround and executed the DataMan code manually:

var readStream = thumbnail.createReadStream();
var dataMan = new DataMan.ReadStream(readStream, thumbnail.type());
var buffer = Meteor._wrapAsync(Function.prototype.bind(dataMan.getBuffer, dataMan))();

It doesn't show any error, but it gets stuck on the last line. I waited a few minutes without any effect. The same thing happened with FileSystem storage.

lukejagodzinski avatar Oct 21 '14 17:10 lukejagodzinski

I think it's not even implemented: https://github.com/CollectionFS/Meteor-data-man/blob/master/server/data-man-readstream.js#L24

Am I right?

lukejagodzinski avatar Oct 21 '14 17:10 lukejagodzinski

Got it working with basic node.js streams on the server:

var doc = FS.File(collection.findOne({_id: id}));
var readStream = doc.createReadStream();
var buffer = new Buffer(0);
readStream.on('readable', function() {
    buffer = Buffer.concat([buffer, readStream.read()]);
});
readStream.on('end', function() {
    console.log(buffer.toString('base64'));
});

It would be much better to have this working in the api directly.

icellan avatar Oct 21 '14 19:10 icellan

A better implementation that can be reused and called synchronously on the server:

var getBase64Data = function(doc, callback) {
    var buffer = new Buffer(0);
    // callback has the form function (err, res) {}
    var readStream = doc.createReadStream();
    readStream.on('readable', function() {
        buffer = Buffer.concat([buffer, readStream.read()]);
    });
    readStream.on('error', function(err) {
        callback(err, null);
    });
    readStream.on('end', function() {
        // done
        callback(null, buffer.toString('base64'));
    });
};
var getBase64DataSync = Meteor.wrapAsync(getBase64Data);

icellan avatar Oct 21 '14 21:10 icellan

Oh geez, yeah I guess we never implemented that piece. If you guys figure it out and want to do a pull request, feel free. Something like @icellan's last post is probably pretty close to what's needed.

aldeed avatar Oct 21 '14 22:10 aldeed

@icellan It doesn't work for me. First I tried your approach with wrapAsync, but that returned an error. Next, I tried the asynchronous version just to check whether it works at all. The readable event never fires, and the buffer in the end event is empty. Do you have any idea why that is? Are you using your function in a Meteor method?

lukejagodzinski avatar Oct 28 '14 09:10 lukejagodzinski

OK, it works, but I had to use the data event instead of readable.

lukejagodzinski avatar Oct 28 '14 10:10 lukejagodzinski

@jagi, that's right, it should be the data event. Just a typo in @icellan's post.

aldeed avatar Oct 28 '14 13:10 aldeed

@jagi @aldeed Sorry for not getting back earlier, but I'm still using "readable" in my app and it works with no problems. The initial code came from the node.js stream handbook, which says not to use "data":

Note that whenever you register a "data" listener, you put the stream into compatability mode so you lose the benefits of the new streams2 api.

You should pretty much never register "data" and "end" handlers yourself anymore. If you need to interact with legacy streams, use libraries that you can .pipe() to instead where possible.

https://github.com/substack/stream-handbook

icellan avatar Nov 23 '14 16:11 icellan
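For readers comparing the two events: with "readable", read() returns null once the internal buffer is drained, and passing that null to Buffer.concat throws, so the handler usually loops until null. A minimal sketch of that loop, assuming plain Node.js streams; the streamToBuffer name is hypothetical, and a PassThrough stands in for fileObj.createReadStream() so the snippet runs outside Meteor:

```javascript
const { PassThrough } = require('stream');

// Collect a readable stream into a single Buffer using the 'readable' event.
// (streamToBuffer is a hypothetical helper name, not part of CollectionFS.)
function streamToBuffer(readStream, callback) {
  const chunks = [];
  readStream.on('readable', function () {
    let chunk;
    // read() returns null when nothing is buffered, so stop there.
    while ((chunk = readStream.read()) !== null) {
      chunks.push(chunk);
    }
  });
  readStream.on('error', function (err) {
    callback(err);
  });
  readStream.on('end', function () {
    // Concatenate once, after all chunks have arrived.
    callback(null, Buffer.concat(chunks));
  });
}

// Stand-in for fileObj.createReadStream(): a PassThrough fed two chunks.
const demo = new PassThrough();
streamToBuffer(demo, function (err, buffer) {
  if (err) throw err;
  console.log(buffer.toString('base64'));
});
demo.write(Buffer.from('hello '));
demo.end(Buffer.from('world'));
```

In a Meteor app the same function could presumably be given the stream returned by createReadStream() and wrapped with Meteor.wrapAsync, as in the snippets above.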

@icellan @jagi Hi, I'm trying your code in a Tinytest but nothing happens. It stays blocked, as if the stream were not readable. I'm using [email protected] and [email protected] on a Meteor 1.1.0.2 platform.

// creation of collections
var imageStore = new FS.Store.GridFS(prefix + "-myfiles", {});
MyFilesCol = new FS.Collection(prefix + "-myfiles", {
  stores: [imageStore]
});

// your solution
getBase64Data = function(docId, callback) {
    var doc = new FS.File(WatchedItemsImages.findOne({_id: docId}));
    var readStream = doc.createReadStream();
    var buffer = new Buffer(0);
    readStream.on('readable', function() {
        console.log('readable');
        buffer = Buffer.concat([buffer, readStream.read()]);
    });
    readStream.on('end', function() {
        console.log(buffer.toString('base64'));
        callback(null, buffer.toString('base64'));
    });
};
getBase64DataSync = Meteor.wrapAsync(getBase64Data);

Tinytest.add('try to download an image and save it in db using CFS:GRIDFS', function(test) {
  // clean collections
  _.each([MyFilesCol], function(col) {
    col.remove({});
  });

  // get an image from a url and store it manually using dataUri
  var response = HTTP.get('https://www.hyundaicanada.com/content/Assets/2016/360/ELANTRA/Exterior/Shimmering_Silver/en/01.jpg', {
      "npmRequestOptions": {
        "encoding": null
      }
    }),
    getDataUri = function(response) {
      var contentType = response.headers && response.headers['content-type'] ? response.headers && response.headers['content-type'] : 'image/jpeg',
        base64Content = response.content.toString('base64');
      return "data:" + contentType + ";base64," + base64Content;
    };

  var insertedImage = WatchedItemsImages.insert(getDataUri(response));

  // trying to retrieve the image binary data
  var data = getBase64DataSync(insertedImage._id);

  test.equal(typeof data, undefined);
});

Rebolon avatar Aug 04 '15 16:08 Rebolon

OK, so after reading a large part of the source code, it seems that the best way to write a file is:

var docCol = MyCollection.findOne(),
      doc = new FS.File(docCol),
      readable = doc.createReadStream(prefix + "-myfiles"),
      writeable = Npm.require('fs').createWriteStream('/tmp/stream.jpg'),
      buffer = [];

readable.on('data', function(buf) {
  buffer.push(buf);
});

readable.on('end', function() {
  console.log('readable end', buffer.concat().toString('base64'));
});

readable.pipe(writeable);

Really simple in fact, except that finding it took some Sherlock Holmes work.

Rebolon avatar Aug 04 '15 20:08 Rebolon

Thanks @Rebolon for your code! On my side, it looks like I had to pass buffer.concat()[0].toString('base64') to get base64 data (meaning, to get the first element of the concatenated array). Here's what I used in the end:

// Helper function to retrieve the binary content of a CFS file with base64 encoding
var getBase64Data = function(file, callback) {
  // callback has the form function (err, res) {}
  var readStream = file.createReadStream();
  var buffer = [];
  readStream.on('data', function(chunk) {
    buffer.push(chunk);
  });
  readStream.on('error', function(err) {
    callback(err, null);
  });
  readStream.on('end', function() {
    callback(null, buffer.concat()[0].toString('base64'));
  });
};

getBase64DataSync = Meteor.wrapAsync(getBase64Data);

a-becker42 avatar Sep 11 '15 17:09 a-becker42

Thanks a lot guys.

Finally it works for my project with the code below:

var getBase64Data = function(doc, callback) {
    var buffer = new Buffer(0);
    // callback has the form function (err, res) {}
    var readStream = doc.createReadStream();
    readStream.on('data', function(chunk) {
        buffer = Buffer.concat([buffer, chunk]);
    });
    readStream.on('error', function(err) {
        callback(err, null);
    });
    readStream.on('end', function() {
        // done
        callback(null, buffer.toString('base64'));
    });
};
var getBase64DataSync = Meteor.wrapAsync(getBase64Data);

dappl avatar Sep 25 '15 00:09 dappl
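Outside a Meteor fiber, where Meteor.wrapAsync is not available, a callback-style helper like the one above can be awaited instead. A minimal sketch assuming plain Node.js, with util.promisify in place of wrapAsync; the helper here takes the stream itself rather than an FS.File, and a PassThrough stands in for doc.createReadStream():

```javascript
const { PassThrough } = require('stream');
const { promisify } = require('util');

// Same idea as the helper above, but chunks are collected in an array and
// concatenated once at the end. Takes a readable stream (an assumption made
// so the sketch runs without CollectionFS).
function getBase64Data(readStream, callback) {
  const chunks = [];
  readStream.on('data', function (chunk) {
    chunks.push(chunk);
  });
  readStream.on('error', function (err) {
    callback(err);
  });
  readStream.on('end', function () {
    callback(null, Buffer.concat(chunks).toString('base64'));
  });
}

// promisify turns the (err, result) callback form into a Promise.
const getBase64DataAsync = promisify(getBase64Data);

// Stand-in for doc.createReadStream().
const demo = new PassThrough();
getBase64DataAsync(demo).then(function (base64) {
  console.log(base64);
});
demo.end(Buffer.from('hello world'));
```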

Hi, has there been any solution for this? I have tried all the approaches described above, and my web app always crashes with STDERR. I can't seem to get the stream/binary out of the fs.collection.

Noveltysa avatar Oct 22 '15 11:10 Noveltysa

I've literally copy-pasted the snippet of code I use in my repo, and it works. Not sure what's going wrong on your side.

a-becker42 avatar Oct 22 '15 15:10 a-becker42

Thanks everyone for the discussion. And @a-becker42, your snippet worked perfectly for me as well! Great! Some of the other proposals could probably work too, but it might be useful for others to know that this one worked for me.

derouck avatar Oct 28 '15 15:10 derouck

Thanks guys... the code works fine. I initially struggled to get my stream out of the fs.collection, but now all is fine. Thanks once again for sharing.

Noveltysa avatar Oct 28 '15 17:10 Noveltysa

Glad it helped. Again, as mentioned by @aldeed, this should really be implemented in the package itself.

a-becker42 avatar Nov 03 '15 16:11 a-becker42

+1

weeger avatar Nov 16 '15 21:11 weeger

@a-becker42 After some more testing I discovered that the piece of code you've posted here only works for files smaller than 64 KB. At least in my case it only works for the first chunk. Maybe the chunk size could be tweaked.

The reason for this is that you're using the concat() function in the wrong way. As mentioned in the documentation, concat will concatenate an array with the arrays you pass as arguments. In some of the other code snippets concat is used on a Buffer, which is different. In this case the call does nothing, and afterwards you are just taking the first chunk from the original array.

@Rebolon Your second post also seems vulnerable to this issue.

This means that for me only @dappl's solution works. If someone is using the other snippet, I highly recommend switching, because sooner or later you might encounter a bigger file which will leave you puzzled.

It might be better to postpone the concatenation to the end of the process, but I could not get it working immediately by just replacing the concat() with a join('').

Anyway, thanks again everyone for this thread, it has been really valuable for me.

derouck avatar Nov 18 '15 18:11 derouck
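The difference described above can be seen without any stream at all. A small sketch in plain Node.js; the byte values are made up for illustration and simply stand in for the chunks that successive "data" events would deliver:

```javascript
// Two chunks standing in for what successive 'data' events would deliver.
const chunks = [
  Buffer.from([0xff, 0xd8, 0xff]),
  Buffer.from([0xe0, 0x00, 0x10])
];

// Array.prototype.concat() with no arguments returns a shallow copy of the
// array, so [0] is still just the first chunk.
const firstOnly = chunks.concat()[0];

// Array.prototype.join('') stringifies each Buffer as UTF-8, which does not
// round-trip arbitrary binary data.
const joined = Buffer.from(chunks.join(''));

// Buffer.concat(list) copies the bytes of every chunk into one new Buffer.
const whole = Buffer.concat(chunks);

console.log(firstOnly.length);         // 3
console.log(whole.length);             // 6
console.log(joined.equals(whole));     // false
console.log(whole.toString('base64'));
```

Pushing chunks into an array and calling Buffer.concat(chunks) once in the "end" handler postpones the concatenation to the end of the process and still returns the whole file.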

Thanks @icellan :+1:

talha-asad avatar Feb 11 '16 04:02 talha-asad