node-ebml

Problems with big chunks

Open themasch opened this issue 6 years ago • 8 comments

To encode the data size Matroska uses a flexible length encoding. These are basically unsigned ints with a length from 1 to 8 bytes or from 7 to 56 bits. See the spec for more information: https://www.matroska.org/technical/specs/index.html#EBML_ex
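As a rough illustration of the encoding described above, here is a minimal sketch of decoding such a size with plain JavaScript numbers. The function name `readVintSketch` and its return shape are hypothetical and are not node-ebml's actual API. The sketch relies on the leading zero bits of the first byte giving the total length.

```javascript
// Hypothetical helper for illustration only; not part of node-ebml.
// Decodes one EBML variable-length size starting at `offset`.
function readVintSketch(buffer, offset = 0) {
  const first = buffer[offset];
  if (first === undefined) {
    return null; // no data available yet
  }
  if (first === 0) {
    // No marker bit in the first byte: the size would be longer than 8 bytes.
    throw new RangeError('Sizes longer than 8 bytes are not supported');
  }
  // Leading zero bits in the first byte, plus one, give the length in bytes.
  const length = Math.clz32(first) - 23;
  if (offset + length > buffer.length) {
    return null; // size field is not complete yet
  }
  // Mask off the length marker bit, then append the remaining bytes.
  let value = first & (0xff >> length);
  for (let i = 1; i < length; i++) {
    // Multiply instead of shifting: bitwise operators truncate to 32 bits.
    value = value * 256 + buffer[offset + i];
  }
  // Caveat: for 8-byte sizes above 2^53-1 this value loses precision.
  return { length, value };
}
```

For example, the single byte `0x81` decodes to a length of 1 and a value of 1, while `0x40 0x02` decodes to a length of 2 and a value of 2.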

These seem to have two issues in node-ebml currently:

  • [ ] Integers with more than eight bytes, or bigger than 2^56-2. These shouldn't exist in ebml, following the spec.
  • [ ] Integers bigger than 2^53-1 but smaller than 2^56: Supported by EBML but not by JavaScript

The first group should not be our main concern: I have not seen such values yet, and we may be able to just throw an error if one appears. We should check for the "all 1" special case, though.

The second group is going to be a problem. We might need something like bigint or bignum for that, but we cannot use a bignum as a buffer size, etc. I guess we need some real-world samples to figure out what could be done here.
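Both cases can be detected from the raw bytes before any number is built. The following is only a sketch: `classifySizeSketch` is a hypothetical name, and it assumes the caller already knows the size field's length in bytes (1 to 8). It treats "all data bits set to 1" as the reserved special case and flags 8-byte sizes whose top three data bits are used, since only those can exceed 2^53-1.

```javascript
// Hypothetical helper for illustration only; not part of node-ebml.
// Classifies an EBML size field of `length` bytes starting at `offset`.
function classifySizeSketch(buffer, offset, length) {
  // Data bits of the first byte (everything after the length marker bit).
  const mask = 0xff >> length;
  let allOnes = (buffer[offset] & mask) === mask;
  for (let i = 1; i < length && allOnes; i++) {
    allOnes = buffer[offset + i] === 0xff;
  }
  if (allOnes) {
    return 'unknown'; // the "all 1" special case
  }
  // An 8-byte size carries 56 data bits in bytes 1..7. It fits into
  // 2^53-1 only if the top three of those bits are zero.
  if (length === 8 && (buffer[offset + 1] & 0xe0) !== 0) {
    return 'unsafe'; // larger than Number.MAX_SAFE_INTEGER
  }
  return 'safe';
}
```

A parser could throw on `'unsafe'` for now and handle `'unknown'` separately, which keeps ordinary sizes on the fast path with plain numbers.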

Issues related to this are #8 and #10.

themasch • Jul 10 '18 15:07

Thinking about it again: 2^53 bytes is about 9 petabytes (2^53 bits would be about 1 petabyte). I guess we can assume that there simply are no chunks with a declared size that large?

themasch • Jul 26 '18 14:07

How bad are the performance issues when switching to some big number library? If this library won't support more than 2^53, that seems reasonable, but it should be well documented, I think.

bradisbell • Jul 26 '18 15:07

I don't think we really can use bignum here because we are using the read ints as buffer sizes and I highly doubt that you can have buffers bigger than Number.MAX_SAFE_INTEGER bytes.

If it were just for putting numbers in the exported structs I would try it, and maybe we should do it for numbers that are not used as buffer sizes but afaik we cannot use BigNum (or something like that) as a buffer size.
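To make the buffer-size constraint concrete, here is a small sketch that compares Node's buffer limit with the safe integer range. `buffer.constants.MAX_LENGTH` is a real Node API (its value depends on the Node version and platform); the helper name `canAllocateSketch` is hypothetical.

```javascript
const { constants } = require('buffer');

// Hypothetical helper for illustration only; not part of node-ebml.
// Returns true if `size` could be passed to Buffer.alloc() on this runtime.
function canAllocateSketch(size) {
  return Number.isSafeInteger(size) && size >= 0 && size <= constants.MAX_LENGTH;
}

// The buffer limit never exceeds the safe integer range, so any size that
// needs a big number type is already too large for a single Buffer.
console.log('Buffer limit:', constants.MAX_LENGTH);
console.log('Safe integer limit:', Number.MAX_SAFE_INTEGER);
```

Under that assumption, a big number type would only matter for values that are reported to the user, not for values used to allocate or slice buffers.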

themasch • Jul 27 '18 08:07

Maybe we should use things like BigNum here: https://github.com/themasch/node-ebml/blob/master/lib/ebml/tools.js#L197

themasch • Jul 31 '18 11:07

Okay, I don't know why I didn't think of this before, but I fired up Node (10.9.0). @themasch, you're correct in that

I highly doubt that you can have buffers bigger than Number.MAX_SAFE_INTEGER bytes

[Screenshot: node buffer max alloc]

There are projects dedicated to this exact problem, but the issue is adding more dependencies.

EDIT: OK, when compiling directly, it seems to be of size_t (the C++ type) length.

jayands • Aug 26 '18 01:08

One could try to map this to multiple buffers, but that changes the API again for a use case that may never happen. Again, that would be a segment in the petabyte range, and that's a single block. In my test files, the usual block size for a two-hour Full HD H.264 MKV is between a few hundred bytes and some KB, or maybe one MB. There are 201294 of these blocks in this two-hour piece (5.4 GB total).

The maximal size of a Buffer in node seems to be around 2.14GB:

RangeError: File size is greater than possible Buffer: 0x7fffffff bytes
    at FSReqWrap.readFileAfterStat [as oncomplete] (fs.js:453:11)

themasch • Aug 27 '18 11:08

Hey, get this! BigInt was added in Node 10.7! See this article announcing it, as well as the TC39 proposal, which is at Stage 3.

This is huge because it means that tools like Babel will have to provide polyfills for them natively if/when it reaches Stage 4.

Edit: I just fired up the Chrome console because I was curious (and I'm on my Windows side), and the following happened:

// Plain numbers are rejected:
const arr = BigInt64Array.from([0x11, 0x22]); // Uncaught TypeError: Cannot convert 17 to a BigInt at Function.from (<anonymous>)
// BigInt literals (note the trailing n) work:
const arr2 = BigInt64Array.from([0x11n, 0x22n]); // BigInt64Array(2) [17n, 34n]

Apparently it's that easy in the spec! I mean, yeah, a hex literal followed by an n looks a little silly, but seriously, this could possibly be done in two or so days.

The only problem I foresee is that BigInt and family still need optimization in implementation so #31 would be taking an even further hit. But still, it's exciting!
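A rough sketch of how BigInt could be applied to size decoding follows. The function name `readVintBigSketch` and its return shape are hypothetical and not node-ebml's API. It assumes a runtime with BigInt support and hands back a plain number whenever the value is safe, so that only oversized values pay the BigInt cost downstream.

```javascript
// Hypothetical helper for illustration only; not part of node-ebml.
// Decodes an EBML variable-length size without losing precision.
function readVintBigSketch(buffer, offset = 0) {
  const first = buffer[offset];
  if (first === undefined || first === 0) {
    throw new RangeError('Missing data or size longer than 8 bytes');
  }
  // Leading zero bits in the first byte, plus one, give the length in bytes.
  const length = Math.clz32(first) - 23;
  if (offset + length > buffer.length) {
    throw new RangeError('Size field is truncated');
  }
  // Accumulate all data bits as a BigInt so nothing is rounded.
  let value = BigInt(first & (0xff >> length));
  for (let i = 1; i < length; i++) {
    value = (value << 8n) | BigInt(buffer[offset + i]);
  }
  // Return a plain number when it is exact, a BigInt otherwise.
  const isSafe = value <= BigInt(Number.MAX_SAFE_INTEGER);
  return { length, value: isSafe ? Number(value) : value };
}
```

Callers would then check `typeof result.value === 'bigint'` before using the value as a buffer size, since such a value cannot be allocated as a single Buffer anyway.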

jayands • Aug 29 '18 23:08

Hello,

Do we have any update on this? I'm building a torrent live streaming video player and I ran into issue #8, so I'd really love any news 😄

Kylart • Jun 01 '19 08:06