node-ebml
Problems with big chunks
To encode data sizes, Matroska uses a flexible length encoding: essentially unsigned integers with a length of 1 to 8 bytes, i.e. 7 to 56 usable bits. See the spec for more information: https://www.matroska.org/technical/specs/index.html#EBML_ex
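For reference, the encoding can be sketched like this: the position of the highest set bit in the first byte gives the total length, and the remaining bits carry the value. A minimal illustration (`readVint` is a hypothetical helper here, not node-ebml's actual API, and it is only exact up to 2^53 - 1):

```javascript
// Sketch: decode an EBML variable-length integer (vint) from a Buffer.
function readVint(buffer, offset = 0) {
  const first = buffer[offset];
  // The position of the highest set bit in the first byte gives the total length:
  // 1xxxxxxx = 1 byte, 01xxxxxx = 2 bytes, ..., 00000001 = 8 bytes.
  let length = 1;
  while (length <= 8 && !(first & (0x80 >> (length - 1)))) {
    length++;
  }
  if (length > 8) throw new Error('Invalid vint: first byte is zero');
  // Mask off the length marker, then accumulate the remaining bytes.
  // Multiplication instead of bit shifts avoids 32-bit truncation,
  // but values are still only exact up to Number.MAX_SAFE_INTEGER.
  let value = first & (0xff >> length);
  for (let i = 1; i < length; i++) {
    value = value * 256 + buffer[offset + i];
  }
  return { length, value };
}

console.log(readVint(Buffer.from([0x82])));       // { length: 1, value: 2 }
console.log(readVint(Buffer.from([0x40, 0x02]))); // { length: 2, value: 2 }
```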
These seem to have two issues in node-ebml currently:
- [ ] Integers longer than eight bytes, i.e. bigger than 2^56-2: these shouldn't exist in EBML, according to the spec.
- [ ] Integers bigger than 2^53-1 but smaller than 2^56: supported by EBML but not representable exactly as a JavaScript number.
The first group should not be our main concern: I have not seen it in the wild yet, and we may be able to just throw an error if it happens. We should still check for the "all 1s" special case, though.
The second group is going to be a problem. We might need something like bigint or bignum for that, but we cannot use a bignum as a buffer size etc. I guess we need some real-world samples to figure out what could be done here.
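A rough sketch of how both cases could be detected once a vint's value bits are in hand. This assumes the value is carried as a BigInt (Node >= 10.4) so the 7- and 8-byte comparisons stay exact; `classifyVintValue` is a hypothetical helper, not part of node-ebml:

```javascript
// Sketch: classify a decoded EBML data size.
// `length` is the vint's byte length, `value` its value bits as a BigInt.
function classifyVintValue(length, value) {
  // A data size with all value bits set to 1 means "unknown size" in EBML.
  const allOnes = (1n << BigInt(7 * length)) - 1n;
  if (value === allOnes) return 'unknown-size';
  // Anything above 2^53 - 1 cannot be held exactly in a JS number.
  if (value > BigInt(Number.MAX_SAFE_INTEGER)) return 'unsafe';
  return 'ok';
}

console.log(classifyVintValue(1, 0x7fn));     // 'unknown-size'
console.log(classifyVintValue(8, 1n << 54n)); // 'unsafe'
console.log(classifyVintValue(2, 300n));      // 'ok'
```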
Issues related to this are #8 and #10.
Thinking about it again: since EBML sizes are byte counts, 2^53 bytes is about 9 petabytes. I guess we can assume that there just are no chunks with a declared size that large?
How bad are the performance issues when switching to some big-number library? If this library won't support more than 2^53, that seems reasonable, but it should be well documented, I think.
I don't think we really can use bignum here because we are using the integers we read as buffer sizes, and I highly doubt that you can have buffers bigger than `Number.MAX_SAFE_INTEGER` bytes.
If it were just about putting numbers in the exported structs I would try it, and maybe we should do that for numbers that are not used as buffer sizes, but AFAIK we cannot use a BigNum (or something like that) as a buffer size.
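If sizes were carried as big numbers in the exported structs, the buffer-size concern could be handled with an explicit, checked down-conversion at the one place a size becomes a buffer length. A sketch using BigInt (`toBufferLength` is a hypothetical helper, not part of node-ebml):

```javascript
// Sketch: convert a BigInt chunk size to a Number only when it is safe.
function toBufferLength(sizeBig) {
  if (sizeBig > BigInt(Number.MAX_SAFE_INTEGER)) {
    throw new RangeError(`Chunk size ${sizeBig} is too large for a Buffer length`);
  }
  return Number(sizeBig);
}

console.log(toBufferLength(1024n)); // 1024
```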
Maybe we should use something like BigNum here: https://github.com/themasch/node-ebml/blob/master/lib/ebml/tools.js#L197
Okay, I don't know why I didn't think of this before, but I fired up Node (10.9.0). @themasch, you're correct in that

> I highly doubt that you can have buffers bigger than `Number.MAX_SAFE_INTEGER` bytes
There are projects dedicated to this exact problem, but the issue is adding more dependencies.
EDIT: OK, when compiling directly, the limit seems to be of `size_t` (the C++ type) length.
One could try to map this to multiple buffers, but that changes the API again for a use case that may never happen. Again, that would be a segment of around 1 PB, and that's one block. The usual block size in my test files, a two-hour Full HD H.264 MKV, ranges from a few hundred bytes to some KB, or maybe one MB. There are 201294 of these blocks in this two-hour piece (5.4 GB total).
The maximal size of a Buffer in Node seems to be around 2.14 GB:

```
RangeError: File size is greater than possible Buffer: 0x7fffffff bytes
    at FSReqWrap.readFileAfterStat [as oncomplete] (fs.js:453:11)
```
Hey, get this! BigInt support was added in Node 10.7! See the post announcing it, as well as the TC39 proposal for it, which is at Stage 3.
This is huge because it means that tools like Babel will have to provide polyfills for it if/when it reaches Stage 4.
Edit: I just fired up the Chrome console because I was curious (I'm on my Windows side), and the following happened:

```javascript
const arr = BigInt64Array.from([0x11, 0x22]);
// Uncaught TypeError: Cannot convert 17 to a BigInt
//     at Function.from (<anonymous>)
const arr2 = BigInt64Array.from([0x11n, 0x22n]);
// BigInt64Array(2) [17n, 34n]
```

Apparently it's that easy in the spec! I mean, yeah, the hex followed by an `n` looks a little silly, but seriously, this could possibly be done in two or so days.
The only problem I foresee is that BigInt and friends still need optimization in their implementations, so #31 would take an even further hit. But still, it's exciting!
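To make the idea concrete, here is a sketch of reading a 56-bit EBML unsigned integer into a BigInt, so values above 2^53 - 1 survive without precision loss (requires Node >= 10.4 for BigInt; `readUIntBig` is a hypothetical helper, not node-ebml's API):

```javascript
// Sketch: read a big-endian unsigned integer of up to 8 bytes into a BigInt.
function readUIntBig(buffer, offset, byteLength) {
  let value = 0n;
  for (let i = 0; i < byteLength; i++) {
    // Shift in one byte at a time; BigInt keeps all 56 bits exact.
    value = (value << 8n) | BigInt(buffer[offset + i]);
  }
  return value;
}

const buf = Buffer.from([0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff]);
console.log(readUIntBig(buf, 0, 7)); // 72057594037927935n, i.e. 2n ** 56n - 1n
```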
Hello,
Do we have any update on this? I'm building a torrent live-streaming video player and I ran into issue #8, so I'd really love any news 😄