
Filtering, Max Files and Supporting URLs

jaruba opened this issue 1 year ago • 8 comments

Hi! We've been playing with your module for the last few days and were hoping to include it in Stremio's local streaming server.

I wanted to share our experience with it. 😄

Specifically this code gave us a bit of trouble: https://github.com/doom-fish/rar-stream/blob/caf795311b47a4cc866895ca06088a9d25a7ea05/src/rar-files-package.ts#L55-L75

This is used to traverse the archive in order to read the file headers and prepare the file chunks.

We stress tested it with several cases, one being a 175 GB archive containing many files, which took tens of seconds to finish parsing the file list due to the code linked above.

We thus added some options:

{
  "fileIdx": 1,
  "fileMustInclude": ["hello world", "/dexter/i"]
}

These were added so we can stop parsing when we find the file that we require, and also to skip adding the file chunks for all files we do not require. (you can see the changes here: https://github.com/Stremio/rar-stream/blob/ee358ffe86b7060e554aa326a14d15e6f0cb739d/src/rar-files-package.ts#L55-L95 )
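As an illustration (not the fork's actual code), one way such `fileMustInclude` entries could be interpreted is to treat slash-wrapped strings like `"/dexter/i"` as regular expressions and everything else as a case-insensitive substring test. The `makeMatcher` helper below is hypothetical:

```javascript
// Hypothetical helper: interpret a fileMustInclude entry either as a
// "/pattern/flags" regular expression or as a plain substring match.
function makeMatcher(pattern) {
  const m = /^\/(.*)\/([a-z]*)$/.exec(pattern);
  if (m) {
    const re = new RegExp(m[1], m[2]); // body and flags from the string
    return (name) => re.test(name);
  }
  const needle = pattern.toLowerCase();
  return (name) => name.toLowerCase().includes(needle);
}

const matchers = ['hello world', '/dexter/i'].map(makeMatcher);
const matches = (name) => matchers.some((fn) => fn(name));

console.log(matches('Dexter.S01E01.mkv')); // true (regex entry matches)
console.log(matches('hello world.txt'));   // true (substring entry matches)
console.log(matches('other.rar'));         // false
```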

As far as we can tell, the traversal needed to build the file list has to be sequential, and there is no shortcut around it.

We were thinking that one way we could continue using this module (rather than a fork of it) would be to generalize these options so they are useful to more users, for example:

{
  "maxFiles": 1,
  "filter": (filename, fileIdx) => {
    return filename.includes('hello world')
  }
}
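A minimal sketch of how such options could short-circuit the sequential traversal (the `headers` array and `collectFiles` function below are stand-ins for the real parsing loop, not rar-stream's API):

```javascript
// Sketch: applying hypothetical `filter` / `maxFiles` options to a
// sequential header traversal. `headers` stands in for the file headers
// read one by one from the archive; the loop stops as soon as enough
// matching files are collected, skipping the rest of the archive.
function collectFiles(headers, { filter, maxFiles = Infinity } = {}) {
  const files = [];
  for (const [fileIdx, header] of headers.entries()) {
    if (filter && !filter(header.name, fileIdx)) continue; // skip chunk bookkeeping
    files.push(header);
    if (files.length >= maxFiles) break; // stop parsing early
  }
  return files;
}

const headers = [
  { name: 'sample.mkv' },
  { name: 'hello world.mkv' },
  { name: 'extras.nfo' },
];

const result = collectFiles(headers, {
  maxFiles: 1,
  filter: (filename) => filename.includes('hello world'),
});
console.log(result); // [{ name: 'hello world.mkv' }]
```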

Otherwise, you did absolutely amazing work! I kept stumbling upon your module over the years and kept disregarding it because I couldn't wrap my mind around what the missing decompression meant for RAR archives, and it only recently occurred to me that we don't need decompression at all for our use case.

Our experiments led to making this module: https://github.com/stremio/rar-http

It adds support for URLs and exposes an HTTP API. I believe that support for URLs could also be added directly to your module, but if we want to keep the spirit of a self-contained (zero-dependency) module, it may be a hassle to use the http / https modules directly for this task.

We're mostly curious about your opinion on all of this, and if you would even accept PRs for such changes.

Thank you for building rar-stream!

jaruba avatar Feb 13 '24 18:02 jaruba

I also found the readme of unzip-stream interesting: https://www.npmjs.com/package/unzip-stream#parse-zip-file-contents

It also gives the user of the module control over how the archive's file list is traversed.

jaruba avatar Feb 14 '24 12:02 jaruba

Hello 👋 so nice to hear that you find use of this library!

Hehe, I had the same discovery that most media-related cases use RAR just for file splitting. As a retired warez scener I know the reasons for splitting files like this, but it makes less sense from the outside. Anyhow, it's nice to be able to stream things directly without having all the content locally.

I am a bit busy at the moment, but your suggestions are reasonable and sound. I will assess things more closely tomorrow.

Happy to receive PRs.

Cheers and thanks for the kind words!

1313 avatar Feb 14 '24 20:02 1313

When I think of it, I believe we can optimize it further and calculate the chunks with some arithmetic, since we know the file size and the intermediate chunks should have a constant size for file headers.
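If the intermediate headers really do have a constant size, the chunk boundaries of a file split across volumes could be derived arithmetically instead of by walking every header. A rough sketch under that assumption (all sizes and field names are illustrative, not RAR's actual layout):

```javascript
// Sketch: computing the byte ranges (chunks) of a file split across
// equally-sized volumes, assuming each volume carries a constant-size
// header before the payload. All sizes here are illustrative.
function computeChunks({ fileSize, payloadPerVolume, headerSize }) {
  const chunks = [];
  let remaining = fileSize;
  let volume = 0;
  while (remaining > 0) {
    const length = Math.min(payloadPerVolume, remaining);
    chunks.push({
      volume,
      start: headerSize,        // payload begins after the constant header
      end: headerSize + length, // exclusive end offset within the volume
    });
    remaining -= length;
    volume += 1;
  }
  return chunks;
}

const chunks = computeChunks({
  fileSize: 250,
  payloadPerVolume: 100,
  headerSize: 20,
});
console.log(chunks.length); // 3 volumes: 100 + 100 + 50 bytes
console.log(chunks[2]);     // { volume: 2, start: 20, end: 70 }
```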

I'll do some digging. Filtering is not a bad idea either, especially with a lot of smaller files.

1313 avatar Feb 19 '24 07:02 1313

@1313 Sorry for the late reply, I have been hunting for a new apartment lately.

When I think of it I think we can optimize it further and calculate the chunks with some arithmetics.

That would be great. I have many test cases if you need anything tested, and we are also willing to assist with development to improve the module. Should we wait and see what your current attempts lead to? Or do you have details on how you think some features would best be implemented?

jaruba avatar Feb 21 '24 17:02 jaruba

Haha when revisiting the code I see that the optimisation is already there.

Sorry for the late reply here as well. Short on free time with two toddlers to run after. I hope the apartment hunt bears fruit.

Anyhow, I think a filtering solution would be great. If you want to give it a shot, I'm happy to review a PR and we can take the discussion from there.

Cheers

1313 avatar Mar 02 '24 07:03 1313

@1313 tell me what you think: https://github.com/doom-fish/rar-stream/pull/30

jaruba avatar Mar 04 '24 14:03 jaruba

Although off-topic, another interesting case I ran into is the way the FileMedia interface works: https://github.com/doom-fish/rar-stream?tab=readme-ov-file#innerfile-api

It expects createReadStream() to be synchronous, which only becomes an issue if I try to use node-fetch instead of needle, as the expected way of using node-fetch would be:

const fetch = require('node-fetch'); // node-fetch v2 (v3 is ESM-only)

const createReadStream = async () => {
  const resp = await fetch(url, opts);
  return resp.body; // resp.body is a readable stream
};

So node-fetch can only be used in an async context; using await file.createReadStream() would work for both the sync and async cases.

This is outside of our needs; I just thought using node-fetch would be interesting because it can easily be swapped for the browser's built-in fetch, and there is a good chance that rar-stream could be browserified to unpack and play a video straight in the browser when combined with something like: https://www.npmjs.com/package/videostream

just something to think about 😄

There is now a PR for this change here: https://github.com/doom-fish/rar-stream/pull/29

jaruba avatar Mar 04 '24 15:03 jaruba

I'll look into making the lib browser-ready as well. I suspect that the binary dependency used for reading the headers requires Node.

1313 avatar Mar 05 '24 07:03 1313