ffprobe.wasm

Open loretoparisi opened this issue 4 years ago • 17 comments

Is your feature request related to a problem? Please describe. Implement ffprobe wasm version.

Describe the solution you'd like ffprobe is the necessary companion of ffmpeg, needed to analyze media files before processing.

Describe alternatives you've considered In this simple case I'm using the command line ffprobe via execFile to probe the file:

const cp = require('child_process');

probe = function (fpath) {
      var self = this;
      return new Promise((resolve, reject) => {
        var loglevel = self.logger.isDebug() ? 'debug' : 'warning';
        const args = [
          '-loglevel', loglevel,// '-v' is an alias of '-loglevel', so it is passed only once
          '-print_format', 'json',
          '-show_format',
          '-show_streams',
          '-i', fpath
        ];
        const opts = {
          cwd: self._options.tempDir
        };
        const cb = (error, stdout) => {
          if (error)
            return reject(error);
          try {
            const outputObj = JSON.parse(stdout);
            return resolve(outputObj);
          } catch (ex) {
            self.logger.error("MediaHelper.probe failed %s", ex);
            return reject(ex);
          }
        };
        cp.execFile('ffprobe', args, opts, cb)
          .on('error', reject);
      });
    }//probe

or, in this case, to seek to a position in the media file:

seek = function (fpath, seconds) {
      var self = this;
      return new Promise((resolve, reject) => {
        var loglevel = self.logger.isDebug() ? 'debug' : 'panic';
        const args = [
          '-hide_banner',
          '-loglevel', loglevel,// '-v' is an alias of '-loglevel', so it is passed only once
          '-show_frames',//Display information about each frame
          '-show_entries', 'frame=pkt_pos',// Display only information about byte position
          '-read_intervals', seconds + '%+#1', //Read only 1 packet after seeking to the given position
          '-print_format', 'json',// '-of' is an alias of '-print_format'; JSON is kept because stdout goes through JSON.parse
          '-i', fpath
        ];
        const opts = {
          cwd: self._options.tempDir
        };
        const cb = (error, stdout) => {
          if (error)
            return reject(error);
          try {
            const outputObj = JSON.parse(stdout);
            return resolve(outputObj);
          } catch (ex) {
            self.logger.error("MediaHelper.seek failed %s", ex);
            return reject(ex);
          }
        };
        cp.execFile('ffprobe', args, opts, cb)
          .on('error', reject);
      });
    }//seek
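
For reference, here is a minimal sketch of how the objects resolved by the two calls above might be consumed. The helper names summarizeProbe and firstPacketPosition are hypothetical, and the sample objects are illustrative rather than taken from a real file. The field names (format.duration, streams[].codec_type, frames[].pkt_pos) follow ffprobe's JSON writer, which emits most numeric values as strings.

```javascript
// Hypothetical helpers for the JSON objects resolved by probe() and seek().

// Reduce `-show_format -show_streams` output to a few commonly needed fields.
function summarizeProbe(probeJson) {
  const format = probeJson.format || {};
  const streams = probeJson.streams || [];
  const duration = Number.parseFloat(format.duration); // ffprobe emits a string
  return {
    durationSeconds: Number.isFinite(duration) ? duration : null,
    formatName: format.format_name || null,
    videoCodecs: streams.filter((s) => s.codec_type === 'video').map((s) => s.codec_name),
    audioCodecs: streams.filter((s) => s.codec_type === 'audio').map((s) => s.codec_name),
  };
}

// Extract the byte offset of the first frame from
// `-show_frames -show_entries frame=pkt_pos` output.
function firstPacketPosition(seekJson) {
  const frames = seekJson.frames || [];
  if (frames.length === 0) return null;
  const pos = Number.parseInt(frames[0].pkt_pos, 10);
  return Number.isNaN(pos) ? null : pos;
}

// Illustrative sample data (not real ffprobe output from a specific file).
const sampleProbe = {
  streams: [
    { index: 0, codec_name: 'h264', codec_type: 'video' },
    { index: 1, codec_name: 'aac', codec_type: 'audio' },
  ],
  format: { format_name: 'mov,mp4,m4a,3gp,3g2,mj2', duration: '12.500000' },
};
const sampleSeek = { frames: [{ pkt_pos: '1048576' }] };

console.log(summarizeProbe(sampleProbe));
console.log(firstPacketPosition(sampleSeek));
```

Both helpers return null for missing values instead of throwing, since ffprobe omits fields it cannot determine.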

Additional context Probe media files before processing; seek to a media position.

Nov 24 '20 09:11 loretoparisi

@loretoparisi have you tried using https://github.com/alfg/ffprobe-wasm ? It probably needs a little polishing, but it should get the job done.

Dec 15 '20 16:12 alexcarol

I'm currently working on a NodeJS project that needs ffprobe. Thank you @loretoparisi for the workaround.

Feb 07 '21 10:02 jaefloo

You don't really need ffprobe to get the info. It would be enough to run ffmpeg -i video.mp4, write the output log to a txt file, and then parse that into JSON. Unfortunately I haven't even been able to do that.

May 25 '21 19:05 goatandsheep

@goatandsheep I'm not sure that ffmpeg -i can output all the info that ffprobe does, though maybe it can.

May 25 '21 19:05 loretoparisi

I was looking to get the "file checksum" for Audible AAX files and while it doesn't show up with ffmpeg -i, it does work with ffmpeg -v info -i *.aax (or other verbosity values - quiet,panic,fatal,error,warning,info,verbose,debug,trace). So if what you're looking for isn't in the default output, I'd suggest dialing it up.

Is there a way to get ffmpeg -i output as a nice JSON? That would be nice.

Aug 22 '21 15:08 captn3m0

@captn3m0 thanks, I will have a look at -v info. For JSON, in ffprobe it's '-print_format', 'json'; in ffmpeg I never tried it.

Aug 24 '21 13:08 loretoparisi

-print_format is an ffprobe-only option :(
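
Since ffmpeg itself has no JSON writer, one workaround along the lines discussed above is to parse the text that ffmpeg -i logs. The following is only a rough sketch: the function names are hypothetical, the sample log is illustrative, and the regular expressions assume the usual "Duration: ..., bitrate: ..." and "Stream #0:0: Video: ..." line shapes, which are not a stable interface and may differ between ffmpeg versions.

```javascript
// Convert an "HH:MM:SS.xx" timestamp into seconds.
function timestampToSeconds(ts) {
  const [h, m, s] = ts.split(':').map(Number);
  return h * 3600 + m * 60 + s;
}

// Hypothetical parser: turn the log text of `ffmpeg -i <file>` into an object.
function parseFfmpegLog(logText) {
  const result = { durationSeconds: null, bitrateKbps: null, streams: [] };
  for (const line of logText.split(/\r?\n/)) {
    // "Duration: N/A" does not match, so the field stays null in that case.
    const dur = line.match(/Duration:\s*(\d+:\d{2}:\d{2}(?:\.\d+)?)/);
    if (dur) result.durationSeconds = timestampToSeconds(dur[1]);

    const br = line.match(/bitrate:\s*(\d+)\s*kb\/s/);
    if (br) result.bitrateKbps = Number(br[1]);

    const st = line.match(/Stream #(\d+):(\d+)[^:]*:\s*(Video|Audio|Subtitle|Data):\s*([^\s,]+)/);
    if (st) {
      result.streams.push({ index: Number(st[2]), type: st[3].toLowerCase(), codec: st[4] });
    }
  }
  return result;
}

// Illustrative log excerpt (not captured from a real run).
const sampleLog = `Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'video.mp4':
  Duration: 00:01:02.50, start: 0.000000, bitrate: 1205 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1280x720, 1100 kb/s, 25 fps
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s`;

console.log(JSON.stringify(parseFfmpegLog(sampleLog), null, 2));
```

This only recovers what ffmpeg happens to print, so it is far less complete than ffprobe's -show_format and -show_streams output.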

Aug 24 '21 15:08 captn3m0

Any plan to support that? It would be awesome to get format and stream metadata in the browser. Something like this:

ffprobe -hide_banner -loglevel fatal -show_error -show_format -show_streams -print_format json video.mp4

ffprobe is so much faster than ffmpeg because it doesn't try to read the entire file.

Jan 04 '22 20:01 brunomsantiago

Recently I had a use case where I needed to perform a duration check on media files in the browser before uploading them to the server. I'll describe the approach I took while building my POC, as it is somewhat related.

Use case For context, my use case is as follows:

  • User uploads a video/audio file into a file input.
  • Browser takes file and somehow derives its duration.

ffprobe-wasm is not the full ffprobe program

  • My first idea was to use ffprobe-wasm, but I quickly discovered that the program being executed is not actually ffprobe but ffprobe-wasm-wrapper.cpp, which is essentially an attempt to rewrite ffprobe to be more emscripten friendly and contains only a fraction of the utility that ffprobe offers. The application as-is was insufficient for my use case as I needed to verify audio files as well.
  • I decided against enhancing ffprobe-wasm-wrapper.cpp at the time because it would essentially mean manually porting ffprobe to wasm, something I lacked both the time and expertise to do. What I explored instead was compiling the entire ffprobe into wasm, which I managed to accomplish. My fork of the ffprobe-wasm repo can be found here: https://github.com/crazoter/ffprobe-wasm. The messy & uncleaned steps I took are as follows:
  1. First, I cloned https://github.com/alfg/ffprobe-wasm. If you're trying to replicate the steps, you should refer to my fork which includes the updated dockerfile.
  2. I noticed that the ffmpeg version was very old. I updated the dockerfile to resolve that and use the latest ffmpeg (now using git instead of the snapshotted tarball). I then built their wasm module via docker-compose run ffprobe-wasm make.
  3. I then jumped into the running docker container with an interactive bash. I navigated to the ffmpeg directory that was downloaded to the tmp file, and manually compiled fftools/ffprobe.o and fftools/cmdutils.o. If I remember correctly, I executed:
emmake make fftools/ffprobe.o fftools/cmdutils.o
emcc --bind \
	-O3 \
	-L/opt/ffmpeg/lib \
	-I/opt/ffmpeg/include/ \
	-s EXTRA_EXPORTED_RUNTIME_METHODS="[FS, cwrap, ccall, getValue, setValue, writeAsciiToMemory]" \
	-s INITIAL_MEMORY=268435456 \
	-lavcodec -lavformat -lavfilter -lavdevice -lswresample -lswscale -lavutil -lm -lx264 \
	-pthread \
	-lworkerfs.js \
	-o ffprobe.js \
	ffprobe.o cmdutils.o
  • You can also perform emmake make build to build everything.
  • I am not too familiar with the flags to be honest, so this may be sub-optimal.
  • The instructions above may not be completely correct as I did not refine the process and rerun it to verify it. As a precaution I added the resulting files into the repo in my-dist.
  4. To test the files, I adapted emscripten's https://github.com/emscripten-core/emscripten/blob/main/src/shell_minimal.html to use the compiled ffprobe_g and made some modifications to the generated JS code to call the main function directly. (ffprobe_g.max.js is beautified from ffprobe_g. I did not investigate why there are both ffprobe_g and ffprobe.) To run the file locally, I used Servez.
  • However, as I was not proficient at wasm, there were a few problems with my prototype. I imagine someone with more experience will be able to resolve these issues:
    1. There is no way to "reset" the args passed into ffprobe. Once the args are passed into the application, passing a non-empty args array into subsequent calls to main (without refreshing the page) will "stack" the new args with the old args, causing issues. My workaround was to use the same file name & flags for all main calls, and not pass args in subsequent main calls, which worked for our use case. YMMV.
    2. The interface was not as clean as ffmpeg-wasm, as the logs are async and there is no indicator to specify when the application has finished running.
    3. The generated code assumed the existence of SharedArrayBuffer even though it was not necessary for processing most files. It is thus necessary to guard parts of the code using typeof SharedArrayBuffer !== "undefined" to prevent the code from failing if you intend to use ffprobe without having to change your HTTP response headers.
  • Still, for anyone interested in porting ffprobe to wasm, I think this is a step in the right direction and can be worth exploring. I am actually quite curious why the original authors of ffprobe-wasm didn't just compile the whole file.
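
As a rough illustration of the SharedArrayBuffer guard mentioned in the list above, here is a minimal sketch. The function names and the 'threads' / 'single-thread' labels are hypothetical and not part of the generated Emscripten code; the scope parameter exists only so the check can be exercised outside a browser.

```javascript
// Returns true when SharedArrayBuffer is exposed in the given global scope.
// Browsers generally expose it only on cross-origin isolated pages, which is
// why the check is needed when the COOP/COEP headers are not set.
function hasSharedArrayBuffer(scope = globalThis) {
  return typeof scope.SharedArrayBuffer !== 'undefined';
}

// Hypothetical wrapper: decide which code path to take before touching any
// API that would otherwise throw a ReferenceError.
function chooseThreadingMode(scope = globalThis) {
  return hasSharedArrayBuffer(scope) ? 'threads' : 'single-thread';
}

console.log(chooseThreadingMode());
```

Using typeof rather than a direct reference matters here, because reading an undeclared global directly throws instead of evaluating to undefined.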

What I ended up using

  • Due to the uncertain reliability of my (somewhat successful) ffprobe prototype, I decided to go with using ffmpeg-wasm instead.
  • Handling async issues with ffmpeg-wasm was easier, even though the data came in separately through the logger, because you could await the completion of the execution.
  • The concern however is that I'd have to read the entire file into memory. Using 1GB of memory to read a 1GB audio file on the browser is unacceptable for my use case, even if the memory is released immediately afterward. This is a problem independent of ffmpeg-wasm, but instead caused by how the emscripten file system is used. After all, we'd have to somehow bring the file into MEMFS before ffmpeg can even start processing it, and normally we just bring the whole file into MEMFS. What if we just bring in a slice of that?
  • So I decided to instead use the Blob.slice API to obtain the first 5MB (arbitrary number) of data from the file, and then pass that into the emscripten file system using the fetchFile API provided by ffmpeg. The idea is that the metadata would be at the start of the file, and then we'll have some excess data for ffmpeg to guess the format of the file if necessary.
// Toy example
const maxSliceLength = Math.min(1024*1024*5, oFiles[nFileId].size);
const slicedData = oFiles[nFileId].slice(0, maxSliceLength);
(async () => {
  ffmpeg.FS('writeFile', 'testfile', await fetchFile(slicedData));
  await ffmpeg.run('-i', 'testfile', '-hide_banner');
  ffmpeg.FS('unlink', 'testfile');
})();
  • This resolved the memory issue, but introduced a new problem; since ffmpeg is only seeing the first 5MB of the file, it has to guess the duration of some files using bitrate. This thus involved a bit more engineering to identify if the estimation is performed, and if so, perform the estimation ourselves using the actual file size:
    • One way is to estimate by bitrate. Personally this is a last resort because the difference in estimated & actual file size can be ridiculous.
    • Second (more reliable) way is to take the estimated duration from ffmpeg and scale it by file.size / maxSliceLength, since ffmpeg only saw the first maxSliceLength bytes of the file.
  • edit: ffmpeg & ffprobe will throw an error for some files if they can't read the whole file (e.g. mp4, 3gp). More specifically, the dreaded "Invalid data found when processing input". For these types of files, there are 2 options currently available:
    • Use https://github.com/buzz/mediainfo.js, as ffmpegwasm currently does not support stdin / pipes (https://github.com/ffmpegwasm/ffmpeg.wasm/issues/141)
    • I'm not a fan of passing the file by chunks into MediaInfo as it introduces complexity, so I decided to employ a much easier solution: audio tags which works well for mp4.
  • I settled for this solution as I didn't need an exact value for the duration (a malicious actor would be able to bypass a browser-based check anyway).
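
The duration correction for sliced input described above can be sketched roughly as follows. This is not the code I actually shipped: the function names are hypothetical, it assumes ffmpeg's "Estimating duration from bitrate, this may be inaccurate" warning shows up in the collected log lines whenever the duration was guessed, and the scaling only holds if the bitrate is roughly constant across the file.

```javascript
// Substring of the warning ffmpeg logs when it guesses the duration.
const ESTIMATE_WARNING = 'Estimating duration from bitrate';

// True if any collected log line contains the estimation warning.
function wasDurationEstimated(logLines) {
  return logLines.some((line) => line.includes(ESTIMATE_WARNING));
}

// Scale a duration that was estimated from a slice up to the full file size.
// If the duration was read from metadata, or the slice covers the whole file,
// the reported value is returned unchanged.
function correctDuration(reportedSeconds, sliceBytes, fileBytes, estimated) {
  if (!estimated || sliceBytes >= fileBytes) return reportedSeconds;
  return reportedSeconds * (fileBytes / sliceBytes);
}

// Illustrative numbers: a 5 MiB slice of a 50 MiB file, reported as 30 s.
const sliceBytes = 5 * 1024 * 1024;
const fileBytes = 50 * 1024 * 1024;
const logLines = ['[mp3 @ 0x0] Estimating duration from bitrate, this may be inaccurate'];

console.log(correctDuration(30, sliceBytes, fileBytes, wasDurationEstimated(logLines)));
```

For variable-bitrate files the result is still only an approximation, which was acceptable for a browser-side sanity check.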

Hopefully this write-up will benefit someone looking for a similar solution, or someone hoping to port ffprobe to wasm.

Apr 28 '22 01:04 crazoter

@crazoter What an amazing post! Thank you so much. I got thrilled at each paragraph, only to discover at the end this amazing mediainfo.js, which apparently suits my needs perfectly. I am very happy now!

Apr 28 '22 18:04 brunomsantiago

@crazoter Nice writeup! Thanks for checking out ffprobe-wasm:

Still, for anyone interested in porting ffprobe to wasm, I think this is a step in the right direction and can be worth exploring. I am actually quite curious why the original authors of ffprobe-wasm didn't just compile the whole file.

I chose not to compile the FFprobe program, but instead to use libav directly to re-implement the functionality of FFprobe as an API via Wasm as an experiment, rather than running the CLI through the browser. It's a different approach, since you can interface with libavcodec and libavformat directly and provide minimal results. Though it's a bit more work to re-implement the full functionality of FFprobe, of course.

Apr 28 '22 19:04 alfg

Hi everyone! I created an npm package a few months back. Repo is here: https://github.com/tfoxy/ffprobe-wasm . It comes with TS definitions.

I needed to use ffprobe in browser and Node.js so that I could read metadata without being affected by file size, so I tried to package the code at https://github.com/alfg/ffprobe-wasm so that it could be used as a library. The output tries to mimic the command

ffprobe -hide_banner -loglevel fatal -show_format -show_streams -show_chapters -show_private_data -print_format json

I don't know much about Emscripten or libavcodec/libavformat, so there are some properties that are missing. But hopefully this can be enough for some people.

EDIT: @crazoter thanks for providing those alternatives. In one project I only need the duration, so that solution of using HTMLMediaElement.duration is great! Also didn't know about mediainfo.js. Only thing that is not clear to me is if it needs to read the whole file to extract some of the metadata.

May 26 '22 14:05 tfoxy

@tfoxy mediainfo.js does not need to read the entire file, but imo acts strangely in regards to how much it needs to read, see: https://github.com/buzz/mediainfo.js/issues/108 I see that your ffprobe-wasm project only supports FS, but not HTTP(s)? Are there any plans to support HTTP(s) too? (my interest is in retrieving chapter data of videos in nodejs, not the browser)

Dec 10 '22 08:12 jaruba

How is the progress of this issue?

Feb 22 '24 02:02 piesuke

@tfoxy That's amazing! I was able to build and run the docker container and the application, but I'm struggling to import the generated module ffprobe-wasm.js:

.
├── ffprobe-wasm.js
├── ffprobe-wasm.wasm
├── ffprobe-wasm.worker.js

within NodeJS. In fact, if I try to load the module as usual

const Module = require('./ffprobe-wasm.js');
    const versions = {
        libavutil:  Module.avutil_version(),
        libavcodec:  Module.avcodec_version(),
        libavformat:  Module.avformat_version(),
    };

I get a TypeError: Module.avutil_version is not a function error.

Mar 08 '24 17:03 loretoparisi