dlib::video_capture

Open pfeatherstone opened this issue 4 years ago • 123 comments

This is an ffmpeg wrapper to capture any kind of video "thing", including video files (MP4, AVI, etc.), RTSP streams or webcams (e.g. /dev/video0). Note that using this file imposes an additional license, which is FFmpeg's LGPL v2.1 license. Also note that this adds an additional dependency on libavformat.(so/a), libavcodec.(so/a), libswresample.(so/a) and libswscale.(so/a). This is NOT ready yet. It's here as a placeholder for now so people can try it out, test, report back bugs, etc.

pfeatherstone avatar Jan 12 '21 12:01 pfeatherstone

@davisking any ideas on how to do unit testing for this? I imagine doing the whole trick of compressing, base64-ing and inserting into a header file isn't sensible for an mp4 video. The tests will probably want to include videos with weird source formats, glitches, small to large dimensions, and probably with quite a few frames (probs > 1000). Also, this can't be tested on Travis because of the libavformat dependency. Unless there is a way of doing that which i'm not aware of.

pfeatherstone avatar Jan 14 '21 08:01 pfeatherstone

Yeah. Not sure how to get Travis to do that. Honestly I would stick a super tiny, like 0.2 second, video into base64 and use that. That’s enough to exercise a decent part of it. Like what part of the code you wrote can’t be tested that way?

I realize there are lots of formats but it’s not the code you have written that handles that stuff. I wouldn’t worry about trying to exhaustively test libavformat, just the code in dlib.
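For illustration, a minimal sketch of the "tiny clip as base64 in a header" idea. This is a standalone decoder written for this example, not dlib's own base64 or compression classes, and the embedded string is a three-byte placeholder rather than a real video. The names decode_base64 and embedded_clip_base64 are made up here.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Map one base64 character to its 6-bit value, or -1 if it is not in the alphabet.
inline int base64_value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

// Decode standard base64 text into raw bytes. Decoding stops at the first '='
// padding character; spaces and newlines are skipped so the literal can be
// split over several lines in a header.
inline std::vector<std::uint8_t> decode_base64(const std::string& text)
{
    std::vector<std::uint8_t> out;
    std::uint32_t buffer = 0;
    int bits = 0;
    for (char c : text)
    {
        if (c == '=') break;
        if (c == '\n' || c == '\r' || c == ' ') continue;
        const int v = base64_value(c);
        if (v < 0) throw std::invalid_argument("invalid base64 character");
        buffer = (buffer << 6) | static_cast<std::uint32_t>(v);
        bits += 6;
        if (bits >= 8)
        {
            bits -= 8;
            out.push_back(static_cast<std::uint8_t>((buffer >> bits) & 0xFF));
        }
    }
    return out;
}

// Placeholder payload only: this decodes to the text "Man", not to a video clip.
static const char* const embedded_clip_base64 = "TWFu";
```

A test would decode the string once, write the bytes to a temporary file, and point the capture object at that file.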

davisking avatar Jan 14 '21 12:01 davisking

Yeah i get your point. We don't want to be unit-testing libavformat. That's hopefully already been done. I guess any length video can be used. There is nothing in the code that depends on time per se. Okidok. Now for the fun of choosing a 0.2s clip.

pfeatherstone avatar Jan 14 '21 13:01 pfeatherstone

The code also works out the box with rtsp and camera devices. Those would be harder to test in a unit test. Unless you fancy hosting a dummy rtsp stream somewhere that will be active until the end of time.

pfeatherstone avatar Jan 14 '21 13:01 pfeatherstone

Yeah, would be neat, but I don't want to deal with that :)

davisking avatar Jan 14 '21 13:01 davisking

The natural follow up from this is gonna be dlib::video_writer, which won't require much effort (it's similar API calls). So I think i will reinstate the following header hierarchy :

  • dlib/video_io.h
  • dlib/video_io/video_capture.h
  • dlib/video_io/video_writer.h

I could either shove dlib::video_writer in this PR or put it in its own PR. I prefer the latter. In any case, maybe worth getting the headers right.

pfeatherstone avatar Jan 14 '21 16:01 pfeatherstone

Yeah do it in a separate PR. More smaller PRs rather than few big ones is best :)

davisking avatar Jan 15 '21 03:01 davisking

i've added something so you can read the metadata of the video stream. You can then detect stuff like whether the video is rotated. Very useful. We could then correct for that rotation if required. Nice.

pfeatherstone avatar Jan 19 '21 13:01 pfeatherstone

At some point I will finish this. I'm using it all now and works fine but don't quite have the time to push it over the edge and make it Davis-approved. And making CMake detect ffmpeg nicely and consistently on all platforms is going to be a ball-ache.

pfeatherstone avatar Feb 13 '21 14:02 pfeatherstone

We will be able to make this way more configurable than opencv's wrappers. Like we can set CRF values, gop sizes, the list is endless. Haven't quite decided on how to provide a nice Api. Probably just provide a std::vector<std::pair<string,string>> of options. Otherwise use some sensible defaults.

pfeatherstone avatar Feb 13 '21 14:02 pfeatherstone

Eh, don't do a stringly typed interface. That's like having void pointers. There has to be really clear user documentation saying exactly what can and can't be done. So unless ffmpeg already deals in this kind of stringly typed interface and you are just saying we forward that it's not a great idea. Even then it's not super hot. I'm open to an argument to the contrary. But I've never seen a stringly typed interface that wasn't just an excuse to not define the interface.

davisking avatar Feb 13 '21 22:02 davisking

By the way I was talking about a video encoder, not dlib::video_capture. For dlib::video_capture the options are minimal and API can be fairly tight.

pfeatherstone avatar Feb 13 '21 23:02 pfeatherstone

For a video encoder on the other hand, most of the codec specific options are forwarded by the libavformat API as strings. Like {"preset", "slow"} for H264. So however the dlib API handles options, it will have to forward them to libavformat as strings. And there are 100s of options depending on the codec (literally). So I think the easiest way to handle specific options is using strings and let libavformat do error handling if they are bad, which dlib can forward as exceptions. Or, we can do what opencv does which is allow no options and have a minimal API. I don't like that. Or we go all out and allow the user to forward any codec specific options. They would need to know what they are doing and read the ffmpeg documentation carefully. My use case, and I'm sure other people will have it too, is to be able to play with options. Like I want to be able to tweak the bitrate, the CRF value, whether it's lossless, the number of b frames, etc. I think this level of tuning will set dlib apart from other libraries.
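A rough sketch of what "forward strings, turn bad ones into exceptions" could look like. FFmpeg's avcodec_open2() takes an AVDictionary** and leaves any entries it did not recognise in that dictionary on return; the code below only imitates that behaviour with standard containers so it compiles without FFmpeg. The names option_map, consume_options and open_encoder_or_throw are hypothetical.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>

using option_map = std::map<std::string, std::string>;

// Stand-in for the codec consuming a dictionary: returns the entries whose
// keys are not in the recognised set, the way FFmpeg hands back leftovers.
inline option_map consume_options(const std::set<std::string>& recognised,
                                  const option_map& requested)
{
    option_map leftover;
    for (const auto& kv : requested)
        if (recognised.count(kv.first) == 0)
            leftover.insert(kv);
    return leftover;
}

// Wrapper-level policy (an assumption, not settled in this thread):
// any leftover option is reported to the caller as an exception.
inline void open_encoder_or_throw(const std::set<std::string>& recognised,
                                  const option_map& requested)
{
    const option_map leftover = consume_options(recognised, requested);
    if (!leftover.empty())
    {
        std::string msg = "unrecognised encoder options:";
        for (const auto& kv : leftover)
            msg += " " + kv.first;
        throw std::invalid_argument(msg);
    }
}
```

With this policy a typo such as "presett" fails loudly at open time instead of being silently ignored.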

pfeatherstone avatar Feb 13 '21 23:02 pfeatherstone

Some options will be standard and common across all codecs, and therefore we can make them typed of course, even strongly typed if we can be bothered.

pfeatherstone avatar Feb 13 '21 23:02 pfeatherstone

Yeah, that all sounds good then. :)

davisking avatar Feb 15 '21 13:02 davisking

Just realised, the video capture, which also works as an RTSP client, can work as an RTSP push server if you set some additional flags. Awesome! If I update the encoder to also include the container format (at the moment it's just the encoder) it can act as an RTSP push client.

pfeatherstone avatar Feb 24 '21 13:02 pfeatherstone

And I think the encoder can also work as an RTSP server if I make the update.

pfeatherstone avatar Feb 24 '21 13:02 pfeatherstone

Nice! Have you checked if you can open GIF animated files? With cv::VideoCapture you can read them as if they were a video file. If that works, we will also have animated GIF support in dlib! :D

arrufat avatar Feb 24 '21 13:02 arrufat

I believe so yes.

pfeatherstone avatar Feb 24 '21 15:02 pfeatherstone

You can use the video_capture class to open JPEG files if you want. It's just you will only be able to read 1 frame. Then it closes. That thing will open pretty much anything. FFMpeg is awesome.

pfeatherstone avatar Feb 24 '21 15:02 pfeatherstone

Having support for RTSP client/server/push is a fantastic bonus in all of this.

pfeatherstone avatar Feb 24 '21 15:02 pfeatherstone

i don't know when i will be able to finish this. i have to use YUV at the moment which dlib doesn't support. So i've taken this code, modified it to fit my purpose. At the moment, I don't have an incentive to finish this off properly as i'm not using it.

pfeatherstone avatar Feb 24 '21 15:02 pfeatherstone

Warning: this issue has been inactive for 35 days and will be automatically closed on 2021-04-10 if there is no further activity.

If you are waiting for a response but haven't received one it's possible your question is somehow inappropriate. E.g. it is off topic, you didn't follow the issue submission instructions, or your question is easily answerable by reading the FAQ, dlib's official compilation instructions, dlib's API documentation, or a Google search.

dlib-issue-bot avatar Apr 01 '21 08:04 dlib-issue-bot

I'm genuinely interested in this feature, but I don't think I'll have time soon to work on it, unfortunately :(

arrufat avatar Apr 01 '21 09:04 arrufat

yeah me too. I mean it's very close to being ready. The biggest nuisance is going to be some Davis-approved cmake scripts for detecting ffmpeg, checking its version, etc. That won't be necessary for the dlib library itself, since this class is header only and it's up to the user to link to libavformat and so on, but for the dlib unit tests we want a good cmake script for ffmpeg. FFmpeg doesn't ship with a cmake script. So might have to borrow the one in opencv or something. I never use cmake in my own projects, i just use netbeans 8.2, so i explicitly set the linker flags. So doing all the cmake stuff isn't really my strong suit.

pfeatherstone avatar Apr 01 '21 10:04 pfeatherstone

I also have a video encoder (e.g h264, vp9) and video muxer (e.g h264 + mp4, h264 + rtsp) that are ready to go, but that will be for a future PR.

pfeatherstone avatar Apr 01 '21 10:04 pfeatherstone

Na some basic cmake option is presumably fine. Where this kind of thing goes off the rails is trying to make it work reliably on windows since windows has no coherent conventions for linking to installed libraries. But I’m fine with just telling people it’s on them to link to it in windows.

davisking avatar Apr 01 '21 12:04 davisking

Warning: this issue has been inactive for 35 days and will be automatically closed on 2021-05-16 if there is no further activity.

If you are waiting for a response but haven't received one it's possible your question is somehow inappropriate. E.g. it is off topic, you didn't follow the issue submission instructions, or your question is easily answerable by reading the FAQ, dlib's official compilation instructions, dlib's API documentation, or a Google search.

dlib-issue-bot avatar May 07 '21 08:05 dlib-issue-bot

I want this so badly, that I will try to make it work with CMake (never done that before, though) before it gets closed and falls into oblivion :P

arrufat avatar May 16 '21 07:05 arrufat

I've added support for finding FFMPEG using PkgConfig in CMake. @pfeatherstone would you mind giving me write access to this PR?

arrufat avatar May 16 '21 09:05 arrufat

Yep. Gimme a sec

pfeatherstone avatar May 16 '21 11:05 pfeatherstone

@arrufat I think you should have write-access now.

pfeatherstone avatar May 16 '21 11:05 pfeatherstone

What would be really cool, is if you could specify where to look for ffmpeg. If using pkg-config under the hood, i imagine using PKG_CONFIG_PATH will suffice.

pfeatherstone avatar May 16 '21 11:05 pfeatherstone

Yes, I already pushed a PR: https://github.com/pfeatherstone/dlib/pull/1

arrufat avatar May 16 '21 11:05 arrufat

Wow, a PR within a PR

pfeatherstone avatar May 16 '21 11:05 pfeatherstone

I merged. I'm not an authority on cmake I'm afraid. But I do think the API can be improved a lot. I worked for a while on a different version of this. I would be keen to introduce those changes. The main things would be to optionally read audio frames. That might require a well thought-out dlib type. I added some more run time checks and more constructor arguments to cater for decoder options (both video and audio), format/demuxer options and protocol options (if using a raw TCP muxer for example). We can either introduce all these later or do it now. If we do it now, the API will be slightly more fixed. If we do it later, this PR will get passed sooner but will inevitably lead to API changes at a later stage which @davisking is really not keen on.

pfeatherstone avatar May 16 '21 12:05 pfeatherstone

It might be easier to merge what we have now and do bite-size increments. But we might need to add something that says : "This is an unstable API. If you don't like it, tough". This could be a compiler warning for example to make it absolutely clear.

pfeatherstone avatar May 16 '21 12:05 pfeatherstone

Or we namespace the incremental versions. This could be in namespace dlib::video_io::v1, and later versions which introduce breaking API changes could be in namespace dlib::video_io::v2 etc... I don't like this at all. I would rather have a compiler warning that says this is an unstable API and leave everything in the dlib namespace. It depends on what guarantees we want to impose on the API.
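To make the two options concrete, here is a small sketch using only standard C++ features: versioned namespaces with an inline namespace selecting the current one, and a [[deprecated]] attribute that produces a compiler warning wherever the type is used. Everything lives in a made-up sketch namespace; none of these names exist in dlib.

```cpp
namespace sketch
{
    namespace video_io
    {
        // Older API, still reachable through its explicit version.
        namespace v1
        {
            inline int api_version() { return 1; }
        }

        // Current API: the inline namespace makes video_io::api_version()
        // resolve to this version without the caller spelling out v2.
        inline namespace v2
        {
            inline int api_version() { return 2; }
        }
    }

    // Alternative: keep one namespace and flag the type as unstable.
    // Any code that names this type gets a compiler warning with the message.
    struct [[deprecated("unstable API: expect breaking changes")]] unstable_capture
    {
    };
}
```

Unqualified users pick up v2 automatically, while code that must stay on the old behaviour can spell out sketch::video_io::v1.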

pfeatherstone avatar May 16 '21 13:05 pfeatherstone

Don’t worry about API versioning. I care about API stability in proportion to the age of the API, since that’s very correlated with the number of users. Like the question is always how many people will be impacted by a change and how difficult will it be for them to update. Moreover, some breaking changes are super easy for users to update and some are not. Like if the result is only direct users get a build error and how to update their code is manifestly obvious then that’s fine.

On the other hand, changes that silently cause runtime faults are not super great.

davisking avatar May 16 '21 21:05 davisking

One way we could tackle this is how GCC adds new C++ features to the standard library: they put them under the std::experimental namespace, so maybe we could add this stuff under dlib::experimental and move them to the main namespace... Not sure it's a good idea, though...

Also, @pfeatherstone how am I supposed to open the webcam stream using this PR? I played a bit with it yesterday, but couldn't find a way...

arrufat avatar May 17 '21 02:05 arrufat

It's been a while since i've used this particular object, but i thought it was something like:

dlib::video_capture cap;
cap.open("/dev/video0");
...

pfeatherstone avatar May 17 '21 06:05 pfeatherstone

That's exactly what I tried. There's a second boolean parameter is_rtsp, but I got errors no matter what I set it to:

can't open '/dev/video0' error : Invalid argument

arrufat avatar May 17 '21 06:05 arrufat

I'll have a look at some point. Can't guarantee when though. It's possible this object is out of date. I haven't added nearly enough runtime checks with useful print statements. I'll compare at some point with what I've been using recently which definitely does work with capture devices.

pfeatherstone avatar May 17 '21 07:05 pfeatherstone

With regards to your error, i think that's an FFmpeg thing. Can you build ffmpeg from source using version v4.3.2 and try again? Maybe the v4l2 muxer/demuxer isn't enabled in your version of FFmpeg. These are the kind of things we want to check for at runtime to give the user good error messages. It's failing on avformat_open_input(), which is the first thing the implementation details should run, which they do, and it is doing so with the right arguments. Given that the error returned by libavformat is Invalid argument, i'm willing to bet it's got something to do with the, possibly old, installation of ffmpeg.

pfeatherstone avatar May 17 '21 11:05 pfeatherstone

I need to make some updates if we want this to work with ffmpeg v4.4 onwards, otherwise you will get compiler warning messages since ffmpeg have deprecated the use of av_init_packet() (and other function calls i'm currently using in this object). Instead you have to use av_packet_alloc(). So yeah, there are a whole bunch of updates i need to port, which i took care of in my local version, but haven't had time to merge into this dlib version. Again, I don't know when i'm gonna have time to finish this properly since a dlib version is no use to me since I have to work with YUV.
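One way the av_packet_alloc() style could be wrapped is with a std::unique_ptr and a custom deleter. The sketch below uses stand-in functions so it builds without FFmpeg: fake_packet_alloc() and fake_packet_free() are hypothetical and only mirror the shape of av_packet_alloc() (returns a pointer) and av_packet_free() (takes a pointer to the pointer and nulls it).

```cpp
#include <memory>

// Stand-in for AVPacket.
struct fake_packet
{
    int size = 0;
};

// Counter used only to show that every allocation is released.
inline int& live_packets()
{
    static int count = 0;
    return count;
}

// Same shape as av_packet_alloc(): returns an owning raw pointer.
inline fake_packet* fake_packet_alloc()
{
    ++live_packets();
    return new fake_packet();
}

// Same shape as av_packet_free(): frees the packet and nulls the caller's pointer.
inline void fake_packet_free(fake_packet** pkt)
{
    if (pkt != nullptr && *pkt != nullptr)
    {
        delete *pkt;
        *pkt = nullptr;
        --live_packets();
    }
}

// Deleter adapting the pointer-to-pointer free function to unique_ptr.
struct packet_deleter
{
    void operator()(fake_packet* p) const { fake_packet_free(&p); }
};

using packet_ptr = std::unique_ptr<fake_packet, packet_deleter>;

inline packet_ptr make_packet()
{
    return packet_ptr(fake_packet_alloc());
}
```

With the real library the alias would hold an AVPacket and the deleter would call av_packet_free(&p), so early returns and exceptions cannot leak packets.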

pfeatherstone avatar May 17 '21 11:05 pfeatherstone

Going back to installing ffmpeg from source, i would install it to a local directory, not system wide. A whole bunch of things depend on libav libraries and if they link to new libav libraries with subtle API changes, everything will break. Like, if you've installed opencv using sudo apt install libopencv-dev (or whatever), that will be linking to the default libav libraries that ship with your distro. If you overwrite those, then it will very likely break. So i've learnt to always install fresh libav libraries locally and carefully link to those instead of those provided by apt or yum. This is not a new problem, this is the case with pretty much every compiled library. But be warned, I had opencv segfault and couldn't understand why, and it was because it was linking to ffmpeg v4.3.2 when it was built with ffmpeg v<something very small>

pfeatherstone avatar May 17 '21 11:05 pfeatherstone

Or use a decent C++ package manager like vcpkg, conan, hunter or whatever. But i've never used them so can't say for sure if that's what you want. You're probably thinking, "I thought this would make my life easier, but rather than linking against bloated opencv, i now need to worry about linking to a good version of libavformat. Have I made any progress..." The answer is yes, coz soon you will have a nice script for building and installing ffmpeg from source with the right options you want, AND, you will be able to statically link everything, AND you won't be linking against 100 shared libraries you didn't know about, or care about, AND you will celebrate.

pfeatherstone avatar May 17 '21 12:05 pfeatherstone

Furthermore, you can tailor your ffmpeg build to only include the exact set of encoders, decoders, muxers, demuxers, protocols, filters and devices you strictly need for your app. I did this, and the resulting set of libav static libraries was tiny. Furthermore, after stripping symbols, I ended up with a 1MB binary, which these days is pretty small. So this gives you quite a lot of configurability, provided you know your way around building ffmpeg.

pfeatherstone avatar May 17 '21 12:05 pfeatherstone

Thanks for your explanation, and yes, don't worry, I won't mess my system with this stuff. I have FFmpeg 4.4 from the official Arch Linux repositories, and V4L support. I will try with 4.3 later.

arrufat avatar May 18 '21 00:05 arrufat

If it still doesn't work, you can try calling avformat_open_input() on its own and check it passes. That's the very first thing that should be called.

pfeatherstone avatar May 18 '21 06:05 pfeatherstone

Also, try calling avdevice_register_all(); in main() somewhere at the start. This is a global initialization function for libavdevice. Sometimes this is required. I've only ever had to call this to enable ALSA stuff, which is used for audio. But it's possible that you need to call that in your environment to enable V4L.

You will have to add the following:

extern "C" {
#include <libavdevice/avdevice.h>
}

pfeatherstone avatar May 18 '21 10:05 pfeatherstone

If it turns out that is THE fix, then maybe we need to add some static initialization in the dlib wrapper. And at the same time, add the other global initialization functions. They are largely deprecated, but could still be required for older versions of libav libraries.
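A possible shape for that static initialization, sketched without FFmpeg so it compiles on its own. register_devices_stub() is a hypothetical stand-in for avdevice_register_all(); the function-local static guarantees it runs exactly once, and C++11 makes that initialization thread-safe.

```cpp
// Counts how often the (stand-in) registration function actually ran.
inline int& registration_count()
{
    static int count = 0;
    return count;
}

// Stand-in for avdevice_register_all(); a real wrapper would call that instead.
inline void register_devices_stub()
{
    ++registration_count();
}

// Safe to call from every constructor or open() call: the initializer of the
// function-local static runs once, no matter how many times this is called.
inline void ensure_devices_registered()
{
    static const bool done = (register_devices_stub(), true);
    (void)done;
}
```

Calling ensure_devices_registered() at the top of the wrapper's open path would remove the need for users to remember the call in main().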

pfeatherstone avatar May 18 '21 10:05 pfeatherstone

Great, I will try that later, thank you for looking into this!

arrufat avatar May 19 '21 09:05 arrufat

@pfeatherstone, I can open the webcam using those lines. Thank you. I am not familiar with the ffmpeg api, though (only with the command line program), I will have to study it a bit more to be able to do something useful with it...

arrufat avatar May 21 '21 03:05 arrufat

Does the dlib object work now? Your error was failing at avformat_open_input() which is the very first thing dlib calls. So it should work right?

pfeatherstone avatar May 21 '21 06:05 pfeatherstone

I know this is not helpful, but dlib::video_capture works for me.

pfeatherstone avatar May 21 '21 06:05 pfeatherstone

Yes, I can see how the webcam light turns on :) But I don't know how to get an image from the camera

avdevice_register_all();
dlib::video_capture capture;
capture.open("/dev/video0", false);
std::cout << capture.is_open() << std::endl; // prints 1
auto metadata = capture.get_video_metadata(); // crashes with map::at exception
std::cout << metadata.size();
for (const auto& m : metadata)
{
    std::cout << m.first << ", " << m.second << std::endl;
}

But that crashes at .get_video_metadata() as the key in the map does not exist...

arrufat avatar May 21 '21 06:05 arrufat

Ok, get_video_metadata() crashing could be me being lazy with error checking, or lack of.
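For reference, the usual way to avoid a map::at exception is to look the key up with find() and return something sensible when it is missing. The sketch below assumes, purely for illustration, that metadata is stored per stream index in a nested map; the real internals of get_video_metadata() may differ, and the names here are made up.

```cpp
#include <map>
#include <string>

using metadata_map = std::map<std::string, std::string>;

// Return the metadata of one stream, or an empty map if that stream has none.
// Unlike std::map::at, this never throws for a missing key.
inline metadata_map metadata_for_stream(const std::map<int, metadata_map>& all,
                                        int stream_index)
{
    const auto it = all.find(stream_index);
    return it != all.end() ? it->second : metadata_map();
}

// Return a single metadata value, or the fallback if the key is absent.
inline std::string value_or(const metadata_map& metadata,
                            const std::string& key,
                            const std::string& fallback)
{
    const auto it = metadata.find(key);
    return it != metadata.end() ? it->second : fallback;
}
```

A webcam stream with no metadata would then yield an empty map instead of crashing.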

pfeatherstone avatar May 21 '21 06:05 pfeatherstone

To read a frame, do the following:

dlib::matrix<dlib::rgb_pixel> frame;
uint64_t timestamp_ns;
if (capture.read(frame, timestamp_ns))
{
    // do something with frame
}
else
{
    // the stream has closed; check that capture.is_open() returns false
}

pfeatherstone avatar May 21 '21 06:05 pfeatherstone

Note that with webcam device, the timestamp will be gibberish.

pfeatherstone avatar May 21 '21 06:05 pfeatherstone

Looking at the code, it's a bit out of date I have to say. I need to fix this at some point. Maybe my partner will let me this weekend.

pfeatherstone avatar May 21 '21 06:05 pfeatherstone

It worked, thank you! I will try to understand what's going on, too :)

arrufat avatar May 21 '21 06:05 arrufat

So my plan will be the following:

  • fix things here and there (using the non-deprecated API calls, more error checking and legitimate bug fixes)
  • put all the implementation details in a class called dlib::video_demuxer_impl and put that in the header dlib/video_io/video_demuxer_impl.h. Then the derived class, which captures the API will be dlib::video_demuxer or dlib::video_capture which inherits from dlib::video_demuxer_impl. That way dlib::video_demuxer/dlib::video_capture is clean, easier to look at, and the public API calls are obvious. The reason for choosing the name dlib::video_demuxer is that it is inline with FFMPEG documentation and leaves room for classes such as dlib::video_decoder, dlib::video_encoder and dlib::video_muxer in the future.
  • The constructor arguments need to be sorted. There are loads more options i need to add which allow you to configure the demuxer a bit like how ffmpeg allows you to configure it. All of these will be in a struct, maybe called dlib::video_demuxer::args which will have loads of sensible defaults, and all of which are clearly documented. These will include stuff like demuxer options, decoder options, protocol options, heights, widths, blah blah. These are mostly useful if you are demuxing an RTSP stream, HTTPS stream or something like that. These options can allow you to set the TCP timeout for example...
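As a rough idea of what such an args struct could look like: the field names filepath, input_format, enable_audio and format_options follow the usage shown elsewhere in this thread, while the remaining fields, all default values and the validate() helper are assumptions made up for this sketch.

```cpp
#include <chrono>
#include <map>
#include <string>

struct demuxer_args_sketch
{
    std::string filepath;            // file, device node or URL; required
    std::string input_format;        // empty: let the library probe the format
    bool enable_image = true;        // assumed default
    bool enable_audio = false;       // assumed default
    int height = 0;                  // 0: keep the source height
    int width = 0;                   // 0: keep the source width
    std::chrono::milliseconds connect_timeout{5000}; // for network sources
    std::map<std::string, std::string> format_options;  // forwarded as strings
    std::map<std::string, std::string> decoder_options; // forwarded as strings
};

// Returns an empty string if the arguments look usable, otherwise a message
// describing the first problem found.
inline std::string validate(const demuxer_args_sketch& args)
{
    if (args.filepath.empty())
        return "filepath must not be empty";
    if (args.height < 0 || args.width < 0)
        return "height and width must be non-negative";
    if (args.connect_timeout.count() < 0)
        return "connect_timeout must be non-negative";
    return "";
}
```

Typed fields cover the common settings, and the two string maps leave room for the format and codec specific options that have to be forwarded as strings anyway.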

pfeatherstone avatar May 21 '21 06:05 pfeatherstone

I might get audio sorted also, at least in the implementation details. The hardest part will be deciding what the api call should be. Currently, dlib::video_capture::read() is templated and can take any valid dlib image container. To support audio, i might have to fix it to something sensible and change the API call to something like:

bool dlib::video_capture::read(dlib::type_safe_union<dlib::array2d<dlib::rgb_pixel>, dlib::audio_type>& frame, uint64_t& timestamp_ns)

or something like that. Not ideal, since that forces you to use dlib::array2d<dlib::rgb_pixel> but i don't really want to have read_video_frame() and read_audio_frame coz then you have to search the internal queue of buffered frames to find one that matches your frame type. Not really a fan of that. I would rather pop the queue just like a FIFO and keep popping until you get the frame type you want. it also means that all frames are correctly time-ordered.

Anybody have an opinion on this? Maybe i could have separate queues for each frame type?
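To help compare the options, here is a sketch of the single-FIFO variant: frames of both kinds share one time-ordered queue, and the reader pops until it finds the type it wants. It uses std::variant (C++17) in place of dlib::type_safe_union, and the frame structs are placeholders invented for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <queue>
#include <utility>
#include <variant>

struct video_frame_sketch { int width = 0; int height = 0; };
struct audio_frame_sketch { int nsamples = 0; };

using frame = std::variant<video_frame_sketch, audio_frame_sketch>;

struct timed_frame
{
    frame data;
    std::uint64_t timestamp_us;
};

class frame_fifo
{
public:
    void push(timed_frame f) { queue_.push(std::move(f)); }

    // Pop frames in arrival order until one holds a T. Frames of the other
    // type that come first are discarded, which is the cost of a single queue.
    template <typename T>
    std::optional<T> pop_next(std::uint64_t& timestamp_us)
    {
        while (!queue_.empty())
        {
            timed_frame f = std::move(queue_.front());
            queue_.pop();
            if (T* p = std::get_if<T>(&f.data))
            {
                timestamp_us = f.timestamp_us;
                return std::move(*p);
            }
        }
        return std::nullopt;
    }

    std::size_t size() const { return queue_.size(); }

private:
    std::queue<timed_frame> queue_;
};
```

The trade-off is visible in pop_next(): asking for video throws away any audio queued ahead of it, which separate per-type queues would avoid at the price of losing the single global ordering.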

pfeatherstone avatar May 21 '21 06:05 pfeatherstone

So my plan will be the following:

  • fix things here and there (using the non-deprecated API calls, more error checking and legitimate bug fixes)
  • put all the implementation details in a class called dlib::video_demuxer_impl and put that in the header dlib/video_io/video_demuxer_impl.h. Then the derived class, which captures the API will be dlib::video_demuxer or dlib::video_capture which inherits from dlib::video_demuxer_impl.

Why use inheritance? Public inheritance (I imagine that's what's proposed) should model an is-a relationship. E.g. someone could write a function that takes a dlib::video_demuxer_impl& and they could pass a dlib::video_demuxer. That's the kind of thing inheritance is all about. But it doesn't sound like it would be appropriate for anyone to do this.

davisking avatar May 21 '21 11:05 davisking

@davisking Sorry, I wasn't going to use inheritance. As soon as I wrote that I thought using the following would be more appropriate.

class video_demuxer
{
public:
//...
private:
    std::unique_ptr<video_demuxer_impl> _state;
};

Just as a side note, I would still make everything header only as i don't want to tie dlib to a particular version of libav*.
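Fleshed out a little, the composition version might look like the sketch below. It stays header only by defining the implementation class before the public one. The class names carry a _sketch suffix because they are illustrative; a real implementation class would own the FFmpeg contexts rather than a string.

```cpp
#include <memory>
#include <string>
#include <utility>

// Implementation detail: holds the state the public class should not expose.
class video_demuxer_impl_sketch
{
public:
    explicit video_demuxer_impl_sketch(std::string path) : path_(std::move(path)) {}
    bool is_open() const { return !path_.empty(); }

private:
    std::string path_;
};

// Public API: owns the implementation instead of inheriting from it, so
// nobody can pass a demuxer where an implementation object is expected.
class video_demuxer_sketch
{
public:
    video_demuxer_sketch() = default;

    explicit video_demuxer_sketch(std::string path)
        : state_(new video_demuxer_impl_sketch(std::move(path)))
    {
    }

    bool is_open() const { return state_ && state_->is_open(); }
    void close() { state_.reset(); }

private:
    std::unique_ptr<video_demuxer_impl_sketch> state_;
};
```

Because of the std::unique_ptr member the public class is movable but not copyable, which seems reasonable for something that owns a stream.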

pfeatherstone avatar May 21 '21 12:05 pfeatherstone

Or not bother with std::unique_ptr at all... Either way, it will be something sensible

pfeatherstone avatar May 21 '21 18:05 pfeatherstone

Sounds good :)

davisking avatar May 22 '21 00:05 davisking

Ok, tons of commits coming tomorrow. What would be extremely useful is if someone could either find some teeny tiny videos, that include both image/video and audio in a variety of formats, or record some. We will need this for the unit tests.

pfeatherstone avatar May 22 '21 19:05 pfeatherstone

Right, i've added the muxing stuff. It's not finished. But even when it will be, I will disable it and add the tests in a future PR.

pfeatherstone avatar May 23 '21 09:05 pfeatherstone

Right, that's me done for the weekend. The rest will have to wait a couple weeks.

pfeatherstone avatar May 23 '21 10:05 pfeatherstone

By the way, with the current API, this is a minimal example:

    dlib::video_demuxer_args args;
    args.filepath = "myvid.mp4";
    dlib::video_demuxer cap(args);
    
    if (cap.is_open())
    {       
        dlib::image_window win;

        dlib::type_safe_union<dlib::array2d<dlib::rgb_pixel>, dlib::audio_frame> frame;
        uint64_t timestamp_us;
        
        while (cap.read(frame, timestamp_us))
        {
            if (frame.contains<dlib::array2d<dlib::rgb_pixel>>())
            {
                const auto& image = frame.cast_to<dlib::array2d<dlib::rgb_pixel>>();
                win.set_image(image);
            }
        }
    }

pfeatherstone avatar May 23 '21 10:05 pfeatherstone

For webcam device and audio device, you need to call avdevice_register_all(). The code doesn't call that automatically yet.

Then, for webcam:

    dlib::video_demuxer_args args;
    args.filepath = "/dev/video0";
    dlib::video_demuxer cap(args);

For microphone (check hardware device using arecord -l):

    dlib::video_demuxer_args args;
    args.enable_audio = true;
    args.filepath = "hw:0";
    args.input_format = "alsa";
    dlib::video_demuxer cap(args);

pfeatherstone avatar May 23 '21 10:05 pfeatherstone

You can do screen grab with this too. You would need to check the ffmpeg documentation. You need to lookup what to set args.filepath to, then set args.input_format = "x11grab" I think. But i've never tried that.

Look here for more information.

pfeatherstone avatar May 23 '21 10:05 pfeatherstone

For rtsp, you could do something like:

    dlib::video_demuxer_args args;
    args.filepath = "rtsp://localhost:554/mystream";
    args.input_format = "rtsp";
    args.format_options = {{"max_delay", "5000000"},
                           {"rtpflags",  "send_bye"},
                           {"rtsp_transport", "tcp"}};
    dlib::video_demuxer cap(args);

You might need to tweak the options to suit your needs. Then you need to study the ffmpeg documentation. I know it's a pain. But that's life.

pfeatherstone avatar May 23 '21 10:05 pfeatherstone

@arrufat @davisking Shouldn't all the cmake stuff apply to the unit test CMakeLists.txt file, not the main one?

pfeatherstone avatar May 23 '21 12:05 pfeatherstone

Rather than using http://dlib.net/file_to_code_ex.cpp.html to encode video into the unit tests, can we use something like https://github.com/graphitemaster/incbin to embed data into the dtest executable? It can embed arbitrarily sized files, is cross platform and works really well. This might be more suitable for MP4 files.

pfeatherstone avatar May 23 '21 14:05 pfeatherstone

Apparently std::pair has a non-trivial copy constructor which means memcpy gets upset. Weird.

pfeatherstone avatar May 23 '21 14:05 pfeatherstone

Rather than using http://dlib.net/file_to_code_ex.cpp.html to encode video into the unit tests, can we use something like https://github.com/graphitemaster/incbin to embed data into the dtest executable? It can embed arbitrarily sized files, is cross platform and works really well. This might be more suitable for MP4 files.

Why use that instead? It depends on things outside the c++ standard to work, which is unnecessary. I get what it does, which is dandy, but it’s a deviation from the standard to use it. Every deviation from what’s supported by the standard is almost always a huge pain. It invariably breaks on someone’s build system, platform, or whatever.

davisking avatar May 23 '21 15:05 davisking

So i used http://dlib.net/file_to_code_ex.cpp.html to encode a 50s clip, and the compiler timed out after 15 minutes

pfeatherstone avatar May 23 '21 15:05 pfeatherstone

I could use a much smaller clip but, encoding the file as a gigantic string puts a very low upper bound on file sizes.

pfeatherstone avatar May 23 '21 15:05 pfeatherstone

I tried to set the installation path of ffmpeg/libav* using PKG_CONFIG_PATH environment variable, but i don't think cmake picked it up. Would it be possible to have another option in CMakeLists.txt which set the ffmpeg installation path? Something like DLIB_FFMPEG_PATH ?

pfeatherstone avatar May 23 '21 15:05 pfeatherstone

Having said that, it's up to the user's build tools to link to ffmpeg correctly, not dlib's cmake file. I guess we still need something like this for the unit test CMakeLists.txt file.

pfeatherstone avatar May 23 '21 15:05 pfeatherstone

Currently i don't think cmake is using pkg-config --libs libavdevice to get the complete linker flags (for libavdevice). I think it's just using -lavdevice.

pfeatherstone avatar May 23 '21 15:05 pfeatherstone

@pfeatherstone Thank you for putting a lot of effort into this, and showing some example uses of the API.

However, I am not sure I understand what you mean by this:

@arrufat @davisking Shouldn't all the cmake stuff apply to the unit test CMakeLists.txt file, not the main one?

Do you think CMake is not properly linking with the FFmpeg libraries, and it just happens to work because they are in the right PATH for me?

arrufat avatar May 24 '21 01:05 arrufat

So i used http://dlib.net/file_to_code_ex.cpp.html to encode a 50s clip, and the compiler timed out after 15 minutes

Heh, yeah ok that’s too much data.

So there isn’t anything inherently wrong with just reading the file like normal. Just make cmake set a macro that tells the code where the file is. For test code that’s fine and functionally equivalent to the other options we are talking about.
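
Roughly what that could look like, as a sketch only: DLIB_TEST_VIDEO_PATH and the video.mp4 file name are hypothetical names picked for illustration, not existing dlib options. The test target gets the path as a compile definition, and the test just reads the file from disk.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// DLIB_TEST_VIDEO_PATH is a hypothetical macro name. CMake could define it
// for the test target with something like:
//   target_compile_definitions(dtest PRIVATE
//       DLIB_TEST_VIDEO_PATH="${CMAKE_CURRENT_SOURCE_DIR}/video.mp4")
#ifndef DLIB_TEST_VIDEO_PATH
#define DLIB_TEST_VIDEO_PATH "video.mp4"  // fallback if the build does not set it
#endif

// Read a whole file into memory. Returns an empty vector if the file
// cannot be opened, so the caller can report a missing test asset.
std::vector<char> read_file(const std::string& path)
{
    assert(!path.empty());
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return {};
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

// Convenience wrapper used by the (hypothetical) video tests.
std::vector<char> load_test_video()
{
    return read_file(DLIB_TEST_VIDEO_PATH);
}
```

The file never goes through the compiler, so clip length stops mattering for build time. The cost is that the test now depends on the source tree being present at run time.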

davisking avatar May 24 '21 01:05 davisking

As for what cmake is doing to find ffmpeg, I don't know. I've never used pkg_check_modules. My experience with pkg-config has never been super great. Sometimes it's ok though. But regardless of that, it's fairly common that you have to tell cmake what folders to look in and use stuff like find_library or find_file to just do it yourself. I don't know what's the best thing here though.

davisking avatar May 24 '21 01:05 davisking

I think we will need to use pkg-config, since it gives you all the dependencies of the ffmpeg libraries. For example, if you build ffmpeg from scratch out of the box, libavdevice will probably depend on -lX11, -lasound and a few others. If you're on Windows, it will depend on GDI; on macOS, on AVFoundation. Alternatively, if you build ffmpeg while explicitly disabling all of this, then libavdevice won't have any dependencies (and will probably do very little, maybe just support v4l2).

When it comes to muxing, which this PR doesn't concern itself with officially I know, you WILL have to build ffmpeg with third-party libraries like libx264, maybe libx265, and others depending on what you want to mux. ffmpeg doesn't support a lot of fancy encoders "natively" (it does however support pretty much all decoders without third-party libraries). Have you ever tried to create a video in opencv using h264? Most of the time it complains. That's because it uses ffmpeg under the covers, and most people don't build ffmpeg with libx264. So it can't do it. Most of the time, opencv users have to use the mpeg encoder, which ffmpeg supports "natively". But that's balls.

So, if you don't want to make your cmake script an awful mess, i would use pkg-config --static --libs libavformat, pkg-config --static --libs libavcodec, and so on to set the linker flags, as that will correctly include all the dependencies. To get the C flags, it's pkg-config --cflags libavdevice, pkg-config --cflags libavformat and so on. This is pretty much the only way to build against ffmpeg correctly in all circumstances.

In order to set the right path, you need to adjust the PKG_CONFIG_PATH environment variable. So if i installed ffmpeg in directory DIRECTORY, then i usually set the following before calling make:

PKG_CONFIG_PATH=DIRECTORY/lib/pkgconfig

Then pkg-config will look in the right place.

NB: if you look at the ffmpeg build docs, there are literally 100s of ways you can build it. There is an enormous gap between a "minimum" build and a "full" build. And that translates to anything from a very short set of compiler/linker flags to something enormous. Like you can, if you want, build ffmpeg with tensorflow as a dependency for all its fancy filters! Crazy I know. The only way to have a consistent build is to use pkg-config, as it will capture how you built ffmpeg locally.

pfeatherstone avatar May 24 '21 06:05 pfeatherstone

So in short, though I don't claim to understand cmake at all, find_library and find_file are probs not the right idea since they won't tell you what the dependencies are.

pfeatherstone avatar May 24 '21 06:05 pfeatherstone

So i used http://dlib.net/file_to_code_ex.cpp.html to encode a 50s clip, and the compiler timed out after 15 minutes

Heh, yeah ok that’s too much data.

So there isn’t anything inherently wrong with just reading the file like normal. Just make cmake set a macro that tells the code where the file is. For test code that’s fine and functionally equivalent to the other options we are talking about.

By the way, gcc didn't time out when building http://dlib.net/file_to_code_ex.cpp.html itself. That took ~2s to compile and roughly 30s to run on the 50s clip. It's when I copy-pasted the output, which was ~90000 lines of code, into a .cpp file and built it that gcc timed out. I don't think gcc liked compiling 90000 lines of strings into a stringstream.

pfeatherstone avatar May 24 '21 06:05 pfeatherstone

@arrufat Is opencv's cmake script of any help?

pfeatherstone avatar May 24 '21 06:05 pfeatherstone

@arrufat Do you mind taking ownership of the cmake stuff?

pfeatherstone avatar May 24 '21 06:05 pfeatherstone

Well, you can take ownership of the whole thing if you really want to, I'm not bothered. But if i attempt to do the cmake stuff myself, i will probably end up using bad practices from watching bad youtube videos or reading deprecated docs. I've been too stubborn to sit down and learn it.

pfeatherstone avatar May 24 '21 06:05 pfeatherstone

@pfeatherstone I am not a CMake expert, but I am definitely willing to try. However, I am sure you'll do a much better job than me at the FFmpeg part :)

arrufat avatar May 24 '21 06:05 arrufat

Also, if all my verbose explanations are putting people off using this, then we need to stress that all dlib is doing is providing you with a wrapper. It's not giving you any strong guarantees, since it will very likely break if you attempt to use it in a way your local ffmpeg build doesn't support. For example, if you built ffmpeg with option --disable-devices, then attempt to read from a webcam, it WON'T WORK. There is little dlib can do about that. (You can in theory use the API to read all the supported encoders, decoders, muxers, devices, etc, but that's a lot of runtime checking just so you can have a slightly better error message.) Currently, libav will just fail somewhere and return an error code saying Invalid Input or something like that. So the onus is on the user to build ffmpeg correctly. dlib is just saying "i have a wrapper to make your life slightly easier than using the raw C API manually. I can forward all your fancy format/protocol/codec private options to the C API, but you have to go away, read the ffmpeg docs yourself, make sure they are correct, and I will forward them. If they are incorrect, well that's your fault."

pfeatherstone avatar May 24 '21 07:05 pfeatherstone

Looking at cmake docs, it looks like CMAKE_PREFIX_PATH is added to the pkg-config search path. So maybe you won't need to set PKG_CONFIG_PATH

pfeatherstone avatar May 24 '21 07:05 pfeatherstone

Build fails because it can't find libavformat headers. That's normal. Will deal with that at some point in the distant future.

pfeatherstone avatar Jun 15 '21 12:06 pfeatherstone

Build fails because it can't find libavformat headers. That's normal. Will deal with that at some point in the distant future.

Would that be solved by updating the Travis file?

arrufat avatar Jun 16 '21 00:06 arrufat

Probably. But I simply don't have spare time to work on this as none of my work projects justify it. I have my own implementations, they do the job and I'm not using dlib. So it's not like the couple of function wrappers that convert to dlib types are useful to me. Until I have a project that justifies me working on this again, this is going to go a bit stale I'm afraid. Unless you want to take it forward. That's absolutely fine by me. I can help out here and there if you find bugs (coz I can then fix my own implementation...). Sorry.

pfeatherstone avatar Jun 16 '21 07:06 pfeatherstone