Implement feature extraction module for verification
Abstract
In our decentralized video network, it is important to detect tampered videos in order to prevent the myriad of malicious attacks seeking to misinform or cheat the digital multimedia audience. This is a proposal for implementing a verification module in the lpms engine.
Motivation
We have already developed a verifier that runs on the broadcaster, and it has fairly good accuracy in detecting tampered videos. However, we still need to outsource verification to reduce the computation cost on the broadcaster, and we can introduce a TTP (Trusted Third Party), similar to Zcash, to implement the verification workflow. As a first step, the feature values used for verification must be calculated during transcoding.
Proposed Solution
The feature extraction module should run simultaneously with transcoding, in real time, and we also want the feature diff between the source and the rendition(s) as well as between the source and the extra video. From this viewpoint, I am going to integrate parallel, independent decoding of the extra video with the ffmpeg API for calculating the feature matrix. The diagram below shows the workflow of the feature extraction module in the lpms engine. In this diagram, if the number of transcoding profiles is two, we get three different feature matrices: one is the feature diff between the source and the extra video, and the others are the feature diffs between the source and the renditions.
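To make that count concrete, a trivial sketch (an illustrative helper, not part of the proposed API):

```go
// featureDiffCount returns how many feature diffs we expect for a given
// number of transcoding profiles: one per rendition plus one for the
// extra video.
func featureDiffCount(nbProfiles int) int {
	return nbProfiles + 1
}
```

E.g. featureDiffCount(2) == 3, matching the two-profile case above.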
- Go Level
As we can see in the diagram above, the parameters for transcoding (+ verification) will be two local video URLs. Therefore, we should add a function that takes a string array as its parameter at the following location:
https://github.com/livepeer/lpms/blob/d5c85d86b206bddb63859921eca89f5c1f267b1e/transcoder/ffmpeg_segment_transcoder.go#L24
The return values are the rendition byte arrays and the final diff scores between the original and extra video in JSON form.

```go
func (t *FFMpegSegmentTranscoder) Transcode2(fname []string) (renditions [][]byte, diffscores []string, err error) {
	// ...
}
```
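A hedged usage sketch of the proposed API; the input ordering (source first, extra second) and the placeholder paths are assumptions of this example:

```go
// runTranscode2 calls the proposed two-input API on an already-configured
// FFMpegSegmentTranscoder. (Illustrative caller, not proposed API.)
func runTranscode2(t *FFMpegSegmentTranscoder) ([][]byte, []string, error) {
	fnames := []string{"/tmp/source.ts", "/tmp/extra.ts"} // placeholder paths
	renditions, diffscores, err := t.Transcode2(fnames)
	if err != nil {
		return nil, nil, err
	}
	// With two profiles we expect len(renditions) == 2 and
	// len(diffscores) == 3 (two rendition diffs + one extra diff).
	return renditions, diffscores, nil
}
```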
- C Level
As at the Go level, we have to add a function that takes two local video URLs (original & extra) as inputs:
https://github.com/livepeer/lpms/blob/d5c85d86b206bddb63859921eca89f5c1f267b1e/ffmpeg/lpms_ffmpeg.c#L1537
```c
int lpms_transcode2(input_params *inp, int nb_inputs, output_params *params,
                    output_results *results, int nb_outputs, output_results *decoded_results);
```
```c
struct transcode_thread {
  int initialized;
  struct input_ctx ictx;
  struct output_ctx outputs[MAX_OUTPUT_SIZE];
  int nb_outputs;
  // ...
  AVFrame *list_frame_original;   // captured frames from the source
  AVFrame *list_frame_renditions; // captured frames from the renditions
};
```
A list will be added to the transcode_thread or input_ctx structure to capture frames at random frame indices, as shown above. During transcoding, the frames of the original and the renditions are stored into this list at our source here and here.
Meanwhile, a separate thread is used to perform decoding and frame capture of the extra video in parallel. Finally, we call the lvpdiff API of ffmpeg to calculate the frame difference scores and generate the return value. I will refer to here and here in our code.
Ideally this function with two inputs will take 0.2~0.4 seconds more than the original transcoding function with one input.
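The parallelism itself will live at the C level, but the shape of the flow, sketched in Go, looks like this; decodeAndCapture, transcodeAndCapture, featureDiff, and the Frame type are all hypothetical stand-ins for the C-side work:

```go
// decodeResult carries the frames captured from the extra video.
type decodeResult struct {
	frames []*Frame // hypothetical captured-frame type
	err    error
}

// transcodeAndDiff sketches the flow: kick off extra-video decoding in the
// background, transcode the source as usual, then diff the captured frames.
func transcodeAndDiff(srcname, extraname string, indices []int) (float64, error) {
	ch := make(chan decodeResult, 1)
	go func() {
		frames, err := decodeAndCapture(extraname, indices) // hypothetical helper
		ch <- decodeResult{frames, err}
	}()

	srcFrames, err := transcodeAndCapture(srcname, indices) // hypothetical helper
	if err != nil {
		return 0, err
	}

	r := <-ch
	if r.err != nil {
		return 0, r.err
	}
	return featureDiff(srcFrames, r.frames), nil // hypothetical lvpdiff wrapper
}
```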
Testing and Considerations
As with the original transcoding function with one file input, a test function for the two-input case should be added in "ffmpeg_test.go".
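A rough sketch of such a test; the fixture names, constructor arguments, and profile constants are assumptions for illustration:

```go
func TestTranscoder_TwoInputVerification(t *testing.T) {
	profiles := []VideoProfile{P144p30fps16x9, P240p30fps16x9} // assumed fixtures
	tc := NewFFMpegSegmentTranscoder(profiles, "")             // assumed constructor args
	renditions, diffscores, err := tc.Transcode2([]string{"test_source.ts", "test_extra.ts"})
	if err != nil {
		t.Fatal(err)
	}
	if len(renditions) != len(profiles) {
		t.Errorf("expected %d renditions, got %d", len(profiles), len(renditions))
	}
	if len(diffscores) != len(profiles)+1 { // n rendition diffs + 1 extra diff
		t.Errorf("expected %d diff scores, got %d", len(profiles)+1, len(diffscores))
	}
}
```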
References
https://www.notion.so/livepeer/Real-Time-Verification-Thoughts-0a0ad16546a54dc3b77589f01f2bc333
https://github.com/livepeer/verification-classifier/tree/master/scripts
https://stackoverflow.com/questions/37353250/may-i-use-opencv-in-a-customized-ffmpeg-filter
https://en.wikipedia.org/wiki/Zcash
https://en.wikipedia.org/wiki/Zero-knowledge_proof
Looks good @oscar-davids! Just a couple suggestions around the transcode function's argument names:
Rather than having input filenames/params as an array, can we instead have a separate argument for the extra video?
```go
func (t *FFMpegSegmentTranscoder) Transcode2(fname string, extraname string) (renditions [][]byte, diffscores []string, err error) {
```
Similarly, rather than multiple input_params objects in an array, we can send a new argument extra_inp:

```c
int lpms_transcode2(input_params *inp, input_params *extra_inp, output_params *params,
                    output_results *results, int nb_outputs, output_results *decoded_results) {
  //...
}
```
I'm leaning towards not mixing the extra segment with the inputs in the API - as in the future there might be a use case where we want to have multiple input files (one audio, one video, one subs or whatever) that are completely unrelated to the extra segment used for verification. And anyway, since the normal segments go through a different pipeline than the extra segment (which won't go through the usual filter+encode according to the diagram), it makes sense to distinguish them clearly for the LPMS user.
Also feel free to name functions TranscodeAndVerify and similar if you think that would be better, instead of using numbers like we've been doing before.
The rest of the proposal, like having diffscores as an array of size n+1 for n output renditions, all sounds good to me :)
> I'm leaning towards not mixing the extra segment with the inputs in the API
From the API point of view this makes sense. I agree to change the function and parameter names.
```go
func (t *FFMpegSegmentTranscoder) TranscodeAndVerify(fname string, extraname string) (renditions [][]byte, diffscores []string, err error) {
```
```c
int lpms_transcodeandverify(input_params *inp, input_params *extra_inp, output_params *params,
                            output_results *results, int nb_outputs, output_results *decoded_results) {
  //...
}
```
I see that the spec currently references
https://github.com/livepeer/lpms/blob/d5c85d86b206bddb63859921eca89f5c1f267b1e/transcoder/ffmpeg_segment_transcoder.go#L24
go-livepeer actually doesn't use FFMpegSegmentTranscoder right now and instead primarily uses the methods defined in ffmpeg.go, all of which basically wrap Transcode(). The API for this method is a bit different from the transcode method of FFMpegSegmentTranscoder. At the moment it is:
```go
Transcode(input *TranscodeOptionsIn, ps []TranscodeOptions) (*TranscodeResults, error)
```
So, I think we should consider how we can leverage the structs used in this API to pass around the data we need, or what API changes are necessary if the existing structs are not sufficient.
> The return values are the rendition byte arrays and the final diff scores between the original and extra video in JSON form.
Is there a particular reason the diff scores need to be a JSON string? Would it make sense to return them as structs instead, to avoid additional parsing at the Go level? Something like:
```go
type TranscodeResults struct {
	// ...
	FeatureDiffs []FeatureDiff
}

type VerifyFeature int

const (
	VerifyFeature_DCT_L1 VerifyFeature = iota
	VerifyFeature_Gauss_MSE
	VerifyFeature_Gauss_L1
	VerifyFeature_Gauss_Threshold_L1
	VerifyFeature_Hist_Chi
)

type FeatureDiff []VerifyFeature
```
Not sure if the VerifyFeature enum is needed - might be useful if the broadcaster code needs to pass the feature diff to the model for inference?
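For instance, if each feature diff ends up carrying one numeric score per feature, the enum would give the broadcaster stable indices when assembling the model's input vector. A sketch under that assumption (the float64 element type is mine; the draft above declares []VerifyFeature):

```go
// toModelInput assumes scores holds one value per VerifyFeature constant,
// in declaration order, so the enum doubles as a stable slice index.
func toModelInput(scores []float64) []float64 {
	return []float64{
		scores[VerifyFeature_DCT_L1],
		scores[VerifyFeature_Gauss_MSE],
		scores[VerifyFeature_Gauss_L1],
		scores[VerifyFeature_Gauss_Threshold_L1],
		scores[VerifyFeature_Hist_Chi],
	}
}
```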
> Rather than having input filenames/params as an array, can we instead have a separate argument for the extra video?
Could we specify the extra filename in TranscodeOptionsIn? i.e.
```go
type TranscodeOptionsIn struct {
	Fname      string
	ExtraFname string
	// ...
}
```
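Putting the two struct suggestions together, a caller sketch might look like this (assuming both ExtraFname and FeatureDiffs land as proposed, and that the Transcode() signature above stays the same):

```go
// transcodeWithVerification is an illustrative caller, not proposed API.
func transcodeWithVerification(ps []TranscodeOptions) ([]FeatureDiff, error) {
	in := &TranscodeOptionsIn{
		Fname:      "/tmp/source.ts", // placeholder paths
		ExtraFname: "/tmp/extra.ts",
	}
	res, err := Transcode(in, ps)
	if err != nil {
		return nil, err
	}
	// Expect one FeatureDiff per rendition plus one for the extra video;
	// the exact ordering is still open in this thread.
	return res.FeatureDiffs, nil
}
```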
Something that isn't mentioned in the spec right now is the use of the lvpdiff filter vs. the cuda_lvpdiff filter. We'll want a way to select the cuda_lvpdiff filter when we're running with Nvidia acceleration. We can probably use a check like input.Accel == Nvidia in the Transcode() method.
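A small sketch of that check; the filter names come from this thread, while the helper and the exact place the branch lives inside Transcode() are assumptions:

```go
// diffFilterName picks the lvpdiff variant based on the acceleration mode.
func diffFilterName(accel Acceleration) string {
	if accel == Nvidia {
		return "cuda_lvpdiff"
	}
	return "lvpdiff"
}
```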