shaka-packager icon indicating copy to clipboard operation
shaka-packager copied to clipboard

Do not use File::GetFileSize to get segment size

Open kqyang opened this issue 6 years ago • 14 comments

File::GetFileSize opens the file in read mode to get the file size. This could be a problem for non-local files, e.g. the HttpFile introduced in #149. In fact, it may not work for local files either due to file caching.

Right now it is used in MPEG2TS and WebVTT output writers: https://github.com/google/shaka-packager/blob/51d39d96a1d5e6926334122c09910932f32313fa/packager/media/formats/mp2t/ts_segmenter.cc#L189, https://github.com/google/shaka-packager/blob/f49b89280cc9b2024198640db87ea2ab79700d38/packager/media/formats/webvtt/webvtt_text_output_handler.cc#L112. MP4, WebM, PackedAudio output writers are not using File::GetFileSize.

MPEG2TS code and WebVTT code should be cleaned up to follow a similar approach as MP4 / WebM / PackedAudio.

Two proposals:

  1. Store and accumulate the sizes in memory in relevant classes
  2. Alternatively, use File::Tell to get the current file position, which should be supported for all files that supports writing.

kqyang avatar Nov 13 '18 00:11 kqyang

A quick patch for TS is:

diff --git i/packager/media/formats/mp2t/ts_segmenter.cc w/packager/media/formats/mp2t/ts_segmenter.cc
index e9d54b39a1..be2a48167b 100644
--- i/packager/media/formats/mp2t/ts_segmenter.cc
+++ w/packager/media/formats/mp2t/ts_segmenter.cc
@@ -181,13 +181,18 @@ Status TsSegmenter::FinalizeSegment(uint64_t start_timestamp,
   // This method may be called from Finalize() so ts_writer_file_opened_ could
   // be false.
   if (ts_writer_file_opened_) {
+    base::Optional<uint64_t> file_position = ts_writer_->GetFilePosition();
+    if (!file_position) {
+      return Status(
+          error::MUXER_FAILURE,
+          "Failed to get file position in TsSegmenter::FinalizeSegment.");
+    }
     if (!ts_writer_->FinalizeSegment()) {
       return Status(error::MUXER_FAILURE, "Failed to finalize TsWriter.");
     }
     if (listener_) {
-      const int64_t file_size =
-          File::GetFileSize(current_segment_path_.c_str());
+      const int64_t file_size = file_position.value();
       listener_->OnNewSegment(current_segment_path_,
                               start_timestamp * timescale_scale_ +

kqyang avatar Nov 13 '18 01:11 kqyang

Dear Kongqun,

we just added your patch for TS as 08e6b21e and it seems to work just fine, thank you so much!

You also asked in #149:

[The patch will] use File::Tell to get the file position. Can you update HttpFile class to support Tell?

It looks like it is working even without any further changes required. According to the implementation, HttpFile::Tell never gets called at all. Did you mean to add it for other output writers like WebVTT? Are we missing something?

With kind regards, Andreas.

amotl avatar Nov 13 '18 21:11 amotl

@amotl It does not seem to be possible based on the code as TsWriter::GetFilePosition() calls Tell():

base::Optional<uint64_t> TsWriter::GetFilePosition() {
  if (!current_file_)
    return base::nullopt;
  uint64_t position;
  return current_file_->Tell(&position) ? base::make_optional(position)
                                        : base::nullopt;
}

Can you double check?

kqyang avatar Nov 13 '18 21:11 kqyang

Can you double check?

Sure, i'm curious already what i might be missing again.

Observations

So, when running with additional logging like

base::Optional<uint64_t> TsWriter::GetFilePosition() {
  if (!current_file_)
    return base::nullopt;
  VLOG(1) << "TsWriter::GetFilePosition for " << current_file_->file_name();
  uint64_t position;
  return current_file_->Tell(&position) ? base::make_optional(position)
                                        : base::nullopt;
}

Shaka Packager will tell us things like

TsWriter::GetFilePosition for 192.168.178.101:6767/hls-live/bigbuckbunny-video-720-0083.ts

As you will recognize, the protocol prefix http:// is missing here, so file.cc wouldn't even think about dispatching it to HttpFile again and will happily go to ThreadedIoFile instead: https://github.com/google/shaka-packager/blob/db3ed544f8c31ed4889d44a6e8d33b38a3d0d6a0/packager/file/threaded_io_file.cc#L152-L157

Reflections

I knew this would be hitting us again ;], it already dragged me away hunting ghosts the other day, see #149 » HTTP addressing problem ff..

To tell my story short: As the File object lacked this information (the full resource address) which is apparently required to be propagated correctly through the internal machinery including the file type prefix, we didn't find another way than adding an explicit --http_upload_url parameter which can be accessed from everywhere (good), but obviously would be bad for the health of the code base.

The more clean approach we see is to extend the file type dispatching machinery of file.cc, as outlined below.

Any thoughts?

amotl avatar Nov 13 '18 23:11 amotl

Remark: If appropriate, we might split the following discussion into yet another issue?

Improve the file type dispatcher

Like mentioned in #149 already, we are thinking about whether it would be good to evolve the function signature of the output writer adapter factory call to make the internals of File::CreateInternalFile read like

// As of Dec 2018, the _full_ resource address will _also_ get propagated to output writers
const FileTypeInfo* file_type = GetFileTypeInfo(file_name, &real_file_name);
return file_type->factory_function(file_name, real_file_name.data(), mode);

By doing so, we would do all the output writers a great favor to a) make them able to reason about the effective resource address and b) enable all consumers to also propagate the full address around the internal machinery. Otherwise, we don't see any other way than ugly hacks like already explored with 9150fd8c, which we all dislike.

These are just our two cents, we still hope for you coming up with an easy solution ;]. Thanks again for listening!

amotl avatar Nov 13 '18 23:11 amotl

image

To pick up

It looks like it is working even without any further changes required. According to the implementation, HttpFile::Tell never gets called at all. [...] Are we missing something?

again: Even if HttpFile::Tell never will get called, Shaka Packager produces reasonable results, see Appendix.

The bandwidth estimator will apparently be fed by the packed audio writer appropriately enough to "make things work"? Or where does it get its information from?


Appendix: Master playlist file showing reasonable bandwidth values

http http://localhost:6767/hls-live/bigbuckbunny.m3u8

HTTP/1.1 200 OK
Content-Length: 981
Content-Type: application/x-mpegurl
Server: Caddy

#EXTM3U
## Generated with https://github.com/google/shaka-packager version a5dd39f4-release

#EXT-X-MEDIA:TYPE=AUDIO,URI="bigbuckbunny-audio.m3u8",GROUP-ID="audio",NAME="stream_4",AUTOSELECT=YES,CHANNELS="2"

#EXT-X-STREAM-INF:BANDWIDTH=820123,AVERAGE-BANDWIDTH=628746,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=426x240,AUDIO="audio"
bigbuckbunny-video-240.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1728539,AVERAGE-BANDWIDTH=1162750,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=640x360,AUDIO="audio"
bigbuckbunny-video-360.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2695611,AVERAGE-BANDWIDTH=1791088,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=852x480,AUDIO="audio"
bigbuckbunny-video-480.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=3344587,AVERAGE-BANDWIDTH=2219143,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=1280x720,AUDIO="audio"
bigbuckbunny-video-720.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=6727083,AVERAGE-BANDWIDTH=4224560,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=1920x1080,AUDIO="audio"
bigbuckbunny-video-1080.m3u8

amotl avatar Nov 13 '18 23:11 amotl

so file.cc wouldn't even think about dispatching it to HttpFile again and will happily go to ThreadedIoFile instead: https://github.com/google/shaka-packager/blob/db3ed544f8c31ed4889d44a6e8d33b38a3d0d6a0/packager/file/threaded_io_file.cc#L152-L157

Wouldn't this just be completely okay in this case? If all I/O is handled by ThreadedIoFile under the hood, wouldn't this keep track of the position perfectly already?

This - at least - is the output we are getting when running 08e6b21 with --vmodule=ts_segmenter=1 and it didn't look bad to us:

[1114/001055:VERBOSE1:ts_segmenter.cc(197)] File size of http://192.168.178.101:6767/hls-live/bigbuckbunny-video-1080-0004.ts is 909732
[1114/001055:VERBOSE1:ts_segmenter.cc(197)] File size of http://192.168.178.101:6767/hls-live/bigbuckbunny-video-240-0004.ts is 106408
[1114/001055:VERBOSE1:ts_segmenter.cc(197)] File size of http://192.168.178.101:6767/hls-live/bigbuckbunny-video-360-0004.ts is 228984
[1114/001055:VERBOSE1:ts_segmenter.cc(197)] File size of http://192.168.178.101:6767/hls-live/bigbuckbunny-video-480-0004.ts is 372804
[1114/001055:VERBOSE1:ts_segmenter.cc(197)] File size of http://192.168.178.101:6767/hls-live/bigbuckbunny-video-720-0004.ts is 447816

amotl avatar Nov 14 '18 00:11 amotl

Re https://github.com/google/shaka-packager/issues/506#issuecomment-438482481, ahh, just realized that we create an internal ThreadedIoFile on top of every file to support file caching. We do not really need the caching offered by ThreadedIoFile as HttpFile already does caching internally; but it handles File::Tell for us without us to implement HttpFile::Tell in this case.

So yes, the behavior is correct.

We may want to disable ThreadedIo for HttpFile, but that can be done in a separate CL (and then we'll need to support HttpFile::Tell):

diff --git i/packager/file/file.cc w/packager/file/file.cc                                                                                        
index 732eced260..ab34421f1c 100644                                                                                                               
--- i/packager/file/file.cc
+++ w/packager/file/file.cc
@@ -152,8 +152,9 @@ File* File::Create(const char* file_name, const char* mode) {

   base::StringPiece file_type_prefix = GetFileTypePrefix(file_name);
   if (file_type_prefix == kMemoryFilePrefix ||
-      file_type_prefix == kCallbackFilePrefix) {
-    // Disable caching for memory and callback files.
+      file_type_prefix == kCallbackFilePrefix ||
+      file_type_prefix == kHttpFilePrefix) {
+    // Disable caching for memory, callback and http files.
     return internal_file.release();
   }

Re https://github.com/google/shaka-packager/issues/506#issuecomment-438472524

@amotl I don't really see it being a problem except logging.

Shaka Packager will say bad things like ... 192.168.178.101:6767/hls-live/bigbuckbunny-video-720-0083.ts

At this moment, it is already dispatched to HttpFile, i.e. current_file_ is an instance of HttpFile and HttpFile understands http.

We can certainly improve the logging for HttpFile and maybe other types of files to always use the non-stripped file name instead.

kqyang avatar Nov 14 '18 00:11 kqyang

@amotl Re https://github.com/google/shaka-packager/issues/506#issuecomment-438490209, good finding! Yes, see my https://github.com/google/shaka-packager/issues/506#issuecomment-438490589 posted at about the same times as yours :)

kqyang avatar Nov 14 '18 00:11 kqyang

Hi again.

Thanks for your comments and yes: If we would keep it like it is now, let's profit from ThreadedIoFile and close this, right? Personally, i would consider this solved. Thank you so much again for providing patches and thoughts!

Cheers, Andreas.

P.S.: Ah, maybe keep it open until this has also been implemented for WebVTT?

amotl avatar Nov 14 '18 00:11 amotl

SGTM. And yes, it needs to be implemented for WebVTT too. The solution for WebVTT can be similar, i.e. we can use File::Tell to get the file size. Would you like to work on this too?

kqyang avatar Nov 14 '18 00:11 kqyang

Would you like to work on this too?

Will be happy to have a look when coming back here in a while.

amotl avatar Nov 14 '18 02:11 amotl

Note that if --io_cache_size is set to 0, ThreadedIoFile won't be used. If we are going to make it a requirement, then it shouldn't be optional.

rkuroiwa avatar Nov 15 '18 21:11 rkuroiwa

Yes, @rkuroiwa is right. Thanks for pointing out!

kqyang avatar Nov 15 '18 21:11 kqyang

This was resolved at some point as the only remaining usage of File::GetFileSize is packager/file/file_unittest.cc

cosmin avatar Apr 25 '24 16:04 cosmin