shaka-packager
shaka-packager copied to clipboard
Do not use File::GetFileSize to get segment size
File::GetFileSize opens the file in read mode to get the file size. This could be a problem for non-local files, e.g. the HttpFile introduced in #149. In fact, it may not work for local files either due to file caching.
Right now it is used in MPEG2TS and WebVTT output writers: https://github.com/google/shaka-packager/blob/51d39d96a1d5e6926334122c09910932f32313fa/packager/media/formats/mp2t/ts_segmenter.cc#L189, https://github.com/google/shaka-packager/blob/f49b89280cc9b2024198640db87ea2ab79700d38/packager/media/formats/webvtt/webvtt_text_output_handler.cc#L112. MP4, WebM, PackedAudio output writers are not using File::GetFileSize.
MPEG2TS code and WebVTT code should be cleaned up to follow a similar approach as MP4 / WebM / PackedAudio.
Two proposals:
- Store and accumulate the sizes in memory in relevant classes
- Alternatively, use File::Tell to get the current file position, which should be supported for all files that supports writing.
A quick patch for TS is:
diff --git i/packager/media/formats/mp2t/ts_segmenter.cc w/packager/media/formats/mp2t/ts_segmenter.cc
index e9d54b39a1..be2a48167b 100644
--- i/packager/media/formats/mp2t/ts_segmenter.cc
+++ w/packager/media/formats/mp2t/ts_segmenter.cc
@@ -181,13 +181,18 @@ Status TsSegmenter::FinalizeSegment(uint64_t start_timestamp,
// This method may be called from Finalize() so ts_writer_file_opened_ could
// be false.
if (ts_writer_file_opened_) {
+ base::Optional<uint64_t> file_position = ts_writer_->GetFilePosition();
+ if (!file_position) {
+ return Status(
+ error::MUXER_FAILURE,
+ "Failed to get file position in TsSegmenter::FinalizeSegment.");
+ }
if (!ts_writer_->FinalizeSegment()) {
return Status(error::MUXER_FAILURE, "Failed to finalize TsWriter.");
}
if (listener_) {
- const int64_t file_size =
- File::GetFileSize(current_segment_path_.c_str());
+ const int64_t file_size = file_position.value();
listener_->OnNewSegment(current_segment_path_,
start_timestamp * timescale_scale_ +
Dear Kongqun,
we just added your patch for TS as 08e6b21e and it seems to work just fine, thank you so much!
You also asked in #149:
[The patch will] use File::Tell to get the file position. Can you update HttpFile class to support Tell?
It looks like it is working even without any further changes required. According to the implementation, HttpFile::Tell
never gets called at all. Did you mean to add it for other output writers like WebVTT? Are we missing something?
With kind regards, Andreas.
@amotl It does not seem to be possible based on the code as TsWriter::GetFilePosition() calls Tell():
base::Optional<uint64_t> TsWriter::GetFilePosition() {
if (!current_file_)
return base::nullopt;
uint64_t position;
return current_file_->Tell(&position) ? base::make_optional(position)
: base::nullopt;
}
Can you double check?
Can you double check?
Sure, i'm curious already what i might be missing again.
Observations
So, when running with additional logging like
base::Optional<uint64_t> TsWriter::GetFilePosition() {
if (!current_file_)
return base::nullopt;
VLOG(1) << "TsWriter::GetFilePosition for " << current_file_->file_name();
uint64_t position;
return current_file_->Tell(&position) ? base::make_optional(position)
: base::nullopt;
}
Shaka Packager will tell us things like
TsWriter::GetFilePosition for 192.168.178.101:6767/hls-live/bigbuckbunny-video-720-0083.ts
As you will recognize, the protocol prefix http://
is missing here, so file.cc
wouldn't even think about dispatching it to HttpFile
again and will happily go to ThreadedIoFile
instead:
https://github.com/google/shaka-packager/blob/db3ed544f8c31ed4889d44a6e8d33b38a3d0d6a0/packager/file/threaded_io_file.cc#L152-L157
Reflections
I knew this would be hitting us again ;], it already dragged me away hunting ghosts the other day, see #149 » HTTP addressing problem ff..
To tell my story short: As the File
object lacked this information (the full resource address) which is apparently required to be propagated correctly through the internal machinery including the file type prefix, we didn't find another way than adding an explicit --http_upload_url
parameter which can be accessed from everywhere (good), but obviously would be bad for the health of the code base.
The more clean approach we see is to extend the file type dispatching machinery of file.cc
, as outlined below.
Any thoughts?
Remark: If appropriate, we might split the following discussion into yet another issue?
Improve the file type dispatcher
Like mentioned in #149 already, we are thinking about whether it would be good to evolve the function signature of the output writer adapter factory call to make the internals of File::CreateInternalFile
read like
// As of Dec 2018, the _full_ resource address will _also_ get propagated to output writers
const FileTypeInfo* file_type = GetFileTypeInfo(file_name, &real_file_name);
return file_type->factory_function(file_name, real_file_name.data(), mode);
By doing so, we would do all the output writers a great favor to a) make them able to reason about the effective resource address and b) enable all consumers to also propagate the full address around the internal machinery. Otherwise, we don't see any other way than ugly hacks like already explored with 9150fd8c, which we all dislike.
These are just our two cents, we still hope for you coming up with an easy solution ;]. Thanks again for listening!
To pick up
It looks like it is working even without any further changes required. According to the implementation,
HttpFile::Tell
never gets called at all. [...] Are we missing something?
again: Even if HttpFile::Tell
never will get called, Shaka Packager produces reasonable results, see Appendix.
The bandwidth estimator will apparently be fed by the packed audio writer appropriately enough to "make things work"? Or where does it get its information from?
Appendix: Master playlist file showing reasonable bandwidth values
http http://localhost:6767/hls-live/bigbuckbunny.m3u8
HTTP/1.1 200 OK
Content-Length: 981
Content-Type: application/x-mpegurl
Server: Caddy
#EXTM3U
## Generated with https://github.com/google/shaka-packager version a5dd39f4-release
#EXT-X-MEDIA:TYPE=AUDIO,URI="bigbuckbunny-audio.m3u8",GROUP-ID="audio",NAME="stream_4",AUTOSELECT=YES,CHANNELS="2"
#EXT-X-STREAM-INF:BANDWIDTH=820123,AVERAGE-BANDWIDTH=628746,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=426x240,AUDIO="audio"
bigbuckbunny-video-240.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1728539,AVERAGE-BANDWIDTH=1162750,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=640x360,AUDIO="audio"
bigbuckbunny-video-360.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2695611,AVERAGE-BANDWIDTH=1791088,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=852x480,AUDIO="audio"
bigbuckbunny-video-480.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=3344587,AVERAGE-BANDWIDTH=2219143,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=1280x720,AUDIO="audio"
bigbuckbunny-video-720.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=6727083,AVERAGE-BANDWIDTH=4224560,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=1920x1080,AUDIO="audio"
bigbuckbunny-video-1080.m3u8
so
file.cc
wouldn't even think about dispatching it toHttpFile
again and will happily go toThreadedIoFile
instead: https://github.com/google/shaka-packager/blob/db3ed544f8c31ed4889d44a6e8d33b38a3d0d6a0/packager/file/threaded_io_file.cc#L152-L157
Wouldn't this just be completely okay in this case? If all I/O is handled by ThreadedIoFile
under the hood, wouldn't this keep track of the position perfectly already?
This - at least - is the output we are getting when running 08e6b21 with --vmodule=ts_segmenter=1
and it didn't look bad to us:
[1114/001055:VERBOSE1:ts_segmenter.cc(197)] File size of http://192.168.178.101:6767/hls-live/bigbuckbunny-video-1080-0004.ts is 909732
[1114/001055:VERBOSE1:ts_segmenter.cc(197)] File size of http://192.168.178.101:6767/hls-live/bigbuckbunny-video-240-0004.ts is 106408
[1114/001055:VERBOSE1:ts_segmenter.cc(197)] File size of http://192.168.178.101:6767/hls-live/bigbuckbunny-video-360-0004.ts is 228984
[1114/001055:VERBOSE1:ts_segmenter.cc(197)] File size of http://192.168.178.101:6767/hls-live/bigbuckbunny-video-480-0004.ts is 372804
[1114/001055:VERBOSE1:ts_segmenter.cc(197)] File size of http://192.168.178.101:6767/hls-live/bigbuckbunny-video-720-0004.ts is 447816
Re https://github.com/google/shaka-packager/issues/506#issuecomment-438482481, ahh, just realized that we create an internal ThreadedIoFile on top of every file to support file caching. We do not really need the caching offered by ThreadedIoFile as HttpFile already does caching internally; but it handles File::Tell for us without us to implement HttpFile::Tell in this case.
So yes, the behavior is correct.
We may want to disable ThreadedIo for HttpFile, but that can be done in a separate CL (and then we'll need to support HttpFile::Tell):
diff --git i/packager/file/file.cc w/packager/file/file.cc
index 732eced260..ab34421f1c 100644
--- i/packager/file/file.cc
+++ w/packager/file/file.cc
@@ -152,8 +152,9 @@ File* File::Create(const char* file_name, const char* mode) {
base::StringPiece file_type_prefix = GetFileTypePrefix(file_name);
if (file_type_prefix == kMemoryFilePrefix ||
- file_type_prefix == kCallbackFilePrefix) {
- // Disable caching for memory and callback files.
+ file_type_prefix == kCallbackFilePrefix ||
+ file_type_prefix == kHttpFilePrefix) {
+ // Disable caching for memory, callback and http files.
return internal_file.release();
}
Re https://github.com/google/shaka-packager/issues/506#issuecomment-438472524
@amotl I don't really see it being a problem except logging.
Shaka Packager will say bad things like ... 192.168.178.101:6767/hls-live/bigbuckbunny-video-720-0083.ts
At this moment, it is already dispatched to HttpFile, i.e. current_file_
is an instance of HttpFile and HttpFile understands http.
We can certainly improve the logging for HttpFile and maybe other types of files to always use the non-stripped file name instead.
@amotl Re https://github.com/google/shaka-packager/issues/506#issuecomment-438490209, good finding! Yes, see my https://github.com/google/shaka-packager/issues/506#issuecomment-438490589 posted at about the same times as yours :)
Hi again.
Thanks for your comments and yes: If we would keep it like it is now, let's profit from ThreadedIoFile
and close this, right? Personally, i would consider this solved. Thank you so much again for providing patches and thoughts!
Cheers, Andreas.
P.S.: Ah, maybe keep it open until this has also been implemented for WebVTT?
SGTM. And yes, it needs to be implemented for WebVTT too. The solution for WebVTT can be similar, i.e. we can use File::Tell to get the file size. Would you like to work on this too?
Would you like to work on this too?
Will be happy to have a look when coming back here in a while.
Note that if --io_cache_size
is set to 0, ThreadedIoFile
won't be used. If we are going to make it a requirement, then it shouldn't be optional.
Yes, @rkuroiwa is right. Thanks for pointing out!
This was resolved at some point as the only remaining usage of File::GetFileSize
is packager/file/file_unittest.cc