
ocr_bitmap can run out of buffer memory copying the "last font tag"

Open jstrot opened this issue 1 year ago • 2 comments

In raising this pull request, I confirm the following (please check boxes):

  • [x] I have read and understood the contributors guide.
  • [x] I have checked that another pull request for this purpose does not exist.
  • [x] I have considered, and confirmed that this submission will be valuable to others.
  • [x] I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • [x] I give this submission freely, and claim no ownership to its content.
  • [x] I have mentioned this change in the changelog.

My familiarity with the project is as follows (check one):

  • [ ] I have never used CCExtractor.
  • [ ] I have used CCExtractor just a couple of times.
  • [x] I absolutely love CCExtractor, but have not contributed previously.
  • [ ] I am an active contributor to CCExtractor.

Version: 0.94

During OCR of a VOB program stream (PS), ccextractor can run out of buffer space when it has to copy all the text since the last font tag (which may be the very beginning of the input):

$ ./ccextractor -1 -cc2 -out=srt -utf8 test.vob -o test.srt
...
Error: In ocr_bitmap: Running out of memory. It shouldn't happen. Please report.

I believe the bug has existed since that piece of code was introduced back in 2017 (#844).

The fix simply makes sure the allocated buffer is big enough for this extra string.
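
As a rough illustration of that idea (not the actual patch), here is a minimal C sketch that sizes an output buffer so it has room for the text plus one extra copy of a font tag. The helper names `capacity_with_tag` and `append_tag` are hypothetical and do not exist in CCExtractor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not CCExtractor code: bytes needed to hold the
 * text written so far plus one more copy of the last font tag. */
static size_t capacity_with_tag(size_t text_len, size_t tag_len)
{
	return text_len + tag_len + 1; /* +1 for the terminating NUL */
}

/* Hypothetical helper: returns a newly allocated string made of `text`
 * followed by `tag`, or NULL if the allocation fails. The caller frees it. */
static char *append_tag(const char *text, size_t text_len,
			const char *tag, size_t tag_len)
{
	assert(text != NULL && tag != NULL);

	char *out = malloc(capacity_with_tag(text_len, tag_len));
	if (out == NULL)
		return NULL;

	memcpy(out, text, text_len);          /* distinct buffers, no overlap */
	memcpy(out + text_len, tag, tag_len);
	out[text_len + tag_len] = '\0';
	return out;
}
```

The point of the sketch is only that the allocation accounts for the tag up front, so the later copy cannot exceed the capacity.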

Example crash under gdb:

$ gdb --args ./ccextractor -1 -cc2 -out=srt -utf8 test.vob -o test.srt
(gdb) run                     
Starting program: /home/jst/tools/src/ccextractor/linux/ccextractor -1 -cc2 -out=srt -utf8 test.vob -o test.srt  
[Thread debugging using libthread_db enabled]  
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".  
CCExtractor 0.94, Carlos Fernandez Sanz, Volker Quetschke.  
Teletext portions taken from Petr Kutalek's telxcc                                                               
--------------------------------------------------------------------------  
Input: test.vob                                                             
[Extract: 1] [Stream mode: Autodetect]                      
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]  
[Timing mode: Auto] [Debug: No] [Buffer input: No]                          
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]  
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]  
[Add font color data: Yes] [Add font typesetting: Yes]         
[Convert case: No][Filter profanity: No] [Video-edit join: No]  
[Extraction start time: not set (from start)]                        
[Extraction end time: not set (to end)]                              
[Live stream: No] [Clock frequency: 90000]              
[Teletext page: Autodetect]                                     
[Start credits text: None]                       
[Quantisation-mode: CCExtractor's internal function]  
                                                 
-----------------------------------------------------------------  
Opening file: test.vob                           
File seems to be a program stream, enabling PS mode   
Analyzing data in general mode                   
                                                                   
                                                 
New video information found                          
[720 * 480] [AR: 02 - 4:3] [FR: 04 - 29.97] [progressive: no]  
   
  0%  |  00:00                                   
...                          
Skip forward to the next Sequence or GOP start.  
 95%  |  19:38  
Skip forward to the next Sequence or GOP start.  
  
Skip forward to the next Sequence or GOP start.  
  
Thread 1 "ccextractor" hit Breakpoint 1, fatal (exit_code=1000, fmt=0x555555ee8da0 "In ocr_bitmap: Running out of memory. It shouldn't happen. Please report.\n") at ../src/lib_ccx/utility.c:272  
272             va_start(args, fmt);  
(gdb) up  
#1  0x00005555557976ed in ocr_bitmap (arg=0x602000008250, palette=0x602000b1c390, alpha=0x602000b1c3b0 "", indata=0x62a000726200 "", w=556, h=42, copy=0x60400003c210) at ../src/lib_ccx/ocr.c:638  
638                                                             fatal(CCX_COMMON_EXIT_BUG_BUG, "In ocr_bitmap: Running out of memory. It shouldn't happen. Please report.\n", errno);  
(gdb) list  
633                                             {  
634                                                     if ((new_text_out_iter - new_text_out) +  
635                                                             (last_font_tag_end - last_font_tag) >  
636                                                         length)  
637                                                     {  
638                                                             fatal(CCX_COMMON_EXIT_BUG_BUG, "In ocr_bitmap: Running out of memory. It shouldn't happen. Please report.\n", errno);  
639                                                     }  
640                                                     memcpy(new_text_out_iter, last_font_tag, last_font_tag_end - last_font_tag);  
641                                                     new_text_out_iter += last_font_tag_end - last_font_tag;  
642                                             }  
(gdb) p new_text_out_iter - new_text_out  
$1 = 96  
(gdb) p last_font_tag_end - last_font_tag  
$2 = 76  
(gdb) p length  
$3 = 158  
(gdb) p new_text_out_iter - new_text_out + last_font_tag_end - last_font_tag  
$4 = 172
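
The values printed in the gdb session can be checked by hand: 96 bytes already written plus a 76-byte font tag is 172, which exceeds the 158-byte `length`, so the buffer is 14 bytes short. A small C sketch of the same comparison follows; the function names `would_overflow` and `shortfall` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same comparison as the one shown at ocr.c:634-636 in the listing:
 * nonzero when the bytes already written plus the tag exceed the capacity. */
static int would_overflow(size_t used, size_t tag_len, size_t length)
{
	return used + tag_len > length;
}

/* Extra bytes the buffer would have needed; 0 when the copy fits. */
static size_t shortfall(size_t used, size_t tag_len, size_t length)
{
	assert(tag_len <= SIZE_MAX - used); /* guard against size_t wraparound */
	size_t need = used + tag_len;
	return need > length ? need - length : 0;
}
```

With the values from the session, `would_overflow(96, 76, 158)` is true and `shortfall(96, 76, 158)` returns 14.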

Before reaching this point, I also had to fix an ASAN error caused by process_spu calling memcpy on overlapping buffers. I can't say I understand why the buffers overlap, but switching to memmove at least fixes the error.

==611746==ERROR: AddressSanitizer: memcpy-param-overlap: memory ranges [0x7fffdf1eae84,0x7fffdf1eb528) and [0x7fffdf1ea800, 0x7fffdf1eaea4) overlap
    #0 0x7ffff786db25 in __interceptor_memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:899
    #1 0x5555556c2302 in process_spu ../src/lib_ccx/dvd_subtitle_decoder.c:387
    #2 0x5555556fe994 in process_data ../src/lib_ccx/general_loop.c:662
    #3 0x555555701650 in process_non_multiprogram_general_loop ../src/lib_ccx/general_loop.c:968
    #4 0x555555702248 in general_loop ../src/lib_ccx/general_loop.c:1062
    #5 0x5555556738ee in api_start ../src/ccextractor.c:204
    #6 0x555555675c39 in main ../src/ccextractor.c:465
    #7 0x7ffff64456c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #8 0x7ffff6445784 in __libc_start_main_impl ../csu/libc-start.c:360
    #9 0x555555672c50 in _start (/home/jst/tools/src/ccextractor/linux/ccextractor+0x11ec50) (BuildId: 466667d3e95ff9aa8e7b1165aeac946dcfc18371)
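
For reference, the two ranges in the ASAN report are each 0x6a4 (1700) bytes long and share 0x20 (32) bytes. `memcpy` has undefined behavior when source and destination overlap, while `memmove` is specified to handle that case. A minimal C sketch of an overlap-safe copy within a single buffer is below; `shift_within` is a hypothetical helper, not the code in dvd_subtitle_decoder.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies `len` bytes from offset `from` to offset `to`
 * inside the same buffer. The ranges may overlap, so memmove is used;
 * memcpy would be undefined behavior here. */
static void shift_within(unsigned char *buf, size_t buf_size,
			 size_t to, size_t from, size_t len)
{
	/* Both ranges must lie entirely inside the buffer. */
	assert(to <= buf_size && len <= buf_size - to);
	assert(from <= buf_size && len <= buf_size - from);

	memmove(buf + to, buf + from, len);
}
```

The sketch works in both directions (moving data toward the start or toward the end of the buffer), which is the property `memcpy` does not guarantee.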

jstrot • Jan 07 '24 02:01

CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit 79aaf86...:

Report Name    Tests Passed
Broken         0/13
CEA-708        0/14
DVB            0/7
DVD            0/3
DVR-MS         0/2
General        0/27
Hauppage       0/3
MP4            0/3
NoCC           0/10
Options        0/86
Teletext       0/21
WTV            0/13
XDS            0/34

All tests passing on the master branch were passed completely.

NOTE: The following tests have been failing on the master branch as well as the PR:


Check the result page for more info.

ccextractor-bot • Jan 07 '24 02:01

CCExtractor CI platform finished running the test files on windows. Below is a summary of the test results, when compared to test for commit 280939d...:

Report Name    Tests Passed
Broken         0/13
CEA-708        0/14
DVB            0/7
DVD            0/3
DVR-MS         0/2
General        0/27
Hauppage       0/3
MP4            0/3
NoCC           0/10
Options        0/86
Teletext       0/21
WTV            0/13
XDS            0/34

All tests passing on the master branch were passed completely.

NOTE: The following tests have been failing on the master branch as well as the PR:


Check the result page for more info.

ccextractor-bot • Jan 07 '24 03:01