THULAC

Buffer overflow occurred during training process

Open fCorleone opened this issue 6 years ago • 0 comments

When I run train_c with the following command line:

./train_c train_file outfile

AddressSanitizer reports a heap memory error (a heap-use-after-free, according to the log):

=================================================================
==11181==ERROR: AddressSanitizer: heap-use-after-free on address 0x7f28a77a201c at pc 0x000000415d9d bp 0x7ffc313dd090 sp 0x7ffc313dd080
READ of size 4 at 0x7f28a77a201c thread T0
    #0 0x415d9c in thulac::NGramFeature::find_bases(int, int, int, int&, int&) include/cb_ngram_feature.h:248
    #1 0x415d9c in thulac::NGramFeature::put_values(int*, int) include/cb_ngram_feature.h:118
    #2 0x415d9c in thulac::TaggingDecoder::put_values() include/cb_tagging_decoder.h:387
    #3 0x42c888 in thulac::TaggingLearner::train(char const*, char const*, char const*, char const*) include/cb_tagging_learner.h:305
    #4 0x404239 in main src/train_c.cc:62
    #5 0x7f28aa28d82f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
    #6 0x404c98 in _start (/home/mfc_fuzz/newprogram/THULAC/train_c+0x404c98)

0x7f28a77a201c is located 202780 bytes inside of 524288-byte region [0x7f28a7770800,0x7f28a77f0800)
freed by thread T0 here:
    #0 0x7f28aac67961 in realloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98961)
    #1 0x426a03 in thulac::DATMaker::shrink() include/dat.h:221
    #2 0x426a03 in thulac::TaggingLearner::train(char const*, char const*, char const*, char const*) include/cb_tagging_learner.h:205
    #3 0xfff8627bb59  (<unknown module>)

previously allocated by thread T0 here:
    #0 0x7f28aac67961 in realloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98961)
    #1 0x43261e in thulac::DATMaker::extends() include/dat.h:207
    #2 0x43261e in thulac::DATMaker::alloc(std::vector<int, std::allocator<int> >&) include/dat.h:235
    #3 0x43261e in thulac::DATMaker::assign(int, std::vector<int, std::allocator<int> >&, int) include/dat.h:270

SUMMARY: AddressSanitizer: heap-use-after-free include/cb_ngram_feature.h:248 thulac::NGramFeature::find_bases(int, int, int, int&, int&)
Shadow bytes around the buggy address:
  0x0fe594eec3b0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0fe594eec3c0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0fe594eec3d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0fe594eec3e0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0fe594eec3f0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x0fe594eec400: fd fd fd[fd]fd fd fd fd fd fd fd fd fd fd fd fd
  0x0fe594eec410: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0fe594eec420: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0fe594eec430: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0fe594eec440: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0fe594eec450: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
==11181==ABORTING

The input file I passed to train_c contains only the following line, as suggested in your documentation:

我/r 爱/vm 北京/ns 天安门/ns
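
The trace is consistent with a pointer captured before a `realloc` being read after the buffer was resized: the region is freed by `realloc` in `DATMaker::shrink()` and later read in `NGramFeature::find_bases`. The following is a minimal sketch of that general pattern and one way to avoid it, not THULAC's actual code. `Table`, `Reader`, and `read_after_shrink` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for a growable table backed by realloc.
struct Table {
    int* data = nullptr;
    std::size_t size = 0;

    // Resize to n elements (n > 0). realloc may move the buffer, so any
    // raw pointer taken before this call must be treated as invalid.
    bool resize(std::size_t n) {
        assert(n > 0);
        void* p = std::realloc(data, n * sizeof(int));
        if (p == nullptr) return false;  // old buffer is still valid here
        data = static_cast<int*>(p);
        size = n;
        return true;
    }

    ~Table() { std::free(data); }
};

// Hypothetical reader that caches the raw pointer and the element count.
struct Reader {
    const int* base = nullptr;
    std::size_t size = 0;

    // Refresh the cached pointer and size from the table.
    void attach(const Table& t) {
        base = t.data;
        size = t.size;
    }

    // Bounds-checked read through the cached pointer.
    bool read(std::size_t i, int& out) const {
        if (base == nullptr || i >= size) return false;
        out = base[i];
        return true;
    }
};

// Fill a table, shrink it, then read index i through a re-attached reader.
bool read_after_shrink(std::size_t i, int& out) {
    Table t;
    if (!t.resize(1024)) return false;
    for (std::size_t k = 0; k < t.size; ++k) t.data[k] = static_cast<int>(k);

    Reader r;
    r.attach(t);  // r.base points into the 1024-element buffer

    if (!t.resize(16)) return false;  // shrink: r.base may now dangle

    r.attach(t);  // re-attach after every resize instead of reusing r.base
    return r.read(i, out);
}
```

Calling `read_after_shrink(3, v)` reads through the refreshed pointer, while an index beyond the shrunken size is rejected rather than read. Omitting the second `attach` call would reproduce the kind of stale read that AddressSanitizer flags.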

fCorleone, Jul 18 '18 02:07