THULAC
Heap use-after-free during the training process
When I run the program train_c with the command line:
./train_c train_file outfile
AddressSanitizer reports a heap-use-after-free:
=================================================================
==11181==ERROR: AddressSanitizer: heap-use-after-free on address 0x7f28a77a201c at pc 0x000000415d9d bp 0x7ffc313dd090 sp 0x7ffc313dd080
READ of size 4 at 0x7f28a77a201c thread T0
#0 0x415d9c in thulac::NGramFeature::find_bases(int, int, int, int&, int&) include/cb_ngram_feature.h:248
#1 0x415d9c in thulac::NGramFeature::put_values(int*, int) include/cb_ngram_feature.h:118
#2 0x415d9c in thulac::TaggingDecoder::put_values() include/cb_tagging_decoder.h:387
#3 0x42c888 in thulac::TaggingLearner::train(char const*, char const*, char const*, char const*) include/cb_tagging_learner.h:305
#4 0x404239 in main src/train_c.cc:62
#5 0x7f28aa28d82f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
#6 0x404c98 in _start (/home/mfc_fuzz/newprogram/THULAC/train_c+0x404c98)
0x7f28a77a201c is located 202780 bytes inside of 524288-byte region [0x7f28a7770800,0x7f28a77f0800)
freed by thread T0 here:
#0 0x7f28aac67961 in realloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98961)
#1 0x426a03 in thulac::DATMaker::shrink() include/dat.h:221
#2 0x426a03 in thulac::TaggingLearner::train(char const*, char const*, char const*, char const*) include/cb_tagging_learner.h:205
#3 0xfff8627bb59 (<unknown module>)
previously allocated by thread T0 here:
#0 0x7f28aac67961 in realloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98961)
#1 0x43261e in thulac::DATMaker::extends() include/dat.h:207
#2 0x43261e in thulac::DATMaker::alloc(std::vector<int, std::allocator<int> >&) include/dat.h:235
#3 0x43261e in thulac::DATMaker::assign(int, std::vector<int, std::allocator<int> >&, int) include/dat.h:270
SUMMARY: AddressSanitizer: heap-use-after-free include/cb_ngram_feature.h:248 thulac::NGramFeature::find_bases(int, int, int, int&, int&)
Shadow bytes around the buggy address:
0x0fe594eec3b0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0fe594eec3c0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0fe594eec3d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0fe594eec3e0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0fe594eec3f0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x0fe594eec400: fd fd fd[fd]fd fd fd fd fd fd fd fd fd fd fd fd
0x0fe594eec410: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0fe594eec420: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0fe594eec430: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0fe594eec440: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0fe594eec450: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Heap right redzone: fb
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack partial redzone: f4
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
==11181==ABORTING
The input file I passed to train_c contains only the following line, as suggested in your documentation:
我/r 爱/vm 北京/ns 天安门/ns
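My reading of the two stack traces (an assumption, not verified against the source): DATMaker::shrink() at include/dat.h:221 reallocates the double-array buffer during TaggingLearner::train (cb_tagging_learner.h:205), and realloc can free the old block. Something used later by NGramFeature::find_bases (cb_ngram_feature.h:248) seems to still hold a pointer into that old block. The sketch below reproduces only that pattern. Entry, ToyDat, ToyFeature and read_after_shrink are hypothetical stand-ins I made up, not THULAC's real classes or layout.

```cpp
#include <cstdlib>

// Hypothetical stand-ins for THULAC's DATMaker / NGramFeature; NOT the real classes.
struct Entry { int base; int check; };  // 8 bytes per entry on common platforms

// Owns a realloc'd array, loosely modelled on DATMaker::extends()/shrink().
struct ToyDat {
    Entry* dat = nullptr;
    int size = 0;

    ToyDat() = default;
    ToyDat(const ToyDat&) = delete;             // single owner of the buffer
    ToyDat& operator=(const ToyDat&) = delete;
    ~ToyDat() { std::free(dat); }

    // Grow the buffer and initialise the new tail.
    void extends(int new_size) {
        void* p = std::realloc(dat, sizeof(Entry) * new_size);
        if (p == nullptr) std::abort();
        dat = static_cast<Entry*>(p);
        for (int i = size; i < new_size; ++i) dat[i] = Entry{0, -1};
        size = new_size;
    }

    // Trim to the used length; realloc may move the block and free the old one.
    void shrink(int used) {
        void* p = std::realloc(dat, sizeof(Entry) * used);
        if (p == nullptr) std::abort();
        dat = static_cast<Entry*>(p);
        size = used;
    }
};

// Non-owning view that caches the raw pointer, as a feature table might.
struct ToyFeature {
    const Entry* dat;
    int size;
    explicit ToyFeature(const ToyDat& d) : dat(d.dat), size(d.size) {}
    int find_base(int i) const { return dat[i].base; }
};

// Safe ordering: take the view only after the last realloc.
int read_after_shrink() {
    ToyDat d;
    d.extends(1 << 16);   // 65536 entries, 524288 bytes if sizeof(Entry) == 8
    d.dat[5].base = 42;
    // Unsafe ordering (use-after-free whenever realloc returns a new address):
    //   ToyFeature f(d); d.shrink(100); f.find_base(5);
    d.shrink(100);
    ToyFeature f(d);
    return f.find_base(5);  // realloc keeps the first 100 entries intact
}
```

read_after_shrink() returns 42. If the view were built before shrink() instead, every read through it after realloc moved the block would be the kind of access ASan flags above. If that is what happens in train_c, a possible fix would be to (re)bind the feature tables to the DAT only after shrink() has run.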