ctcdecode

Segmentation fault when using a KenLM language model

Open • rajeevbaalwan opened this issue 5 years ago • 1 comment

I get the error below when running test.py with a 2.5 GB .arpa LM file:

Loading the LM will be faster if you build a binary file.
Reading /home/rajeev/Documents/agent-lm/agent_lm_44_updated.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
0%| | 0/2499 [00:00<?, ?it/s]
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

When I run the script with a 2.5 GB KenLM .binary file instead, I get only this:

0%| | 0/2499 [00:00<?, ?it/s]
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

With a 284 MB .binary file the script does run, but decoding a single batch takes about 12 seconds, which is far too slow:

0%| | 1/2499 [00:12<8:35:51, 12.39s/it]

When I run the script without a language model, it runs perfectly.

Any help on why the segmentation fault occurs and why decoding takes so long?

rajeevbaalwan • Jan 10 '20 12:01

I also ran into this segmentation fault.

2000ZRL • Dec 15 '20 11:12
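
This is not a fix, but here is a minimal sketch for narrowing down where the crash happens, assuming test.py is run with CPython 3. The standard-library faulthandler module dumps the Python traceback when the process receives SIGSEGV, which at least shows which Python call was active when the native code crashed. The helper name enable_crash_traceback and the log file name are hypothetical, chosen only for illustration.

```python
import faulthandler


def enable_crash_traceback(log_path):
    """Write a Python traceback to log_path if the process hits SIGSEGV.

    Hypothetical helper, not part of ctcdecode or KenLM. The returned file
    object must stay open for as long as the handler is enabled.
    """
    log_file = open(log_path, "w")
    # all_threads=True dumps the stack of every Python thread, not only
    # the thread that received the signal.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


if __name__ == "__main__":
    # Call this at the very top of the script, before importing the decoder,
    # so that crashes during language model loading are also captured.
    crash_log = enable_crash_traceback("segfault_traceback.log")
```

The same handler can be enabled without editing the script by running `python -X faulthandler test.py`, which prints the traceback to stderr. Either way this only reports where the crash occurred, not why.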