
beam_search call results in "Killed"

Open edwardpwtsoi opened this issue 3 years ago • 2 comments

Background

A single sample causes the process to be "Killed" when I use the library inside Docker. The base image is python:3.8. I couldn't find the cause. Would you mind having a look? Thank you.

The data: this is the posterior array, saved as JSON after calling the tolist method.

A Docker Container with the posterior.json and dependencies installed

docker pull powatsoi/fast-ctc-decode-issue:latest

Steps to reproduce

docker run -it --rm powatsoi/fast-ctc-decode-issue:latest

root@9ce0d0f66ba2:/# ipython
Python 3.8.8 (default, Mar 27 2021, 18:26:41) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.25.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import json

In [2]: import numpy as np

In [3]: from fast_ctc_decode import beam_search

In [4]: beam_search(np.asarray(json.load(open("posterior.json")), dtype=np.float32), "a" * 7997)
Killed

pip list output:

root@9ce0d0f66ba2:/# pip list
Package           Version
----------------- -------
backcall          0.2.0
decorator         5.0.9
fast-ctc-decode   0.3.0
ipython           7.25.0
ipython-genutils  0.2.0
jedi              0.18.0
matplotlib-inline 0.1.2
numpy             1.21.1
parso             0.8.2
pexpect           4.8.0
pickleshare       0.7.5
pip               21.0.1
prompt-toolkit    3.0.19
ptyprocess        0.7.0
Pygments          2.9.0
setuptools        54.2.0
traitlets         5.0.5
wcwidth           0.2.5
wheel             0.36.2

edwardpwtsoi avatar Jul 23 '21 07:07 edwardpwtsoi

@edwardpwtsoi you are running out of memory and the OOM killer is kicking in. I can successfully decode your posterior.json on a machine with enough system memory.

Try increasing beam_cut_threshold:

$ # beam_cut_threshold=0
$ /usr/bin/time -v python test.py
        Command being timed: "python test.py"
        User time (seconds): 8.06
        System time (seconds): 19.55
        Percent of CPU this job got: 123%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:22.43
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 42,787,016
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 10706916
        Voluntary context switches: 148
        Involuntary context switches: 2336
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
$
$ # beam_cut_threshold=0.000125
$ /usr/bin/time -v python test.py 
        Command being timed: "python test.py"
        User time (seconds): 3.36
        System time (seconds): 4.11
        Percent of CPU this job got: 388%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.92
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 2,734,556
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 693781
        Voluntary context switches: 137
        Involuntary context switches: 679
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
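
The same peak-memory figure can also be read from inside Python, which helps when /usr/bin/time is not available in the container. The following is a minimal sketch using only the standard library; the helper names (peak_rss_kb, run_and_report, kb_to_gib) are made up for illustration and are not part of fast-ctc-decode. It assumes Linux, where ru_maxrss is reported in kilobytes.

```python
import resource


def peak_rss_kb():
    """Peak resident set size of this process so far.

    On Linux the value is in kilobytes (macOS reports bytes instead).
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def run_and_report(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, peak RSS after the call)."""
    result = fn(*args, **kwargs)
    return result, peak_rss_kb()


def kb_to_gib(kb):
    """Convert kilobytes (as printed by /usr/bin/time -v) to GiB."""
    return kb / (1024 ** 2)


# Hypothetical usage with the call from this issue (not executed here):
#   seq_path, peak = run_and_report(
#       beam_search, posteriors, "a" * 7997, beam_cut_threshold=0.000125)
#   print(f"peak RSS: {kb_to_gib(peak):.1f} GiB")
```

Applied to the two runs above, kb_to_gib gives roughly 40.8 GiB for beam_cut_threshold=0 and roughly 2.6 GiB for beam_cut_threshold=0.000125, so the default-free run needs about 15 times more memory.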

iiSeymour avatar Jul 23 '21 10:07 iiSeymour

Thank you for your reply, @iiSeymour. I can also decode it when I am not using Docker. I set --oom-kill-disable when running the container, but it still doesn't work. Were the results above produced inside Docker?
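
One way to narrow this down is to check how much memory the container can actually see. The sketch below is only an illustration, using the standard library: it reads the usual cgroup limit files (memory.max for cgroup v2, memory.limit_in_bytes for cgroup v1) and MemTotal from /proc/meminfo. The function names are hypothetical, and whether these files are present depends on the host and the Docker setup.

```python
from pathlib import Path

# Standard locations of the cgroup memory limit inside a Linux container.
CGROUP_LIMIT_FILES = (
    "/sys/fs/cgroup/memory.max",                    # cgroup v2
    "/sys/fs/cgroup/memory/memory.limit_in_bytes",  # cgroup v1
)


def read_memory_limit(paths=CGROUP_LIMIT_FILES):
    """Return the first cgroup memory limit found, in bytes.

    Returns None when no file is readable or cgroup v2 reports "max"
    (no limit). cgroup v1 reports "no limit" as a very large integer.
    """
    for p in paths:
        try:
            text = Path(p).read_text().strip()
        except OSError:
            continue
        if text == "max":
            return None
        try:
            return int(text)
        except ValueError:
            continue
    return None


def read_mem_total_kb(path="/proc/meminfo"):
    """Return MemTotal from a meminfo-style file in kB, or None if absent."""
    try:
        lines = Path(path).read_text().splitlines()
    except OSError:
        return None
    for line in lines:
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    return None
```

If either value is well below the roughly 40 GiB peak measured above for beam_cut_threshold=0, the container cannot hold the decode in memory regardless of the OOM-killer setting.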

edwardpwtsoi avatar Jul 26 '21 01:07 edwardpwtsoi