Large array crashes ghdl

Open wsneijers opened this issue 5 years ago • 13 comments

Hello,

I've been running into an issue with GHDL when using large arrays in VHDL (declared as a type). I have already seen the previous issues on the same subject: #342, #471 and #611. In those issues the problem is marked as solved. However, I still have the same sort of issue, even with the latest master sources.

In my case I'm using a test setup combining cocotb and GHDL, where cocotb creates the test stimuli and monitors. I don't know if this makes a difference, but for completeness I thought I'd mention it. I'm trying to simulate the following file:

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;
use IEEE.std_logic_unsigned.all;

entity cosim_test is
end cosim_test;

architecture rtl of cosim_test is

type ram_type is array(0 to (2**28)-1) of std_logic_vector(127 downto 0);
signal ram : ram_type := (others => (others => '0'));

begin

end rtl;

For me, this produces the following output in the testbench:

loading VPI module '/home/docker/cocotb/build/libs/x86_64/libvpi.so'
     -.--ns INFO     cocotb.gpi                                  gpi_embed.c:114  in embed_init_python               Did not detect virtual environment. Using system-wide Python interpreter.
     -.--ns INFO     cocotb.gpi                                GpiCommon.cpp:91   in gpi_print_registered_impl       VPI registered
VPI module loaded!
/home/docker/cocotb/makefiles/simulators/Makefile.ghdl:63: recipe for target 'results.xml' failed
make[4]: *** [results.xml] Error 255
/home/docker/cocotb/makefiles/Makefile.sim:71: recipe for target 'sim' failed
make[3]: *** [sim] Error 2
CMakeFiles/Simulation.dir/build.make:68: recipe for target 'run_simulation' failed
make[2]: *** [run_simulation] Error 2
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/Simulation.dir/all' failed
make[1]: *** [CMakeFiles/Simulation.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

When I decrease the size of the array I get to a point where it starts working again:

type ram_type is array(0 to (2**15)-1) of std_logic_vector(127 downto 0);
signal ram : ram_type := (others => (others => '0'));

However, the simulation is very, very slow! I already tried debugging with GDB, breaking at _exit and __ghdl_fatal. The breakpoint at __ghdl_fatal does not work. The breakpoint at _exit does, but the output is not very useful:

Breakpoint 1, __GI__exit (status=status@entry=-1) at ../sysdeps/unix/sysv/linux/_exit.c:27
27	../sysdeps/unix/sysv/linux/_exit.c: No such file or directory.
(gdb) backtrace
#0  __GI__exit (status=status@entry=-1) at ../sysdeps/unix/sysv/linux/_exit.c:27
#1  0x00007f6dbfcbdfab in __run_exit_handlers (status=-1, listp=<optimized out>, run_list_atexit=run_list_atexit@entry=true) at exit.c:97
#2  0x00007f6dbfcbe045 in __GI_exit (status=<optimized out>) at exit.c:104
#3  0x00007f6dbfca4837 in __libc_start_main (main=0x406b72 <main>, argc=11, argv=0x7fffcfd60c28, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7fffcfd60c18) at ../csu/libc-start.c:325
#4  0x0000000000406029 in _start ()

At this point I'm kinda lost at what to do.

wsneijers avatar May 02 '19 06:05 wsneijers

This might be related to #752.

@wsneijers, did you try increasing the size of the stack as suggested in https://github.com/ghdl/ghdl/issues/611#issuecomment-445444610?

umarcor avatar May 02 '19 14:05 umarcor

@umarcor Yes, I did try increasing the stack size, but it did not work. I don't know if it is related to #752; it could definitely be. That issue uses VUnit, however, not cocotb.

wsneijers avatar May 02 '19 15:05 wsneijers

2**28 * 128 is a huge number of signals. Can you use a variable instead?
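
A minimal sketch of what this suggestion could look like, assuming the memory only needs to be visible inside a single process. The entity name `ram_variable_sketch`, the constant `DEPTH` and the small depth value are hypothetical and chosen so the sketch elaborates quickly; they are not taken from the reporter's design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity ram_variable_sketch is
end entity ram_variable_sketch;

architecture sim of ram_variable_sketch is
    -- Hypothetical depth, kept small on purpose; the issue uses 2**28.
    constant DEPTH : natural := 2**10;

    type ram_type is array (0 to DEPTH - 1) of std_logic_vector(127 downto 0);

    constant ALL_ONES : std_logic_vector(127 downto 0) := (others => '1');
begin
    mem_proc : process
        -- A variable only stores its current value; it has no drivers
        -- and does not take part in signal event scheduling.
        variable ram : ram_type := (others => (others => '0'));
    begin
        ram(3) := ALL_ONES;  -- variable assignment takes effect immediately

        assert ram(3) = ALL_ONES
            report "write to ram(3) was not stored" severity failure;
        assert ram(4)(0) = '0'
            report "ram(4) lost its initial value" severity failure;

        wait;  -- suspend forever so the simulation can end
    end process mem_proc;
end architecture sim;
```

The trade-off is that a process variable is private to its process, so all reads and writes of the memory have to be routed through that process (or through a shared variable, which is not shown here).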

tgingold avatar May 02 '19 16:05 tgingold

@tgingold Good suggestion. I never thought of it, thanks! It does solve one problem, the speed: it is now as fast as normal. However, it still breaks with an array size of 2**28, though the error is a little more descriptive:

Starting program: /usr/local/bin/ghdl -r --std=08 --ieee=synopsys -O3 -Wno-binding -frelaxed-rules tb_shaping_dma_controller --vpi=/home/docker/cocotb/build/libs/x86_64/libvpi.so --wave=../wave.ghw --ieee-asserts=disable
warning: Error disabling address space randomization: Operation not permitted
loading VPI module '/home/docker/cocotb/build/libs/x86_64/libvpi.so'
     -.--ns INFO     cocotb.gpi                                  gpi_embed.c:114  in embed_init_python               Did not detect virtual environment. Using system-wide Python interpreter.
     -.--ns INFO     cocotb.gpi                                GpiCommon.cpp:91   in gpi_print_registered_impl       VPI registered
VPI module loaded!
./tb_shaping_dma_controller:error: NULL access dereferenced
./tb_shaping_dma_controller:error: error during elaboration

Breakpoint 1, __GI__exit (status=status@entry=1) at ../sysdeps/unix/sysv/linux/_exit.c:27
27	../sysdeps/unix/sysv/linux/_exit.c: No such file or directory.
(gdb) bt
#0  __GI__exit (status=status@entry=1) at ../sysdeps/unix/sysv/linux/_exit.c:27
#1  0x00007f6bca28efab in __run_exit_handlers (status=1, listp=<optimized out>, run_list_atexit=run_list_atexit@entry=true) at exit.c:97
#2  0x00007f6bca28f045 in __GI_exit (status=<optimized out>) at exit.c:104
#3  0x00007f6bca275837 in __libc_start_main (main=0x406b72 <main>, argc=11, argv=0x7ffca88eeac8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffca88eeab8)
    at ../csu/libc-start.c:325
#4  0x0000000000406029 in _start ()
(gdb) 

It now mentions a NULL access dereference, but the __ghdl_fatal breakpoint still does not work.

wsneijers avatar May 03 '19 06:05 wsneijers

If you think about it: 2**28 * 128 equals 32 Gbit at one bit per element. I'm not sure about the internal representation of std_logic, but even with an efficient encoding it needs at least 4 bits per element. If it's implemented as a char (8 bits), you'll need 32 GB of memory just for the data storage, not including any overhead.
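
Written out as a rough estimate, with N the number of std_logic elements in the array. The 4-bit and 8-bit encodings are the assumptions from the paragraph above, not confirmed GHDL internals; the last line applies the figure of at least 144 bytes per signal that is quoted later in this thread.

$$
\begin{aligned}
N &= 2^{28} \times 128 = 2^{35} \approx 3.4 \times 10^{10} \\
M_{4\,\text{bit}} &= 2^{35} \times 0.5\ \text{B} = 16\ \text{GiB} \\
M_{8\,\text{bit}} &= 2^{35} \times 1\ \text{B} = 32\ \text{GiB} \\
M_{\text{signal}} &\ge 2^{35} \times 144\ \text{B} = 4.5\ \text{TiB}
\end{aligned}
$$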

go2sh avatar May 03 '19 11:05 go2sh

I am having the same problem with a 2D array defined as:

subtype CCD_Width_Range is natural range 0 to 2752 - 1;
subtype CCD_Height_Range is natural range 0 to 2002 - 1;
subtype CCD_Pixel_Data_T is std_logic_vector(12 - 1 downto 0);

type CCD_Matrix_T is array (CCD_Height_Range, CCD_Width_Range) of CCD_Pixel_Data_T;

which should amount to about 65 MB. The problem is, as was said, that the array is being allocated on the stack. Would it be possible in the future to detect big allocations and allocate them on the heap instead? I should have time in the next few months to at least write code to detect and report this condition instead of getting a segmentation fault, if that would be desirable.

However, what's puzzling is that when I tried raising the stack size limit, it doesn't crash immediately but tries to allocate more than 16 GB of memory (the memory requirement does not seem to scale linearly?).

@tgingold Are variables allocated on the heap, or are they just faster overall?
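
Until something like that exists, one way to request dynamic allocation explicitly from VHDL is an access type with `new`. This is only a sketch: the names `heap_matrix_sketch`, `CCD_Matrix_Ptr_T` and `frame` are hypothetical, and whether this actually avoids the stack problem in GHDL is an assumption that has not been verified here.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity heap_matrix_sketch is
end entity heap_matrix_sketch;

architecture sim of heap_matrix_sketch is
    subtype CCD_Width_Range  is natural range 0 to 2752 - 1;
    subtype CCD_Height_Range is natural range 0 to 2002 - 1;
    subtype CCD_Pixel_Data_T is std_logic_vector(12 - 1 downto 0);

    type CCD_Matrix_T is array (CCD_Height_Range, CCD_Width_Range) of CCD_Pixel_Data_T;

    -- Hypothetical access (pointer) type for the matrix.
    type CCD_Matrix_Ptr_T is access CCD_Matrix_T;
begin
    main : process
        variable frame : CCD_Matrix_Ptr_T;
    begin
        -- Allocate the matrix dynamically; elements start as all 'U'.
        frame := new CCD_Matrix_T;

        frame(0, 0) := X"ABC";
        assert frame(0, 0) = X"ABC"
            report "pixel (0, 0) was not stored" severity failure;

        -- Release the storage once it is no longer needed.
        deallocate(frame);
        wait;
    end process main;
end architecture sim;
```

Access types can only be used for variables, so this does not help when the matrix has to be a signal or a port.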

lavovaLampa avatar May 13 '19 15:05 lavovaLampa

Can you post a test case? The initial issue was with signals, which are never allocated on the stack.

tgingold avatar May 14 '19 04:05 tgingold

Yes, this one is even more interesting. It runs with an 8 MB stack limit, but it still tries to allocate all of my RAM :). I'm running the LLVM flavour, but the same thing should happen under mcode. I know the code has problems; the point is that the memory usage is high even when instantiating the entity with all array elements set to all '0'.

test_pkg.vhd

library ieee;
use ieee.std_logic_1164.all;

package test_pkg is
    subtype CCD_Width_Range is natural range 0 to 2752 - 1;
    subtype CCD_Height_Range is natural range 0 to 2002 - 1;
    subtype CCD_Pixel_Data_T is std_logic_vector(12 - 1 downto 0);
    type CCD_Matrix_T is array (CCD_Height_Range, CCD_Width_Range) of CCD_Pixel_Data_T;
end package test_pkg;

array_test.vhd

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.test_pkg.CCD_Matrix_T;

entity array_test is
    port(
        clkIn, rstAsyncIn : in std_logic;
        ccdArray          : in CCD_Matrix_T
    );
end entity array_test;

architecture RTL of array_test is
begin
    ctrlProc : process(clkIn, rstAsyncIn)
        variable currWidth, currHeight : natural := 0;
    begin
        if rstAsyncIn = '1' then
            currWidth  := 0;
            currHeight := 0;
        elsif rising_edge(clkIn) then
            report "Current pixel: " & to_hstring(ccdArray(currHeight, currWidth));
            if currWidth >= ccdArray'high(2) then
                currWidth  := 0;
                currHeight := currHeight + 1;
            else
                currWidth := currWidth + 1;
            end if;
        end if;
    end process ctrlProc;
end architecture RTL;

array_test_tb.vhd

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.test_pkg.all;

entity tb_array_test is
end tb_array_test;

architecture tb of tb_array_test is
    signal clkIn      : std_logic;
    signal rstAsyncIn : std_logic;
    signal ccdArray   : ccd_matrix_t;

    constant TbPeriod : time      := 1000 ns;
    signal TbClock    : std_logic := '0';
    signal TbSimEnded : std_logic := '0';

begin

    dut : entity work.array_test
        port map(
            clkIn      => clkIn,
            rstAsyncIn => rstAsyncIn,
            ccdArray   => (others => (others => X"000"))
        );

    TbClock <= not TbClock after TbPeriod / 2 when TbSimEnded /= '1' else '0';
    clkIn   <= TbClock;

    stimuli : process
    begin
        TbSimEnded <= '1';
        wait;
    end process;

end tb;

lavovaLampa avatar May 15 '19 19:05 lavovaLampa

I can reproduce the memory issue.

tgingold avatar May 16 '19 04:05 tgingold

Note that you are using ~60E6 signals. A signal needs at least 144 bytes, so you need at least 9 GB for the signals. Then you also need to add drivers...

You should use variables instead of signals when possible.

tgingold avatar May 16 '19 05:05 tgingold

Where's the ~60E6 coming from? I see that my model needs: 2752 * 2002 * 144B ~= 760MB * 3 (driver, TB, model) ~= 2300MB

Where did I err in my calculations?

Otherwise thanks, I will rewrite it to use variables :+1: .

lavovaLampa avatar May 18 '19 15:05 lavovaLampa

The CCD_Matrix_T is an array of 2752*2002 vectors of 12 std_logic. You forgot the 12.
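
Spelled out with the figures from this thread (each std_logic element is its own signal, at no less than 144 bytes per signal; drivers are not included):

$$
\begin{aligned}
N &= 2752 \times 2002 \times 12 = 66\,114\,048 \approx 6.6 \times 10^{7} \\
M &\ge N \times 144\ \text{B} = 9\,520\,422\,912\ \text{B} \approx 9.5\ \text{GB}
\end{aligned}
$$

Without the factor of 12 the same product is 5 509 504 × 144 B ≈ 0.79 GB, which is where the earlier estimate of roughly 760 MB came from.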

tgingold avatar May 18 '19 15:05 tgingold

Ah, thanks again. It makes sense now.

lavovaLampa avatar May 18 '19 15:05 lavovaLampa