Large array crashes ghdl
I've been running into an issue with GHDL when using large arrays in VHDL (declared as a type). I already saw previous issues on the same subject: #342, #471 and #611. In those issues the problem is marked as solved. However, I still have the same sort of issue, even with the latest master sources.
In my case I'm using a test setup combining cocotb and GHDL, where cocotb creates the test stimuli and monitors. I don't know if this makes a difference, but for completeness I thought I'd mention it. I'm trying to simulate the following file:
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;
use IEEE.std_logic_unsigned.all;
entity cosim_test is
end cosim_test;
architecture rtl of cosim_test is
    -- 2**28 words of 128 std_logic elements each
    type ram_type is array(0 to (2**28)-1) of std_logic_vector(127 downto 0);
    signal ram : ram_type := (others => (others => '0'));
begin
end rtl;
This produces the following output in the testbench:
loading VPI module '/home/docker/cocotb/build/libs/x86_64/'
-.--ns INFO cocotb.gpi gpi_embed.c:114 in embed_init_python Did not detect virtual environment. Using system-wide Python interpreter.
-.--ns INFO cocotb.gpi GpiCommon.cpp:91 in gpi_print_registered_impl VPI registered
VPI module loaded!
/home/docker/cocotb/makefiles/simulators/Makefile.ghdl:63: recipe for target 'results.xml' failed
make[4]: *** [results.xml] Error 255
/home/docker/cocotb/makefiles/Makefile.sim:71: recipe for target 'sim' failed
make[3]: *** [sim] Error 2
CMakeFiles/Simulation.dir/build.make:68: recipe for target 'run_simulation' failed
make[2]: *** [run_simulation] Error 2
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/Simulation.dir/all' failed
make[1]: *** [CMakeFiles/Simulation.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
When I decrease the size of the array I get to a point where it starts working again:
type ram_type is array(0 to (2**15)-1) of std_logic_vector(127 downto 0);
signal ram : ram_type := (others => (others => '0'));
However, the simulation is very, very slow! I already tried debugging with GDB, breaking at _exit and __ghdl_fatal. The breakpoint at __ghdl_fatal does not work. The breakpoint at _exit does; however, the output is not very useful:
Breakpoint 1, __GI__exit (status=status@entry=-1) at ../sysdeps/unix/sysv/linux/_exit.c:27
27 ../sysdeps/unix/sysv/linux/_exit.c: No such file or directory.
(gdb) backtrace
#0 __GI__exit (status=status@entry=-1) at ../sysdeps/unix/sysv/linux/_exit.c:27
#1 0x00007f6dbfcbdfab in __run_exit_handlers (status=-1, listp=<optimized out>, run_list_atexit=run_list_atexit@entry=true) at exit.c:97
#2 0x00007f6dbfcbe045 in __GI_exit (status=<optimized out>) at exit.c:104
#3 0x00007f6dbfca4837 in __libc_start_main (main=0x406b72 <main>, argc=11, argv=0x7fffcfd60c28, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>,
stack_end=0x7fffcfd60c18) at ../csu/libc-start.c:325
#4 0x0000000000406029 in _start ()
At this point I'm kinda lost at what to do.
This might be related to #752.
@wsneijers, did you try increasing the size of the stack as suggested in
@umarcor Yes, I did try increasing the stack size, but it did not work. I don't know if it is related; it definitely could be. That issue uses VUnit, however, not cocotb.
2**28 * 128 is a huge number of signals. Can you use a variable instead?
@tgingold Good suggestion. I never thought of it, thanks! It does solve one problem, the speed: it is now as fast as normal. However, it still breaks with an array size of 2^28, though the error is a little more descriptive:
Starting program: /usr/local/bin/ghdl -r --std=08 --ieee=synopsys -O3 -Wno-binding -frelaxed-rules tb_shaping_dma_controller --vpi=/home/docker/cocotb/build/libs/x86_64/ --wave=../wave.ghw --ieee-asserts=disable
warning: Error disabling address space randomization: Operation not permitted
loading VPI module '/home/docker/cocotb/build/libs/x86_64/'
-.--ns INFO cocotb.gpi gpi_embed.c:114 in embed_init_python Did not detect virtual environment. Using system-wide Python interpreter.
-.--ns INFO cocotb.gpi GpiCommon.cpp:91 in gpi_print_registered_impl VPI registered
VPI module loaded!
./tb_shaping_dma_controller:error: NULL access dereferenced
./tb_shaping_dma_controller:error: error during elaboration
Breakpoint 1, __GI__exit (status=status@entry=1) at ../sysdeps/unix/sysv/linux/_exit.c:27
27 ../sysdeps/unix/sysv/linux/_exit.c: No such file or directory.
(gdb) bt
#0 __GI__exit (status=status@entry=1) at ../sysdeps/unix/sysv/linux/_exit.c:27
#1 0x00007f6bca28efab in __run_exit_handlers (status=1, listp=<optimized out>, run_list_atexit=run_list_atexit@entry=true) at exit.c:97
#2 0x00007f6bca28f045 in __GI_exit (status=<optimized out>) at exit.c:104
#3 0x00007f6bca275837 in __libc_start_main (main=0x406b72 <main>, argc=11, argv=0x7ffca88eeac8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffca88eeab8)
at ../csu/libc-start.c:325
#4 0x0000000000406029 in _start ()
It does now mention a NULL access dereference, but the __ghdl_fatal breakpoint still does not work.
If you think about it: 2**28 * 128 is 2**35 std_logic elements, which is 32 Gbit even at one bit per element. I'm not sure about the internal representation of std_logic, but since it has nine values, an efficient encoding needs at least 4 bits per element. If it's implemented as a char (8 bits), you'll need 32 GB of memory just for the data storage, not including any overhead.
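The estimate above can be checked with a few lines of Python. This is only a rough sketch: the per-element sizes (1, 4 and 8 bits) are the hypothetical encodings discussed in this comment, not GHDL's confirmed internal representation, and signal bookkeeping overhead is ignored.

```python
# Back-of-the-envelope data size for
#   array(0 to 2**28 - 1) of std_logic_vector(127 downto 0)
# The bits-per-element values are assumptions for illustration only.

WORDS = 2**28   # array depth
WIDTH = 128     # std_logic elements per word
GIB = 2**30     # bytes per GiB


def storage_gib(bits_per_element: int) -> float:
    """Raw data size in GiB for a given encoding of one std_logic element."""
    total_bits = WORDS * WIDTH * bits_per_element
    return total_bits / 8 / GIB


for bits in (1, 4, 8):
    print(f"{bits} bit(s) per element: {storage_gib(bits):g} GiB")
```

Even the most compact of these encodings needs several GiB for the raw data alone, before any per-signal overhead is added.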
I am having the same problem with a 2D array defined as:
subtype CCD_Width_Range is natural range 0 to 2752 - 1;
subtype CCD_Height_Range is natural range 0 to 2002 - 1;
subtype CCD_Pixel_Data_T is std_logic_vector(12 - 1 downto 0);
type CCD_Matrix_T is array (CCD_Height_Range, CCD_Width_Range) of CCD_Pixel_Data_T;
which should amount to about 65 MB. The problem is, as was said, that the array is being allocated on the stack. In the future, would it be possible to detect big allocations and allocate them on the heap instead? I should have time in the next few months to at least write code to detect and report this condition instead of getting a segmentation fault, if that would be desirable.
However, what's puzzling is that when I tried raising the stack size limit, it doesn't crash immediately but tries to allocate more than 16 GB of memory (the memory requirement does not seem to scale linearly?).
@tgingold Are variables allocated on the heap or are they just faster overall?
Can you post a testcase? The initial issue was with signals, which are never allocated on the stack.
Yes, this one is even more interesting. It runs with an 8 MB stack limit, but it still tries to allocate all of my RAM :). I'm running the LLVM flavour, but the same thing should happen under mcode. I know the code has problems; the point is that the memory usage is high even when instantiating the entity with all array elements set to all '0'.
library ieee;
use ieee.std_logic_1164.all;
package test_pkg is
    subtype CCD_Width_Range is natural range 0 to 2752 - 1;
    subtype CCD_Height_Range is natural range 0 to 2002 - 1;
    subtype CCD_Pixel_Data_T is std_logic_vector(12 - 1 downto 0);
    type CCD_Matrix_T is array (CCD_Height_Range, CCD_Width_Range) of CCD_Pixel_Data_T;
end package test_pkg;
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.test_pkg.CCD_Matrix_T;
entity array_test is
    port(
        clkIn, rstAsyncIn : in std_logic;
        ccdArray : in CCD_Matrix_T
    );
end entity array_test;
architecture RTL of array_test is
begin
    -- Walks through the matrix one pixel per clock and reports its value.
    ctrlProc : process(clkIn, rstAsyncIn)
        variable currWidth, currHeight : natural := 0;
    begin
        if rstAsyncIn = '1' then
            currWidth := 0;
            currHeight := 0;
        elsif rising_edge(clkIn) then
            report "Current pixel: " & to_hstring(ccdArray(currHeight, currWidth));
            if currWidth >= ccdArray'high(2) then
                -- end of row: wrap to the start of the next row
                currWidth := 0;
                currHeight := currHeight + 1;
            else
                currWidth := currWidth + 1;
            end if;
        end if;
    end process ctrlProc;
end architecture RTL;
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.test_pkg.all;
entity tb_array_test is
end tb_array_test;
architecture tb of tb_array_test is
    signal clkIn : std_logic;
    signal rstAsyncIn : std_logic;
    signal ccdArray : ccd_matrix_t;
    constant TbPeriod : time := 1000 ns;
    signal TbClock : std_logic := '0';
    signal TbSimEnded : std_logic := '0';
begin
    dut : entity work.array_test
        port map(
            clkIn => clkIn,
            rstAsyncIn => rstAsyncIn,
            ccdArray => (others => (others => X"000"))
        );
    -- Free-running clock, stopped once TbSimEnded is set
    TbClock <= not TbClock after TbPeriod / 2 when TbSimEnded /= '1' else '0';
    clkIn <= TbClock;
    stimuli : process
    begin
        TbSimEnded <= '1';
        wait; -- suspend forever so the process does not loop in zero time
    end process;
end tb;
I can reproduce the memory issue.
Note that you are using ~60E6 signals. A signal needs at least 144 bytes, so you need at least 9 GB for the signals. Then you also need to add drivers...
You should use variables instead of signals when possible.
Where's the ~60E6 coming from? I see that my model needs: 2752 * 2002 * 144B ~= 760MB * 3 (driver, TB, model) ~= 2300MB
Where did I err in my calculations?
Otherwise thanks, I will rewrite it to use variables :+1:
The CCD_Matrix_T is an array of 2752*2002 vectors of 12 std_logic. You forgot the 12.
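The corrected estimate can be reproduced with a short Python script. This is a minimal sketch: the 144 bytes per signal is the lower bound quoted earlier in this thread, each std_logic element is counted as one scalar signal as explained above, and drivers or other overhead are not modeled.

```python
# Signal memory estimate for CCD_Matrix_T (2002 x 2752 pixels, 12 bits each).
# BYTES_PER_SIGNAL is the lower bound quoted in this thread, not a measured value.

WIDTH = 2752
HEIGHT = 2002
BITS_PER_PIXEL = 12
BYTES_PER_SIGNAL = 144


def signal_memory_bytes(scalars_per_pixel: int) -> int:
    """Lower bound on signal memory when each pixel holds this many scalar signals."""
    return WIDTH * HEIGHT * scalars_per_pixel * BYTES_PER_SIGNAL


without_12 = signal_memory_bytes(1)            # the miscalculation: one signal per pixel
with_12 = signal_memory_bytes(BITS_PER_PIXEL)  # one signal per std_logic element

print(f"scalar signals:     {WIDTH * HEIGHT * BITS_PER_PIXEL:,}")
print(f"forgetting the 12:  {without_12 / 1e6:.0f} MB")
print(f"including the 12:   {with_12 / 1e9:.1f} GB")
```

The count comes out at roughly 66 million scalar signals and about 9.5 GB for the signals alone, which is consistent with the "at least 9 GB" figure given above.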
Ah, thanks again. It makes sense now.