HDL-deflate icon indicating copy to clipboard operation
HDL-deflate copied to clipboard

please describe how to get started

Open maartenbrock opened this issue 6 years ago • 17 comments

Hello,

This looks like a very interesting project.

But, since I have never used MYHDL could you please describe how to get started with for example the Vivado project? I managed to install MYHDL in Python, but what then? Am I supposed to run one (or all?) of the python scripts before I can open the project in Vivado? It seems so, but it does a lot more. And after that I still don't have design_1.bd which the project references.

Also, I don't understand Verilog well, but am used using VHDL. Would that be possible?

Then you state HDL-deflate can be configured. But how should that be done? At conversion time or at implementation time?

Kind regards, Maarten

maartenbrock avatar Feb 01 '19 14:02 maartenbrock

Hi Maarten,

The first step is to execute:

python3 deflate.py

This will generate deflate.v which you should import in e.g. Vivado.

You can also just run make which will do the same and also execute the testbench with Icarus. Note that you should have MyHDL installed by executing pip install --user myhdl (which you did).

If you use Windows I advise installing WSL and Ubuntu: https://docs.microsoft.com/en-us/windows/wsl/install-win10

If you want to change parameters just edit deflate.py and rerun python3 deflate.py and Vivado will pick up the changed deflate.v.

You can mix VHDL and Verilog in Vivado projects, that's no issue. MyHDL can also generate VHDL, just change the last line in deflate.py and add hdl='VHDL'.

I have not tested the generated vhd file but added a deflate.v and deflate.vhd to the github repo, so one can get started without using any Python if the default configuration is OK.

design_1.bd is probably the Vivado board description, which should be generated if you select a board in Vivado. You need a specific .xdc file for your setup anyway, if you do not have an Arty-100, so creating a fresh Vivado project is also a good first step in most cases.

tomtor avatar Feb 02 '19 09:02 tomtor

Hello Tom,

Thanks.

I hadn't installed myhdl with --user, but that shouldn't be a problem, I guess. Instead I ran pip install as Administrator. This also shows I'm on Windows, but I have access to native Ubuntu machines and VirtualBox Ubuntu VM's as well. Indeed running make would be easier on Linux.

Thanks for describing where to do the configuration. Once I opened the file I had seen that as well, but it would be nice if this was described in the readme. I would not have guessed how to generate the VHDL though.

Your HDL-Deflate.xpr already has a board selected. But it normally does not auto-generate a design AFAIK. You must have clicked 'Create Block Design' in the project at some point. Are you saying there is nothing of interest in your design_1.bd? If so, then where is deflate.v inserted? I only see test_deflate_bench.v being used. Even test_deflate_bench.v doesn't seem to reference deflate.v. I'm wondering if I should just forget about the HDL-Deflate.xpr Vivado project and the test_deflate_bench.v test.

Now for the deflate module. I see it has some inputs and outputs, but it's not quite clear to me what they mean. The clk and reset are obvious. I assume i_data is where the bytes come in, but when is it valid and how fast can I feed it? And o_byte is where the compressed data comes out, but again when is it valid? There are two progress indicators, but what's the difference or their meaning even? What is the meaning of o_done? And I have no clue what to feed into i_waddr and i_raddr either.

I somehow had expected some streaming input and output, like e.g. an AXI4-Stream.

Would you care to explain some more? Or is there a standard I should have read that uses similar definitions?

Kind regards, Maarten

maartenbrock avatar Feb 02 '19 11:02 maartenbrock

@vanmierlo Yes, I think it's best to just forget about the HDL-Deflate.xpr Vivado project in the repository.

In a fresh Vivado project you can just add the deflate.v and test_deflate_bench.v as source files and an .xdc file if you want to see the LEDs blinking. That should synthesize in Vivado.

The best documentation is really in test_deflate.py. At the end is a section

@block
def test_deflate_bench(i_clk, o_led, led0_g, led1_b, led2_r):

which is the source for test_deflate_bench.v.

Near the top of test_deflate.py is a Python test bench which tests the streaming interfaces.

The deflate module CAN stream and could be wrapped in a (VIvado) AXI-stream interface, but that is left as an exercise. It shouldn't be very hard, but I didn't want to invest time in a Xilinx or Vivado specific solution.

I'll explain the ports:

i_mode can be set WRITE (see the top of deflate.py or end of this comment for the numeric values) and it will load i_data at address i_waddr into the (de)compress input buffer. This buffer can be circular and you can use streaming.

i_mode can be set to STARTD or STARTC for a single cycle to start the (de)compression. After that you should set i_mode to either IDLE (just wait for completion signalled by o_done) or optionally to READ (EDIT: contents of output buffer at address i_raddr will always be placed in o_byte regardless of i_mode setting) or WRITE (write data to (streaming) input buffer).

When the stream is completely (de)compressed o_done will be set to 1.

o_iprogress indicates the current position of the deflate input reader. You need that if you stream the input in a circular input buffer like the tests do. You should only write at addresses < o_iprogress to prevent overwriting unread data.

o_oprogress points always 1 byte past the last available output byte in the buffer. So you should not read from addresses >= o_oprogress. At the end of the (de)compression o_oprogress contains the size of the final result.

i_mode values:

IDLE = 0, WRITE = 1, READ = 2, STARTC= 3, STARTD = 4

tomtor avatar Feb 02 '19 12:02 tomtor

Thanks again,

So we cannot write and read at the same time, right? That's a pity.

Does o_iprogress start at 1? Or should I only write at addresses <= o_iprogress? The same for o_oprogress, can I really read at address o_oprogress? And will these progress indicators have only 9 bits, like i_raddr and i_waddr? Or am I supposed to use a modulo?

I wouldn't necessarily classify AXI-stream as a Xilinx specific solution, though I really don't know if anyone else natively supports it. It's defined by ARM and Xilinx has decided to adopt it. And I'm sure there are other streaming protocols out there like e.g. a Wishbone point-to-point interconnection. But I know of none as efficient as AXI-Stream.

I'll start by examining test_deflate.py and see what I can make of it.

maartenbrock avatar Feb 02 '19 14:02 maartenbrock

I have another question: make requires python3.6 and I happen to have 3.5 installed. Are you (or myhdl) using anything 3.6 specific? It didn't seem so after I modified it. So why not just use PYHTON=python3?

maartenbrock avatar Feb 02 '19 16:02 maartenbrock

@vanmierlo The Python testbench uses Python3.6 specific deflate options when testing with DYNAMIC=False. When you use DYNAMIC=True (default) then even Python2 will be fine.

Regarding your other questions: o_iprogress starts at 0. Edit: So you can overwrite the first input byte (with i_waddr = 0) when o_iprogress >= 1 while decompressing. When compressing o_iprogress >= 1 + CWINDOW.

The last valid output byte always has address o_oprogress - 1 and when o_done is True o_oprogress is set to the size of the result in bytes. These progress indicators have size LMAX, which defaults to 24 (16 megabytes).

EDIT: You can READ/WRITE at the same time. o_byte will always receive the contents of the output buffer at index i_raddr, independent of the i_mode. So setting i_mode to READ for reading is optional. Setting i_mode to WRITE allows concurrent read/write.

So there really should be no issue embedding the current implementation in the standard Vivado AXI-Stream wrapper. If you write this wrapper (or a Wishbone wrapper) then I would be happy to merge it in this repository. I might also write one in the future myself, but I have currently no concrete usecase for such a wrapper.

Finally, in general Vivado AXI implementations are not without issues: https://zipcpu.com/blog/2019/01/12/demoaxilite.html

Suddenly o_iprogress is related to reading? I thought it was about writing. And if o_iprogress starts at 0 it is not possible to write only at addresses < o_iprogress. Were you maybe trying to say: You should not write at address o_iprogress - 1 ?

And likewise: You should not read at address o_oprogress ?

tomtor avatar Feb 02 '19 16:02 tomtor

Would you consider to rename reset to resetn as it appears to be active-low?

maartenbrock avatar Feb 02 '19 17:02 maartenbrock

Would you consider to rename reset to resetn as it appears to be active-low?

Yes, I should or I could change the implementation to active-high which is best practice in FPGA land? Not sure, what is your preference?

tomtor avatar Feb 02 '19 17:02 tomtor

My preference is active-high, but active-low with renaming to resetn is okay as well. The current situation is confusing.

maartenbrock avatar Feb 04 '19 09:02 maartenbrock

I'm getting lost. You wrote:

o_iprogress indicates the current position of the deflate input reader. You need that if you stream the input in a circular input buffer like the tests do. You should only write at addresses < o_iprogress to prevent overwriting unread data.

Then I asked:

Does o_iprogress start at 1? Or should I only write at addresses <= o_iprogress?

And now you answer:

Regarding your other questions: o_iprogress starts at 0. So you can read the first output byte (with i_raddr = 0) when o_iprogress >= 1.

Suddenly o_iprogress is related to reading? I thought it was about writing. And if o_iprogress starts at 0 it is not possible to write only at addresses < o_iprogress. Were you maybe trying to say: You should not write at address o_iprogress - 1 ?

And likewise: You should not read at address o_oprogress ?

maartenbrock avatar Feb 04 '19 09:02 maartenbrock

The Vivado AXI article is about AXI-lite, not AXI-Stream. AXI-Stream is a much simpler protocol. Further, I've never used that core, but always wrote my own AXI-lite slave core which never gave any issues.

If setting i_mode to READ is optional, does that mean that you don't need to READ to release occupied bytes in the read buffer? So, if you don't read in time the buffer will just overflow, right? Not necessarily a problem, just asking for confirmation. It does mean that back-pressure on the reading side should be handled on the writing side.

maartenbrock avatar Feb 04 '19 09:02 maartenbrock

The generated VHDL deflate.vhd has a lot of bugs:

  • NEXT is a reserved VHDL keyword, and so is DISTANCE
  • static is a redefinition of STATIC in VHDL
  • lots of direct assignments to and reading from signals from within functions/procedures
  • type errors
  • there is no bool() function nor stdl

I don't know if it is the source or the myhdl converter, but this is totally useless, sorry.

maartenbrock avatar Feb 04 '19 13:02 maartenbrock

I'm getting lost. You wrote:

Sorry, I corrected the text in the original reply. It should be OK now.

Regarding the VHDL, that's a MyHDL issue, nothing I can do about that. I only use/test the verilog output which is fine. I could change the identifiers to prevent NEXT/DISTANCE/STATIC (and use e.g. HD_NEXT/HD_DISTANCE/...), but that makes no sense, because there remain still other VHDL issues?

If it is ONLY a matter of different identifiers then please submit a PR with identifiers which are fine. If it is broken beyond repair, you have no choice but to use the verilog.

Edit: I dropped the .vhd from the repo and changed reset to be active high.

tomtor avatar Feb 04 '19 16:02 tomtor

If setting i_mode to READ is optional, does that mean that you don't need to READ to release occupied bytes in the read buffer? So, if you don't read in time the buffer will just overflow, right? Not necessarily a problem, just asking for confirmation. It does mean that back-pressure on the reading side should be handled on the writing side.

Edit: i_raddr signals the (de)compressor which output data has been read. It will pause and not overwrite.

tomtor avatar Feb 04 '19 16:02 tomtor

It looks like the generated vhdl is beyond repair. Just for the fun of it, open deflate.vhd in Vivado and see all the red markers.

I have now declared a component in vhdl and included the verilog file in Vivado.

component deflate is
port (
	clk		: in std_logic;
	reset		: in std_logic;

	i_mode		: in std_logic_vector ( 2 downto 0 );
	i_waddr		: in std_logic_vector ( 8 downto 0 );
	i_data		: in std_logic_vector ( 7 downto 0 );
	o_iprogress	: out std_logic_vector ( 23 downto 0 );
	o_oprogress	: out std_logic_vector ( 23 downto 0 );
	i_raddr		: in std_logic_vector ( 8 downto 0 );
	o_byte		: out std_logic_vector ( 7 downto 0 );
	o_done		: out std_logic		
);
end component;

I had to make some changes to the generated deflate.v to get things working:

  • changed o_byte from reg to wire
  • assign o_byte directly from oram[(i_raddr & 511)] instead of through a register
  • initialize o_iprogress and o_oprogress to zero during reset

And then I still have to delay o_oprogress by two clocks to get things right.

It's all these out-of-sync progress indicators, addresses and data that makes this difficult to handle. This is all automatically fixed when using an AXI-Stream. And you could probably drop the output blockram as well.

maartenbrock avatar Feb 05 '19 14:02 maartenbrock

I think bramread() can be removed as it seems it is not used.

And the enumeration of states IDLE, WRITE, etc. should be initialized with range(5) instead of 6 now.

Also, can you explain why IBSIZE needs to be 16 * CWINDOW when FAST is used? Or could it still be set to 2 * CWINDOW?

Btw. I think FAST is not about the achievable clock speed. But is it about low latency or about high throughput (measured in clock ticks)?

maartenbrock avatar Feb 08 '19 10:02 maartenbrock

I fixed the range two days ago:

https://github.com/tomtor/HDL-deflate/commit/da8691e1a2d7d38e0c661db1f26a8c730b185c05

FAST is explained in the README.

The only reason for 16 * CWINDOW is to get a buffer (512 bytes) which is sufficiently big for a dynamic tree test in the testbench part that does not use streaming. You can set it to 2 * CWINDOW and it will only TEST non dynamic compression. It has no impact on normal usage.

Edit: CWINDOW is 32 when FAST = True and 256 when False so we need to calculate IBSIZE as is done.

Edit 2: What do you mean regarding bramread(): it sets orbyte!? It is not called in deflate.py but it IS needed and functional. It generates the verilog always block.

tomtor avatar Feb 10 '19 09:02 tomtor