USB-UART bridge/peripheral for boards without a uC or an external bridge

Open umarcor opened this issue 3 years ago • 31 comments

We have discussed adding a USB-UART peripheral to NEORV32 (either internal or external) in several issues. It would allow the NEORV32 to communicate with a host laptop/workstation when using boards such as Fomu, TinyFPGA-BX or OrangeCrab. Those boards have a USB port connected to the FPGA I/O without any intermediate uC, FTDI, CH340... or similar device. Let's use this issue for tracking the experiments in this regard.


https://github.com/im-tomu/fomu-workshop/issues/421#issuecomment-758313567

Have a look at greatscottgadgets/luna#80. Any other design would also be acceptable. However, dealing with USB in HDL is non-trivial, and meeting timing requirements on the Fomu (UP5K) makes it really challenging.


https://github.com/stnolting/neorv32/discussions/52#discussioncomment-817583

Back to NEORV32 and Fomu. I would like the host (laptop/workstation) and the soft CPU to be able to communicate through UART over USB. There is no ready-to-use solution in the workshop yet. However, some days ago, @smunaut suggested using no2fpga/no2muacm (see im-tomu/fomu-workshop#421). I could not try that yet. I don't know if a minimal NEORV32 + no2muacm would fit. Moreover, it seems that no2muacm uses a RISC-V core already. Hence, it might make sense to add a similar peripheral to NEORV32, rather than using no2muacm as is.

https://github.com/stnolting/neorv32/discussions/52#discussioncomment-828213

You should also look into the "bitsy" version of the icebreaker, a cheaper version without the FTDI that relies on USB bootloader and comms (like fomu) : https://github.com/icebreaker-fpga/icebreaker#icebreaker-bitsy

If you want to port NeoRV32 and add USB support to it using my no2usb core, I can send you a spare one. (They're not generally for sale yet, first batch is in progress).

https://github.com/stnolting/neorv32/discussions/52#discussioncomment-817617

(1) First, don't hesitate to contact me (the 1bitsquared discord is probably the best place) for any help regarding no2muacm. (2) Indeed, in the case of NeoRV32 it's probably better to add the no2usb directly connected to it. However, if I trust the first page readme ... the FPGA is full to the brim (97%), no space whatsoever to add anything there ... The no2muacm core is ~ 1100 LCs. The no2usb core alone is ~ 650 LCs.


https://github.com/stnolting/neorv32/pull/56#issuecomment-875035999

I have added a hardware USB to my UPduino board that is (electrically) identical to the FOMU. Now I am tinkering with https://github.com/davidthings/tinyfpga_bx_usbserial in loop-back mode: so I have a simple USB-UART echo on the board. Right now I am using Lattice Radiant but the timing still fails...

Anyway. I am curious, has anyone here ever worked with that bridge? Maybe even on a real FOMU?

https://github.com/stnolting/neorv32/pull/56#issuecomment-875106902

I've got some tests in this branch: https://github.com/juanmard/neorv32/tree/fomu-serial

Four tests in four different commits. https://github.com/juanmard/neorv32/commits/fomu-serial

I was trying to redirect UART0 to this bridge, but it does not work for now.

umarcor avatar Jul 07 '21 13:07 umarcor

So we have two options right now to support a FPGA-USB interface:

  1. https://github.com/no2fpga/no2muacm
  2. https://github.com/davidthings/tinyfpga_bx_usbserial

I would prefer porting option 1 as a new peripheral for the processor since this module seems to be quite powerful. I am still looking through the code and I do not understand all of it yet :smile:

Anyway, option 2 is a great option to be used as processor-external component. We could disable the processor's default UART0, add usbserial to the bus interface and "bend" all UART read/write functions to that. From my point of view, this should be the focus for now.


Regarding option 2 and @juanmard's https://github.com/stnolting/neorv32/pull/56#issuecomment-875106902:

I am already using a "pseudo-UART" for a different project: it is an IP block (basically, a UART-to-JTAG bridge) that provides a FIFO interface for sending/receiving UART data just like usbserial. I have connected that to a NEORV32 stream link and I have overridden the default neorv32_uart_... C functions to make use of the stream link instead. We could do the same for the FOMU-based usbserial setup. The only things required here (in terms of hardware) are two clock-domain crossing FIFOs between the usbserial ports (running at 48MHz) and the processor (running at something < 48MHz).

stnolting avatar Jul 07 '21 15:07 stnolting

@juanmard

I am working through this setup right now https://github.com/juanmard/neorv32/blob/fomu-serial/setups/examples/neorv32_Fomu_BoardTop_MixedLanguage.vhd trying to port that to my "Frankenstein UPduino USB" setup :smile:

Just some questions:

  • Have you tested that on the FOMU?
  • I see you are using RealTerm on Windows, right? Can Windows enumerate the device "out of the box"? And does it show up as serial interface?
  • The FOMU is using a dedicated 48MHz crystal for the USB engine, right? My UPduino does not have that. I am trying to use the internal HF oscillator and a PLL to generate a 48MHz clock. But the oscillator is not really stable, so I might have to use the external crystal (12MHz), too, since the USB front end might struggle with clocks that are not exactly 48MHz. Have you ever encountered any problems there?

stnolting avatar Jul 07 '21 15:07 stnolting

@stnolting, I think that tinyfpga_bx_usbserial might have some reliability issues, especially on Windows. So, "it can work", but I don't know if "it works". I believe that's why no2muacm and other alternatives exist. Do not forget about greatscottgadgets/luna#80, since the scope of that project is precisely research and hacking of USB devices/protocols. I agree that the most sensible procedure would be 1. usbserial, 2. no2muacm, 3. no2usb and 4. luna. However, do not get stuck in 1. Jump to 2. straightaway if you need it.

@juanmard had bittersweet results. However, he is not an experienced HDL or RTL developer, so some of his struggles were actually related to the usage of the stream and how to interact with the peripheral from the NEORV32. I think you just implemented what he was missing:

"I have connected that to a NEORV32 stream link and I have overridden the default neorv32_uart_... C functions to make use of the stream link instead. We could do the same for the FOMU-based usbserial setup. The only things required here (in terms of hardware) are two clock-domain crossing FIFOs between the usbserial ports (running at 48MHz) and the processor (running at something < 48MHz)".

It should now be relatively straightforward to merge your experiments.

I would prefer porting option 1 as a new peripheral for the processor since this module seems to be quite powerful

The main feature of tinyfpga_bx_usbserial is that the interfaces are USB on one side and Streams on the other. Therefore, I would try to use a similar entity when implementing the peripheral for NEORV32. That will make it easier to replace in the future. More precisely, it should be possible to maintain all the sources, and change between usbserial, no2muacm, no2usb, etc. by just selecting a different architecture. Coherently, it does not really matter if it's implemented as an internal or external peripheral. The point about specifying the entity is precisely to decouple development of the internals from the integration into the system.

umarcor avatar Jul 07 '21 15:07 umarcor

@stnolting see https://twitter.com/juanmard/status/1408276644343824407. So, @juanmard used the GPIO of the NEORV32 for driving a mux that selected one of two characters (constants, hardwired) to be continuously sent through the UART Stream. I think that he connected the external 48 MHz to the usbserial instance, and he used the PLL for the NEORV32.

umarcor avatar Jul 07 '21 15:07 umarcor

trying to port that to my "Frankenstein UPduino USB" setup 😄

The FOMU is using a dedicated 48MHz crystal for the USB engine, right? My UPduino does not have that.

Did you check your mailbox today? :wink:

umarcor avatar Jul 07 '21 15:07 umarcor

@stnolting: Just some questions:

  • Have you tested that on the FOMU?

Yes, I tested everything on the FOMU because no pinout is available on it and usb-serial is more than necessary... 😄

  • I see you are using RealTerm on Windows, right? Can Windows enumerate the device "out of the box"? And does it show up as serial interface?

If you look at https://twitter.com/juanmard/status/1406323387803242499, you can see that, in Windows 10, it shows up as "Dispositivo serie USB (COM9)" in Spanish ("USB serial device" in English), so I suppose the answer is "yes"... 😄

  • The FOMU is using a dedicated 48MHz crystal for the USB engine, right?

Right... 👍

My UPduino does not have that. I am trying to use the internal HF oscillator and a PLL to generate a 48MHz clock. But the oscillator is not really stable, so I might have to use the external crystal (12MHz), too, since the USB front end might struggle with clocks that are not exactly 48MHz. Have you ever encountered any problems there?

AFAIK.... No, no stability problem. The fundamental problems that I have encountered are synthesis problems: using more LUTs than are available on the FOMU. I have had to find a balance between the minimum neorv32 system and the USB-Serial Bridge.

@umarcor: I think that he connected the external 48 MHz to the usbserial instance, and he used the PLL for the NEORV32.

Yes, that is. 👍

juanmard avatar Jul 07 '21 16:07 juanmard

@umarcor

As far as I understand, https://github.com/no2fpga/no2muacm is a hardware PHY and a RISC-V core for all the USB stack handling. This could be ported to the NEORV32, but that is not trivial. In contrast, usbserial is a stand-alone unit. I see that there might be some reliability problems with it but I think it is a good thing to start with: it is an isolated module, it is FPGA proven (and FOMU proven by @juanmard 🚀) and quite simple to integrate.

@stnolting see https://twitter.com/juanmard/status/1408276644343824407. So, @juanmard used the GPIO of the NEORV32 for driving a mux that selected one of two characters (constants, hardwired) to be continuously sent through the UART Stream. I think that he connected the external 48 MHz to the usbserial instance, and he used the PLL for the NEORV32.

I am using Lattice Radiant right now. And even when using Synplify Pro, 48MHz is quite a task. With lots and lots of synthesis configuration parameters I am at 42MHz right now (problem: lots of logic duplication...).

Did you check your mailbox today? 😉

Sure, but there are no new messages from you. Please use stnolting[at]gmail.com. There might be different mail addresses in older commit logs, though - do not use them 😅


@juanmard

If you look at https://twitter.com/juanmard/status/1406323387803242499, you can see that, in Windows 10, it shows up as "Dispositivo serie USB (COM9)" in Spanish ("USB serial device" in English), so I suppose the answer is "yes"... 😄

Great! Thank you!

AFAIK.... No, no stability problem. The fundamental problems that I have encountered are synthesis problems: using more LUTs than are available on the FOMU. I have had to find a balance between the minimum neorv32 system and the USB-Serial Bridge.

I can't believe this is actually working 😄 According to https://github.com/juanmard/neorv32/runs/2911737213?check_suite_focus=true I cannot find any timing constraints except for the implicit one for the PLL clock. There is just a timing report for the clki clock, which also drives the USB logic:

Info: Max frequency for clock 'clki$SB_IO_IN_$glb_clk': 30.80 MHz (PASS at 12.00 MHz)

If this works, then my experiments with Radiant running at 40-something MHz should hopefully work, too (fingers crossed).
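For reference, the gap between the reported maximum frequency and a 48 MHz requirement can be put in numbers. A minimal sketch in Python, assuming the USB logic on `clki` really has to run at 48 MHz and that the reported 30.80 MHz is its worst-case Fmax; the helper names are made up for illustration:

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def slack_ns(fmax_mhz: float, target_mhz: float) -> float:
    """Worst-case setup slack if a design with the given Fmax is clocked at target_mhz.

    Negative values mean the critical path is longer than the clock period.
    """
    return period_ns(target_mhz) - period_ns(fmax_mhz)


if __name__ == "__main__":
    # Frequencies taken from the timing report quoted above.
    print(f"slack at 12 MHz: {slack_ns(30.80, 12.0):+.2f} ns")
    print(f"slack at 48 MHz: {slack_ns(30.80, 48.0):+.2f} ns")
```

Under those assumptions the slack at 48 MHz is negative, so the PASS line only says that the 12 MHz constraint is met and tells us little about the USB domain.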

stnolting avatar Jul 07 '21 17:07 stnolting

As far as I understand, https://github.com/no2fpga/no2muacm is a hardware PHY and a RISC-V core for all the USB stack handling. This could be ported to the NEORV32, but that is not trivial. In contrast, usbserial is a stand-alone unit. I see that there might be some reliability problems with it but I think it is a good thing to start with: it is an isolated module, it is FPGA proven (and FOMU proven by @juanmard 🚀) and quite simple to integrate.

The point is that the PHY and the RISC-V core in no2muacm might be considered a single black box. No need to know that both of those modules exist inside. It would be for prototyping purposes only. I believe that the port you are thinking about would be using the PHY only (no2usb) connected to NEORV32. I agree that's the most interesting end goal, but there is no need to start with that!

Anyway, let's see if the timing results with usbserial are good enough! As you said, Juanma had it almost done!

Sure, but there are no new messages from you.

The physical one :smile:

umarcor avatar Jul 07 '21 18:07 umarcor

Yes, the point of no2muacm is that you don't worry about what's inside, just use the pre-built code (single packed verilog file with everything ready in it). See the no2muacm-bin repo or the release tar gz. It has an AXI stream interface and the demo also shows how to use it across clock domains using a simple lightweight cross-clock module provided with it.

no2usb is a USB device core where you can implement ... whatever you want (I did keyboards, sound cards, midi devices, even an old school E1 interface with it). The interface to the SoC is not stream but wishbone, just like any peripheral.

Beware that the tinyfpga code has several USB protocol non-compliance issues, especially regarding error handling / repeats / ... That was one of the motivations for a clean compliant re-implementation.

smunaut avatar Jul 07 '21 18:07 smunaut

@umarcor

Anyway, let's see if the timing results with usbserial are good enough! As you said, Juanma had it almost done!

👍 Fingers crossed!! :wink:

The physical one 😄

Oh, haha :smile: No, I haven't. I am not at home at the moment. But I will check when I'm back 🤩


Thx for the feedback, @smunaut!

Do you have any resource utilization results? I came across the reports in https://github.com/no2fpga/no2muacm/runs/2921175808?check_suite_focus=true but I am having trouble identifying the core-only results 😅

stnolting avatar Jul 08 '21 14:07 stnolting

  • ~ 1000 LCs and 7 EBRs for no2muacm (full CDC ACM to AXI stream)
  • ~ 650 LCs and 10 EBRs for no2usb (USB device core configured with 2k TX and 2k RX buffers)
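To put these numbers next to the 97% utilisation mentioned earlier in the thread, here is a rough budget check in Python. It assumes the UP5K device totals of 5280 LCs and 30 EBRs, and the function names are hypothetical. The 97% figure is only the README value quoted above, so a trimmed-down NEORV32 configuration would give a different result.

```python
UP5K_LCS = 5280   # logic cells in an iCE40 UP5K (assumed device total)
UP5K_EBRS = 30    # 4 kbit block RAMs in an iCE40 UP5K (assumed, not checked below)

# Approximate core sizes as stated in this thread.
CORES = {
    "no2muacm": {"lcs": 1000, "ebrs": 7},
    "no2usb": {"lcs": 650, "ebrs": 10},
}


def free_lcs(used_fraction: float, total: int = UP5K_LCS) -> int:
    """Logic cells left over when used_fraction of the device is occupied."""
    return total - round(total * used_fraction)


def fits(core: str, used_fraction: float) -> bool:
    """True if the named core's LC count fits into the remaining logic cells."""
    return CORES[core]["lcs"] <= free_lcs(used_fraction)


if __name__ == "__main__":
    for name in CORES:
        print(name, "fits at 97% utilisation:", fits(name, 0.97))
```

Only the logic cells are compared here; the EBR counts are listed for completeness and would need the same kind of check against whatever memories the processor configuration already uses.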

smunaut avatar Jul 08 '21 14:07 smunaut

I made some more tests with https://github.com/davidthings/tinyfpga_bx_usbserial and finally I came up with a setup on my UPduino board that is "working" 🎉 😄

The Setup

I am using the NEORV32 stream link interface (1 RX link and 1 TX link) to connect to the usb_uart module. Each link has a 1-entry deep FIFO in the processor's SLINK module. I use two clock domain-crossing FIFOs (Radiant IPs - I was too lazy to write my own 😅), one for each link, to go from the 16MHz processor domain to the 48MHz USB domain. The processor runs a simple echo program - so it polls the RX link and echoes everything received to the TX link.
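The data flow of this setup can be modelled in a few lines of Python. This is only a behavioural sketch, not the actual firmware: the class and function names are invented, the CDC FIFOs are reduced to plain queues, and the default depth of 1 stands in for the SLINK FIFOs described above.

```python
from collections import deque


class Link:
    """Bounded FIFO standing in for one stream link (hypothetical model)."""

    def __init__(self, depth: int = 1):
        self.depth = depth
        self.fifo = deque()

    def can_write(self) -> bool:
        return len(self.fifo) < self.depth

    def can_read(self) -> bool:
        return len(self.fifo) > 0

    def write(self, byte: int) -> None:
        assert self.can_write(), "write to a full link"
        self.fifo.append(byte)

    def read(self) -> int:
        assert self.can_read(), "read from an empty link"
        return self.fifo.popleft()


def run_echo(data: bytes, depth: int = 1) -> bytes:
    """Send data from the 'host' through RX -> CPU echo -> TX and collect the result."""
    rx, tx = Link(depth), Link(depth)
    pending = deque(data)
    received = bytearray()
    while len(received) < len(data):
        # Host side: push the next byte if the RX link has room.
        if pending and rx.can_write():
            rx.write(pending.popleft())
        # CPU side: poll RX and echo to TX, but only if TX can accept it.
        if rx.can_read() and tx.can_write():
            tx.write(rx.read())
        # Host side: drain whatever arrived on the TX link.
        if tx.can_read():
            received.append(tx.read())
    return bytes(received)


if __name__ == "__main__":
    print(run_echo(b"hello"))
```

The model always checks the status flags before reading or writing, which is the part the real echo program has to get right as well; it says nothing about timing or USB-level behaviour.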

The "Issues"

The setup works, but it does not work in a reliable way. Sometimes the USB enumeration seems to fail (the serial port does not show up in the terminal program or in the hardware manager at all). Sending/receiving single characters works, but a stress test sending large files crashes the USB module and sometimes even Windows. However, this could also be a problem with my setup. I know there are some problems in it:

  • everything is built on a breadboard - including the chip-external electrical part of the USB interface
  • I am using Lattice Radiant (with Synplify Pro) and I still have timing problems: 37 failing endpoints, all in the USB module. The worst has -2.6ns slack.
  • I am using the HF oscillator and the PLL to generate the 48MHz USB clock. I know that the oscillator is not really accurate so that might also be a problem.
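On the last point: the USB 2.0 specification allows a full-speed data rate tolerance of ±0.25% (2500 ppm). Here is a small Python check, assuming roughly ±10% for the internal HF oscillator and ±50 ppm for a typical crystal. Both figures are assumptions that should be checked against the datasheets, and the PLL is assumed to pass the relative error of its reference through unchanged.

```python
USB_FS_TOLERANCE_PPM = 2500  # full-speed data rate tolerance, +/-0.25%


def percent_to_ppm(percent: float) -> float:
    """Convert a relative tolerance in percent to parts per million."""
    return percent * 10_000.0


def worst_case_mhz(nominal_mhz: float, tolerance_ppm: float) -> tuple:
    """Lowest and highest frequency for a nominal value and a +/- ppm tolerance."""
    delta = nominal_mhz * tolerance_ppm / 1_000_000.0
    return (nominal_mhz - delta, nominal_mhz + delta)


def meets_usb_fs(tolerance_ppm: float) -> bool:
    """True if a clock source with this tolerance stays within the USB full-speed limit."""
    return tolerance_ppm <= USB_FS_TOLERANCE_PPM


if __name__ == "__main__":
    sources = {
        "internal HF oscillator (assumed +/-10%)": percent_to_ppm(10.0),
        "12 MHz crystal + PLL (assumed +/-50 ppm)": 50.0,
    }
    for name, ppm in sources.items():
        low, high = worst_case_mhz(48.0, ppm)
        print(f"{name}: {low:.3f}..{high:.3f} MHz, within USB FS limit: {meets_usb_fs(ppm)}")
```

If the assumed oscillator tolerance is anywhere near right, the internal oscillator is far outside the allowed band, which would be consistent with the flaky enumeration, although the breadboard wiring and the failing timing paths could explain it just as well.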

The "Plan"

I am not sure how to improve on that (I am not really familiar with the USB protocol). So I think I will shelve that for now. But it might be a good idea to add the stream interface infrastructure to the FOMU example projects in this repo.

I will try to use the rainy weekend to experiment with https://github.com/no2fpga/no2muacm because this looks really (really!!) promising! :+1:

edit If anyone is curious, here are the main source files:

  • Top entity, modified version of @juanmard's test setup (sry for my unpretty verilog): https://gist.github.com/stnolting/5d78267cc997cd7757e78dae83ac647c
  • Processor wrapper: https://gist.github.com/stnolting/5fb9244a7226b8ee0176836fa0be7d7a

stnolting avatar Jul 09 '21 11:07 stnolting

Now that I think about it, Radiant might be an issue :/

They changed the names (and sometimes there is no equivalent at all) of all the primitives. The no2muacm pre-built netlist is full of SB_LUT4 / SB_CARRY / ... but Radiant doesn't support those. They are only valid in either icecube2 or in the open toolchain.

smunaut avatar Jul 09 '21 11:07 smunaut

They changed the names (and sometimes there is no equivalent at all) of all the primitives. The no2muacm pre-built netlist is full of SB_LUT4 / SB_CARRY / ... but Radiant doesn't support those. They are only valid in either icecube2 or in the open toolchain.

I just saw that. Find-and-replace might be an option, but I have not looked further at all the instantiated primitives.

The provided pre-built setup is identical to the result of running make in no2muacm/gateware, right?

stnolting avatar Jul 09 '21 14:07 stnolting

I use two clock domain-crossing FIFOs (Radiant IPs - I was too lazy to write my own 😅)

🤣 Feel free to pick the sources in https://github.com/VUnit/vunit/tree/master/examples/vhdl/array_axis_vcs/src and adapt them to your own needs. You can safely ignore the copyright header and apply any copyleft/permissive license you want. I am the author of the majority of it, except for the PSL block (contributed by @tmeissner). I believe you can take this message as explicit permission for you to relicense and modify it to make it fit this project (NEORV32).

/cc @tmeissner for confirmation.

umarcor avatar Jul 09 '21 15:07 umarcor

Feel free to pick the sources in https://github.com/VUnit/vunit/tree/master/examples/vhdl/array_axis_vcs/src and adapt them to your own needs.

Thanks for the hint! Is this intended for synthesis and for arbitrarily-related clocks? Seems like there is no real synchronization between the two clock domains... 🤔

Anyway, I think there is no need for real CDC FIFOs. The FIFO-part is already in the processor's SLINK module and I think I will use https://github.com/no2fpga/no2muacm/blob/master/example/rtl/muacm_xclk.v for the CDC part.

stnolting avatar Jul 09 '21 16:07 stnolting

Just for the record: -> https://github.com/stnolting/neorv32/discussions/113 :rocket: :wink:

stnolting avatar Jul 09 '21 19:07 stnolting

Is this intended for synthesis and for arbitrarily-related clocks? Seems like there is no real synchronization between the two clock domains... 🤔

A FIFO is an adaptor for clock domain crossing per se. If one clock is used for writing and a different clock is used for reading, the FIFO takes care of synchronisation. Moreover, from a behavioural point of view, FIFOs allow adapting components/cores that manipulate data at different speeds, regardless of the components being in the same or different domains. Therefore, that FIFO should work without additional custom handling of the clocks. However, I did not try synthesis targeting Lattice devices. That's something we need to test.

Anyway, I think there is no need for real CDC FIFOs.

I think the FIFOs make sense so that the domain crossing logic does not have to be implemented some other way. The FIFO which is part of the processor's SLINK is mostly meant for dealing with the bottlenecks of having multiple channels in a single port. It needs to deal with x8 more traffic than each of the individual channels. It is sensible to have another FIFO between that one and the peripherals which require domain crossing.

The following diagram shows four possible setups: cores connected with a FIFO or "directly", either in the same domain or in a different domain. Note that each FIFO block is, in fact, two (in case bidirectional communication is required).

[Figure: neorv32_cdc]

I believe we might add that to the repo as follows:

[Figure: neorv32_cdc_egs]

But it might be a good idea to add the stream interface infrastructure to the FOMU example projects in this repo.

Agree. What do you think about the following?

[Figure: neorv32_Fomu_Stream]

So, use the output of the Stream for driving a register and use the three lower bits as enables for each of the three RGB signals. That can be built based on the current MinimalBoot or Minimal examples, and would allow testing the Stream feature without the complexity of the USB-UART or having an interconnect.
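As a sketch of what the register decoding could look like, here is a tiny Python model. The bit order (bit 0 = red, bit 1 = green, bit 2 = blue) and the function name are assumptions for illustration; nothing in the proposal fixes them.

```python
def rgb_enables(stream_word: int) -> dict:
    """Decode the three lowest bits of a stream word into per-colour enables.

    Assumed mapping: bit 0 -> red, bit 1 -> green, bit 2 -> blue.
    All higher bits are ignored.
    """
    return {
        "red": bool(stream_word & 0b001),
        "green": bool(stream_word & 0b010),
        "blue": bool(stream_word & 0b100),
    }


if __name__ == "__main__":
    for word in (0x0, 0x5, 0xFFFFFFFF):
        print(hex(word), rgb_enables(word))
```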

The application is useless indeed, because the NEORV32 can control the LED through the UART. However, I believe it is a simple and didactic example for users to understand how data flows both ways (memory mapped PWM and Stream).

umarcor avatar Jul 10 '21 13:07 umarcor

A FIFO is an adaptor for clock domain crossing per se. If one clock is used for writing and a different clock is used for reading, the FIFO takes care of synchronisation. Moreover, from a behavioural point of view, FIFOs allow adapting components/cores that manipulate data at different speeds, regardless of the components being in the same or different domains. Therefore, that FIFO should work without additional custom handling of the clocks.

Right, if the FIFO is seen as black box. Some FPGA block RAMs support an "intrinsic" FIFO mode (Xilinx Virtex-6 I think). So all the FIFO-related logic is done inside the memory block and there is no additional (LUT) logic around it.

But if you want to implement a FIFO in a behavioral way, you need to take care of the clock-domain crossing by yourself. The actual FIFO memory is no big deal since most block RAMs support a dual-port mode with individual read and write clocks.

The tricky thing is the handling of the read and write pointers. These need to be synchronized into the respective clock domains to allow a correct computation of the FIFO's "empty" (for the read side) and "full" (for the write side) flags.

Here is an example of a FIFO's inner workings: https://www.researchgate.net/figure/FIFO-Block-Diagram-partitioned-on-clock-boundaries_fig7_247693488
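A common way to make that pointer hand-over safe is to Gray-code the pointers before they cross domains, so that only one bit changes per increment and a sampled value is always either the old or the new count. The thread does not spell this out, so treat the following Python model as a sketch of that general technique with made-up names. It ignores the synchroniser flip-flops themselves and only shows the encoding and the resulting flags.

```python
def bin_to_gray(value: int) -> int:
    """Convert a binary count to its Gray-code representation."""
    return value ^ (value >> 1)


def gray_to_bin(gray: int) -> int:
    """Convert a Gray-code value back to a binary count."""
    value = 0
    while gray:
        value ^= gray
        gray >>= 1
    return value


def fifo_flags(wr_ptr: int, rd_ptr: int, addr_bits: int) -> tuple:
    """Return (empty, full) for binary pointers that carry one extra wrap bit.

    Pointers are (addr_bits + 1) wide: equal pointers mean empty, pointers that
    differ only in the wrap bit mean full.
    """
    mask = (1 << (addr_bits + 1)) - 1
    wr, rd = wr_ptr & mask, rd_ptr & mask
    empty = wr == rd
    full = (wr ^ rd) == (1 << addr_bits)
    return empty, full


if __name__ == "__main__":
    # Successive Gray codes differ in exactly one bit, including the wrap-around.
    width = 4
    for count in range(1 << width):
        nxt = (count + 1) % (1 << width)
        changed = bin_to_gray(count) ^ bin_to_gray(nxt)
        assert bin(changed).count("1") == 1
    print(fifo_flags(wr_ptr=0, rd_ptr=0, addr_bits=3))
    print(fifo_flags(wr_ptr=8, rd_ptr=0, addr_bits=3))
```

In hardware, the read side would compare its own pointer with the synchronised copy of the write pointer (and vice versa), so the flags can be pessimistic for a few cycles but should never report data that is not there.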

I think the FIFOs make sense for not implementing the domain crossing logic otherwise. The FIFO which is part of the processor's SLINK is mostly meant for dealing with the bottlenecks of having multiple channels in a single port. It needs to deal with x8 more traffic than each of the individual channels. It is sensible to have another FIFO between that one and the peripherals which require domain crossing.

Correct, but there is one FIFO for each link in the processor's SLINK module. So if you configure 8 RX and 8 TX links there will be 16 FIFOs in total.

I agree that all kind of clock-domain crossing / width conversion / ... should be done processor-external.

The following diagram shows four possible setups: cores connected with a FIFO or "directly", either in the same domain or in a different domain. Note that each FIFO block is, in fact, two (in case bidirectional communication is required).

Great figures! :+1: I like the concept of the ...SystemTop_Streams.vhd wrapper and I agree that we should add this. The question is: how do we handle the actual stream interface? Single ports (std_logic & std_logic_vector only) for all 8 links plus a generic so only the actually implemented ones get internally connected, or some kind of more sophisticated interface type (record)?

Agree. What do you think about the following? So, use the output of the Stream for driving a register and use the three lower bits as enables for each of the three RGB signals. That can be built based on the current MinimalBoot or Minimal examples, and would allow testing the Stream feature without the complexity of the USB-UART or having an interconnect.

I like that!

stnolting avatar Jul 11 '21 13:07 stnolting

But if you want to implement a FIFO in a behavioral way, you need to take care of the clock-domain crossing by yourself. The actual FIFO memory is no big deal since most block RAMs support a dual-port mode with individual read and write clocks.

The tricky thing is the handling of the read and write pointers. These need to be synchronized into the respective clock domains to allow a correct computation of the FIFO's "empty" (for the read side) and "full" (for the write side) flags.

Here is an example of a FIFO's inner workings: https://www.researchgate.net/figure/FIFO-Block-Diagram-partitioned-on-clock-boundaries_fig7_247693488

I stand corrected 👍🏼. Thanks so much for the reference!

Correct, but there is one FIFO for each link in the processor's SLINK module. So if you configure 8 RX and 8 TX links there will be 16 FIFOs in total.

Interesting!

The question is: how do we handle the actual stream interface? Single ports (std_logic & std_logic_vector only) for all 8 links plus a generic so only the actually implemented ones get internally connected, or some kind of more sophisticated interface type (record)?

This is the discussion we had in https://github.com/stnolting/neorv32/discussions/9, which does not have a best answer. Well, the best solution would be VHDL 2019 interfaces (VHDL/Interfaces#14), but those are not supported in GHDL yet.

Hence, I would suggest keeping it simple for now. Let's have a single visible AXI Stream (actually, two, one in each direction). We can use multiple channels through the DEST field. This will have slightly worse performance than your current proposal. With multiple FIFOs, the software can read/write data from/to each channel explicitly. When using a single FIFO and DEST, the software needs to handle that field for knowing which channel a data value corresponds to. Nevertheless, that is perfectly acceptable from a hardware-software partitioning perspective. That'd be the most area efficient variant.

It'd allow us to test the support for the Stream protocol, the interaction with USB-UART and allow users to actually attach their Stream peripherals to NEORV32, without having to deal with the challenges of generic multi-signal interfaces. Meanwhile, we can implement those as records (one for each direction) based on unconstrained arrays. However, that will imply learning on your side and polishing of my knowledge.

I agree that all kind of clock-domain crossing / width conversion / ... should be done processor-external.

Moreover, there should be ready-to-use libraries of AXI components for instantiating interconnects, crossbar switches, merging/splitting, serialisation/parallelisation (as in width conversion), etc. We should under no circumstances implement those components in NEORV32, because none of those needs any specific feature/customisation for NEORV32 and none of those is trivial to implement and maintain. PoC, surf or other projects mentioned in https://github.com/stnolting/neorv32/discussions/9 should provide those components/cores already. If they don't, or they are not in good shape, we should push in that direction as a community. That is, consume and enhance so that they are actually usable.

use the output of the Stream for driving a register and use the three lower bits as enables for each of the three RGB signals.

I like that!

Nice! Let me know what you think about how to expose one or multiple streams through the top-level ports. Then, we can focus on this first minimal example (with the osflow).

umarcor avatar Jul 11 '21 15:07 umarcor

Hence, I would suggest keeping it simple for now. Let's have a single visible AXI Stream (actually, two, one in each direction). We can use multiple channels through the DEST field.

I agree. But maybe we should implement that as rtl/templates/system/neorv32_SystemTop_Streams_simple.vhd. A new file rtl/templates/system/neorv32_SystemTop_axi4stream.vhd should provide all links using a naming that can also be identified by platform designers like Vivado.

Btw, the SLINK does not provide any advanced tagging signals beyond the base AXI-Stream protocol. If a DEST tag is required, one has to use some of the GPIOs and set them via software. If DEST proves to be valuable for a lot of applications then we could add that to the SLINK module directly.

It'd allow us to test the support for the Stream protocol, the interaction with USB-UART and allow users to actually attach their Stream peripherals to NEORV32, without having to deal with the challenges of generic multi-signal interfaces. Meanwhile, we can implement those as records (one for each direction) based on unconstrained arrays. However, that will imply learning on your side and polishing of my knowledge.

:+1: However, a setup with an entity that supports a variable number of links should be rtl/templates/system/neorv32_SystemTop_Streams.vhd or something. But let's begin with the simple setup :wink:

Moreover, there should be ready-to-use libraries of AXI components for instantiating interconnects, crossbar switches, merging/splitting, serialisation/parallelisation (as in width conversion), etc. We should under no circumstances implement those components in NEORV32, because none of those needs any specific feature/customisation for NEORV32 and none of those is trivial to implement and maintain.

Good point! However, I was thinking about some "module" folder somewhere in setups. I have written a button controller for the FOMU and I am working on a Wishbone-QSPI that supports "execute in place" for external flash memories. It would be nice to put that somewhere so we can use that for the FOMU and other setups. But on the other hand, I could put that in new repos as well and include it as submodules. :thinking:

stnolting avatar Jul 12 '21 11:07 stnolting

I agree. But maybe we should implement that as rtl/templates/system/neorv32_SystemTop_Streams_simple.vhd. A new file rtl/templates/system/neorv32_SystemTop_axi4stream.vhd should provide all links using a naming that can also be identified by platform designers like Vivado.

Let's give it another iteration:

  1. rtl/templates/system/neorv32_SystemTop_SingleStream.vhd: a single Stream port pair, a single software channel.
  2. rtl/templates/system/neorv32_SystemTop_Streams.vhd: a single Stream port pair, multiple software channels.
  3. rtl/templates/system/neorv32_SystemTop_MultipleStreams.vhd: multiple Stream port pairs, multiple software channels.

Each of them would have a matching *_AXI4 variant for providing naming usable in vendor tools.

Btw, the SLINK does not provide any advanced tagging signals beyond the base AXI-Stream protocol. If a DEST tag is required, one has to use some of the GPIOs and set them via software. If DEST proves to be valuable for a lot of applications then we could add that to the SLINK module directly.

You implemented 3 already. There, DEST is not necessary because it is implicit. The software writes to different registers for each channel and, in hardware each one is physically a different port. By the same token, in 1 there is no need for DEST, because there is a single channel. Therefore, DEST is only required in 2, where a single port pair is used for multiple channels. As a result, data elements can be interleaved for multiple destinations/origins. There, both the software and the interconnect need DEST for telling them apart.
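To illustrate variant 2, here is a small Python model of what the software side would have to do when several channels share one port pair. The beat layout (a `dest` integer next to each data word) and the function names are hypothetical; as noted above, SLINK currently has no such field.

```python
from collections import defaultdict
from itertools import zip_longest


def interleave(channels: dict) -> list:
    """Merge per-channel data into one stream of (dest, data) beats, round-robin."""
    beats = []
    tagged = [[(dest, word) for word in words] for dest, words in channels.items()]
    for group in zip_longest(*tagged):
        # Channels that ran out of data show up as None and are skipped.
        beats.extend(beat for beat in group if beat is not None)
    return beats


def demux(beats: list) -> dict:
    """Split a single stream of (dest, data) beats back into per-channel lists."""
    channels = defaultdict(list)
    for dest, word in beats:
        channels[dest].append(word)
    return dict(channels)


if __name__ == "__main__":
    tx = {0: [0x10, 0x11, 0x12], 1: [0x20], 2: [0x30, 0x31]}
    stream = interleave(tx)
    print(stream)
    print(demux(stream) == tx)
```

In variants 1 and 3 none of this bookkeeping exists, because the channel is implied by the register the software accesses and by the physical port in hardware.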

For now, I think we should avoid 2, precisely because it requires DEST and the interconnect. I suggest implementing 1 first, and discussing the port naming/types for 3.

Good point! However, I was thinking about some "module" folder somewhere in setups. I have written a button controller for the FOMU and I am working on a Wishbone-QSPI that supports "execute in place" for external flash memories. It would be nice to put that somewhere so we can use it for the FOMU and other setups.

As commented in https://github.com/stnolting/neorv32/discussions/9#discussioncomment-816589:

creating a subdir of optional external components might open the door to hell 😆.

So, yeah, maybe it's time to create stnolting/neorv32-examples or neorv32/neorv32 and neorv32/examples.

Modules such as the button controller for Fomu, the QSPI driver, or the USB-UART are very interesting features, and it would be nice to have them in this same repo. However, it is difficult to draw a line once we start doing that. Therefore, it might be sensible to do something such as: if a design needs any source other than a single BoardTop and a components package, then it should be located in the examples repo. This means we would move the MixedLanguage example there, along with adding USB-UART submodules and the Fomu components you did.

Overall, I must say that working with submodules might be uncomfortable at first. So, I would recommend not overusing them (do not create multiple repos for different examples). However, creating a single examples or designs repo is sensible.

On the other hand, I could put that in new repos as well and include them as submodules. 🤔

Note that NEORV32 should be a submodule of the examples repo, not the other way around! That is because users might want to use/consume NEORV32 without the examples, but no one will use the examples without the core.

umarcor avatar Jul 12 '21 12:07 umarcor

For now, I think we should avoid 2, precisely because it requires DEST and the interconnect. I suggest implementing 1 first, and discussing the port naming/types for 3.

I agree. One step at a time. I will take care of that file. The question is (yes, we had that already, too 😉) what kind of peripheral IOs and config options do we export?

As commented in #9 (reply in thread):

creating a subdir of optional external components might open the door to hell 😆.

Seems like I'm knocking on this door again and again without really noticing it 😄

I agree that more sophisticated and platform-independent components like the QSPI module should be in a totally different repo. In terms of FOMU, I think it would be ok to include the button controller right into some top module for the FOMU. 🤔

stnolting avatar Jul 12 '21 13:07 stnolting

I agree. One step at a time. I will take care of that file. The question is (yes, we had that already, too 😉) what kind of peripheral IOs and config options do we export?

Just add the external ports corresponding to a single Stream output and a single Stream input. As done with other peripherals, users will drive them constant or leave them open if they don't want to use the Stream feature.

In terms of FOMU, I think it would be ok to include the button controller right into some top module for the FOMU. 🤔

That is the point. If whatever you want to add fits in a single file without making it obviously difficult to read/maintain, then it's absolutely ok to have it here. However, if any example needs at least one additional file for clarity, then it belongs in the examples repo.

umarcor avatar Jul 12 '21 13:07 umarcor

Just add the external ports corresponding to a single Stream output and a single Stream input. As done with other peripherals, users will drive them constant or leave them open if they don't want to use the Stream feature.

I mean the other peripherals like PWM, UART, TWI, etc. Expose all of them (with defaults) and let the user decide which to actually use?

However, if any example needs at least one additional file for clarity, then it belongs in the examples repo.

👍

stnolting avatar Jul 12 '21 13:07 stnolting

I mean the other peripherals like PWM, UART, TWI, etc. Expose all of them (with defaults) and let the user decide which to actually use?

Oh, that is absolutely up to you! You can copy the Minimal, MinimalBoot or Test templates and just add the Streams. No need to add all of them, no need to have any other peripheral. The purpose of this template is for users to try the Stream feature, not for them to use it as-is in their final design. By the time a user is implementing a final design with a specific set of ports, they will be instantiating the NEORV32 top directly. Templates are useful before that: from the moment a newcomer arrives until they gain enough knowledge to use the top directly. They are particularly useful for users who want to try NEORV32 on new boards, as templates provide easy-to-import black boxes.

umarcor avatar Jul 12 '21 13:07 umarcor

Ok, let's see what I will come up with 😅

By the time a user is implementing a final design with a specific set of ports, they will be instantiating the NEORV32 top directly. Templates are useful before that: from the moment a newcomer arrives until they gain enough knowledge to use the top directly. They are particularly useful for users who want to try NEORV32 on new boards, as templates provide easy-to-import black boxes.

This paragraph should be in the documentation!! 👍

stnolting avatar Jul 12 '21 13:07 stnolting

The provided pre-built setup is identical to the result of running make in no2muacm/gateware, right?

Yeah, but with several complications:

  1. Even the source still uses some SB_xxx IPs. There are some optimized constructs in there that just can't be inferred.
  2. In the SB_RAM40_4K, I use an INIT_FILE parameter to initialize the content from a file, which is only supported in the open toolchain.
  3. In the process of making it work, I actually found a bug in Radiant LSE, where the FSM optimization pass creates bad logic (post-synthesis simulation shows the bug...), so you need to make sure to use Synplify Pro.

smunaut avatar Jul 16 '21 12:07 smunaut

@smunaut thanks for clarifying! 👍

stnolting avatar Jul 19 '21 15:07 stnolting

Maybe you can try my implementation, which has an embedded FIFO: https://github.com/ulixxe/usb_cdc

ulixxe avatar Nov 02 '21 18:11 ulixxe