Silice icon indicating copy to clipboard operation
Silice copied to clipboard

General code issues thread

Open TaraHoleInIt opened this issue 5 years ago • 57 comments

I'll start the first one:

In my attempt to debug WaitForRisingEdge by passing in a debug LED I now receive the following error message:

^^^^^^^^^^^^^^^^^^^ incorrect input parameters in call to algorithm 'WaitForRisingEdge', last correct match was parameter 'Signal'

I've checked the documentation and examples a few times, and its not immediately obvious why this is coming up.

$$Period = 1 / Frequency;
$$NSPerClock = 1000000000 * Period;
$$ClocksPerMS = math.floor( 1000000 / NSPerClock );

algorithm Debounce( input uint1 Source, output uint1 Dest ) <autorun> {
	uint24 Clocks = 0;
	uint8 Bitmap = 0;

	while ( 1 ) {
		Clocks = Clocks + 1;

		if ( Clocks == $ClocksPerMS$ ) {
			Bitmap = ( Bitmap << 1 ) | Source;
			Clocks = 0;
		}

		// If all the bits are set then the button should have finished bouncing
		// and the output should be reliable.
		Dest = ( Bitmap == 8hff ) ? 1 : 0;
	}
}

subroutine WaitForRisingEdge( input uint1 Signal, output uint1 Debug ) {
	uint1 Last = 0;

	Debug = 1;

	while ( 1 ) {
		if ( Last == 0 && Signal == 1 ) {
			break;
		}

		Last = Signal;
	}

	Debug = 0;
}

algorithm main( output uint$NUM_LEDS$ leds, output uint$NUM_GPIO_OUT$ gp, input uint$NUM_GPIO_IN$ gn ) {
	uint1 FilteredButton = 0;
	uint1 Button = 0;
	uint1 State = 0;

	Debounce Btn(
		Source <: Button,
		Dest :> FilteredButton
	);

	Button := gn[ 10,1 ];
	leds[ 0,1 ] := State;

	while ( 1 ) {
		// Error
		// ^^^^^^^^^^^^^^^^^^^ incorrect input parameters in call to algorithm 'WaitForRisingEdge', last correct match was parameter 'Signal'
		( ) <- WaitForRisingEdge <- ( FilteredButton, LED );
		State = ~State;
	}
}
```

TaraHoleInIt avatar Jan 11 '21 20:01 TaraHoleInIt

The syntax for the subroutine call is: (output0,...,outputN) <- sub <- (input0,....,inputK) So the line has to be changed for (LED) <- WaitForRisingEdge <- ( FilteredButton)

Something else, more related to timing -- in this part:

   Clocks = Clocks + 1;
   if ( Clocks == $ClocksPerMS$ ) {
	// ...
   }

It is best to increment Clock after the if (adjusting the logic accordingly). This will help timing as the test and addition will no longer depend on each other within the same cycle. (I started writing guidelines for this, hoping to finish soon). In this case this will have little impact, but it is a good habit to take.

(Edit:) the same is true for Dest = ( Bitmap == 8hff ) ? 1 : 0; which is used in the condition after being updated in the if. Btw, this could also be written with an always assign, writing Dest := ( Bitmap == 8hff ) ? 1 : 0; above the while(1) (this will effectively assign Dest first thing every clock cycle). In this case it does not really impact, but that can be very convenient if Dest has to be constantly updated regardless of the state of the algorithm.

sylefeb avatar Jan 11 '21 23:01 sylefeb

Thank you for the feedback! I will keep this in mind for the future.

That being said, I'm still having trouble getting WaitForRisingEdge to work:

subroutine WaitForRisingEdge( input uint1 Signal ) {
	uint1 Exit = 0;
	uint1 Last = 0;

	Last = Signal;

	while ( Exit == 0 ) {
		Exit = ( Last == 0 && Signal == 1 ) ? 1 : 0;
		Last = Signal;
	}
}

algorithm main( output uint$NUM_LEDS$ leds, output uint$NUM_GPIO_OUT$ gp, input uint$NUM_GPIO_IN$ gn ) {
	uint1 FilteredButton = 0;
	uint1 Button = 0;
	uint1 State = 0;
	uint1 ADebug = 1;

	Debounce Btn(
		Source <: Button,
		Dest :> FilteredButton
	);

	Button := gn[ 10,1 ];
	leds[ 0,1 ] := ADebug;

	gp[ 0,1 ] := State;

	while ( 1 ) {
		ADebug = 1;
			( ) <- WaitForRisingEdge <- ( FilteredButton );
		ADebug = 0;

		State = ~State;
	}
}

I'm not sure if its a code problem or FPGAs being a bit unfamiliar. As far as I can tell it never actually returns, but FilteredButton is behaving just as I would expect.

TaraHoleInIt avatar Jan 13 '21 03:01 TaraHoleInIt

I think I see the problem. When you call a subroutine it copies its arguments - precisely to avoid them changing while it processes. This is why WaitForRisingEdge does not see FilteredButton change.

Two possible approaches. 1) Incorporate the subroutine in the algorithm (which they are meant to be), so you can actually access the algorithm internals. This becomes:

algorithm main( output uint$NUM_LEDS$ leds, output uint$NUM_GPIO_OUT$ gp, input uint$NUM_GPIO_IN$ gn ) {

	uint1 FilteredButton = 0;
	uint1 Button = 0;
	uint1 State = 0;
	uint1 ADebug = 1;

        subroutine WaitForRisingEdge( reads FilteredButton ) {
	        uint1 Exit = 0;
	        uint1 Last = 0;
        
	        Last = FilteredButton ;
        
	        while ( Exit == 0 ) {
		        Exit = ( Last == 0 && FilteredButton == 1 ) ? 1 : 0;
		        Last = FilteredButton;
	        }
        }


	Debounce Btn(
		Source <: Button,
		Dest :> FilteredButton
	);

	Button := gn[ 10,1 ];
	leds[ 0,1 ] := ADebug;

	gp[ 0,1 ] := State;

	while ( 1 ) {
		ADebug = 1;
			( ) <- WaitForRisingEdge <- ( );
		ADebug = 0;

		State = ~State;
	}
}

Note the explicit declaration reads FilteredButton ; this is to make it clear what a subroutine is supposed to manipulate.

The other option is to turn WaitForRisingEdge into another algorithm and use parameter bindings:

algorithm WaitForRisingEdge( input uint1 FilteredButton ) {
  uint1 Exit = 0;
  uint1 Last = 0;

  Last = FilteredButton ;

  while ( Exit == 0 ) {
    Exit = ( Last == 0 && FilteredButton == 1 ) ? 1 : 0;
    Last = FilteredButton;
  }
}

algorithm main( output uint$NUM_LEDS$ leds, output uint$NUM_GPIO_OUT$ gp, input uint$NUM_GPIO_IN$ gn ) {

	uint1 FilteredButton = 0;
	uint1 Button = 0;
	uint1 State = 0;
	uint1 ADebug = 1;

	Debounce Btn(
		Source <: Button,
		Dest :> FilteredButton
	);

        WaitForRisingEdge waitedge(
            FilteredButton <: FilteredButton, // parameter binding
        );

	Button := gn[ 10,1 ];
	leds[ 0,1 ] := ADebug;

	gp[ 0,1 ] := State;

	while ( 1 ) {
		ADebug = 1;
		() <- waitedge <- (); // no input parameter as it is bound
		ADebug = 0;

		State = ~State;
	}
}

When a parameter is bound it tracks the value of the source (connected through a wire in terms of hardware). Note that there are two bindings: <: tracks the value as it is updated during the cycle, <:: tracks the value at it was at the cycle starts (this is a topic I will document better, it is a very interesting degree of freedom to play with in designs). In short, <: is typical and allows the algorithm to immediately start refreshing the outputs when a value is changed -- but this can also reduce the max frequency of the design as the signal propagation deepens. <:: will trigger a refresh only at the next cycle: pay one in latency but increase max freq.

sylefeb avatar Jan 13 '21 20:01 sylefeb

I have done some testing with generating delays, it seems like ones done inline are accurate while wrapping them in a subroutine causes problems.

For example, if I have a counter in my main loop and toggle a pin whenever the counter reaches 4 then I get a 250ns signal which is what I would expect from a 16MHz clock. Trying to make this neater by wrapping it up in a subroutine is nowhere near as accurate.

Maybe I'm just approaching this wrong and there is a more correct way to accurately handle timing?

TaraHoleInIt avatar Jan 18 '21 21:01 TaraHoleInIt

Hi - the timing would have to account for the subroutine entry/exit cycles. This is partially discussed in Section 5 but I really have to revise this with more examples and diagrams, that is an important topic.

So for instance, let's consider this wait subroutine in a small test case:

subroutine wait(input uint16 incount)
{
  uint16 count = uninitialized;
  count = incount;
  while (count != 0) { count = count - 1; }
}
	
algorithm main( output uint$NUM_LEDS$ leds,input uint1 btns ) 
{
  uint16 cycle = 0;

  always { cycle = cycle + 1; }

  __display("[before] cycle %d",cycle);
  () <- wait <- (0);
  __display("[next  ] cycle %d",cycle);
  () <- wait <- (4);
  __display("[done  ] cycle %d",cycle);
}

Now simulating with icarus make icarus (Makefile at the end) we get:

[before] cycle     0
[next  ] cycle     4
[done  ] cycle    12

So we can see the subroutine call, even with a 0 parameter takes 4 cycles. With an input of 4 it then takes 8 cycles. So there is an overhead of four: 2 for entry/exit of subroutine and 2 for the entry/exit of the while. Note that there are chain rules, so adding a while right after the other will add a single cycle instead of 2 (clarifying the documentation right now!).

What if we want a less than 4 cycles wait? We can use multiple ++: operators, or use a circuitry to inline a while loop with a parameter for its duration.

Taking these delays into account should give you the exact timing.

.DEFAULT: test.ice
		silice-make.py -s test.ice -b $@ -p basic -o BUILD_$(subst :,_,$@)

clean:
	rm -rf BUILD_*

sylefeb avatar Jan 19 '21 07:01 sylefeb

Ohhhh okay! I think I see what was happening then.

I don't think I was taking into account not only that entering and exiting the subroutine were taking cycles, but that the main loop was adding some delay as well. That being said, nothing I'm doing now requires timing that precise, I just wanted to understand the behaviour I was seeing.

Thank you :)

TaraHoleInIt avatar Jan 20 '21 20:01 TaraHoleInIt

Hello,

I am a Silice and FPGA newbie and have a question regarding readwrite permissions on subroutines.

In the following example the subroutine out_led is called by the subroutine out_led_inverted. For this, the subroutine out_led_inverted needs to get permission to access the parameter pattern from out_led.

algorithm main(output uint5 leds) {

uint26 cnt = 0;

   subroutine out_led(input uint5 pattern, readwrites leds) {
      leds = pattern;
   }

   subroutine out_led_inverted(input uint5 pattern_inv, readwrites i_out_led_pattern) {
   //                                                              ^^^^^^^^^^^^^^^^^
      () <- out_led <- (~pattern_inv);
   }

   while(1) {
      cnt = cnt + 1;
      () <- out_led_inverted <- (cnt[21,5]);
   }
} 

But the syntax for this is not clear to me. Only from the compiler error message "variable 'i_out_led_pattern' is written by subroutine 'out_led_inverted' without explicit permission" I could recognize the name for this parameter. Is this intentional or did I miss something?

Regards, Bernd.

at91rm9200 avatar Dec 25 '21 09:12 at91rm9200

Hello Bernd,

The error message is indeed not useful, I'll improve that. The subroutine out_led_inverted has to explicitly declare it intends to call out_led. Here is how to do that: https://github.com/sylefeb/Silice/blob/5fa9ed995c20f7bc9654e7c63a858844efb26fbc/tests/issues/108_rw_perm.ice#L9

Welcome to Silice! Best regards, Sylvain

sylefeb avatar Dec 25 '21 14:12 sylefeb

Improved error message (in wip branch):

() <- out_led <- (~pattern)
^^^^^^^^^^^^^^^^^^^^^^^^^^^ subroutine 'out_led_inverted' calls other subroutine 'out_led' without permssion
                            add 'calls out_led' to declaration if that was intended.

error: Silice compiler stopped

Thanks for pointing this out!!

sylefeb avatar Dec 25 '21 14:12 sylefeb

Hello Sylvain,

thank you for the explanation. It works perfectly.

Regards, Bernd.

at91rm9200 avatar Dec 25 '21 15:12 at91rm9200

Hello,

I am trying to bind two pmod bits, one as input and the other one as output, to an algorithm. The input binding "bit_in <: pmod.i[0,1]" seems to work, but the output binding "bit_out :> pmod.o[1,1]" leads to an error: "cannot find accessed base.member 'pmod.o'". For sure, this is a beginner question. But I cannot see, how to solve this.

algorithm alg_test(input uint1 bit_in, output uint1 bit_out)
{
   bit_out = ~bit_in;
}

algorithm main(output uint5 leds, inout uint8 pmod) {

   alg_test a(bit_in <: pmod.i[0,1], bit_out :> pmod.o[1,1]);  // <-- cannot find accessed....

   while (1) {
      pmod.oenable = 8b00000010;
      () <- a <- ();
   }
}  

Regards, Bernd.

at91rm9200 avatar Jan 31 '22 22:01 at91rm9200

Hi Bernd,

Your syntax makes perfect sense, unfortunately you've hit a limitation on output bindings (see #208). The error message is not helpful and I'll improve that shortly (ultimately the syntax your are using will be supported).

The good news is, there are simple workarounds, see https://github.com/sylefeb/Silice/blob/draft/tests/issues/108_pmod.ice

Thanks for the report! Sylvain

sylefeb avatar Feb 01 '22 06:02 sylefeb

Partially fixed the issue:

  • The error message is now more informative:
pmod.o[1,1]
^^^^^^^^^^^ binding an output to a partial variable (bit select, bitfield, table entry) is currently unsupported
  • Binding the output to pmod.o now would work (without the bit select)
  • Limitation on binding to bit select remains, will work on it (see #208)

Changes in draft branch.

sylefeb avatar Feb 01 '22 08:02 sylefeb

Hello Sylvain,

thank you for the answer, for improving the Silice compiler and providing the workaround. I think, I am going to stick with the master branch for now.

Regards, Bernd.

at91rm9200 avatar Feb 01 '22 17:02 at91rm9200

Hello Sylvain,

can an inout pmod simply be used as a gpio? The following program should output four pulses on the icestick pmod (bit 0) and then read bit 1 (pulled up via a resistor) and output it on the LED. The program only works if the marked step is commented out. If the step is not commented out, then bit 1 is apparently not read in correctly. I suspect I am missing something significant, or may be the inout pmods can not be used this way?

algorithm main(output uint5 leds, inout uint8 pmod) {
   uint3 ctr = 4;
   uint1 pmod_bit1 = uninitialized;

   pmod.oenable:= 8b00000001;
   pmod.o = 8b00000001;

   while (ctr > 0) {
      ctr = ctr - 1;
      ++:                       // <--- works, if this step is commented out.
      pmod.oenable = 8b00000001;
      pmod.o = 8b00000001;
      ++:
      pmod.oenable = 8b00000001;
      pmod.o = 8b00000000;
   }
   ++:
   pmod.oenable = 8b00000001;
   pmod_bit1 = pmod.i[1,1];     // pmod bit 1 is pulled high via resistor.

   leds[0,1] = pmod_bit1;

   while(1) {
   }
}  

Regards, Bernd.

at91rm9200 avatar May 29 '22 22:05 at91rm9200

Hi Bernd - thanks for the report, I am looking into it! (I see the same behavior and it is indeed confusing).

sylefeb avatar May 30 '22 07:05 sylefeb

I believe I have identified the issue -- it's not obvious, and I am actually surprised by the behavior. Nevertheless I decided to modify the tristate so that the behavior is stable. This means introducing a one cycle latency between when .o / .oenable are set and the actual change on the pin, but that is anyway better in terms of managing IO skew/delays.

The change is applied in the wip branch ; I will soon merge with master but a few adaptations remain to be done in other projects. Please let me know if you have any trouble.

(Side note: instead of while (ctr > 0) use while (ctr != 0) to reduce LUT usage ;) An even better way is to make ctr signed and test whether the sign bit is set (meaning -1 is reached), adjusting the counter accordingly).

sylefeb avatar May 30 '22 15:05 sylefeb

Hello Sylvain,

thanks for tracking this down. I think I will wait for the master branch. But I have one more question about the above example: The I/O configuration of the pmod does not change in the algorithm. It is always 8b00000001, seven inputs and one output in the same place. Shouldn't a single always assignment (pmod.oenable:= 8b00000001) be enough? Or does pmod.oenable have to be explicitly set before each pmod.o and pmod.i statement, as I did above?

Regards, Bernd.

at91rm9200 avatar May 30 '22 17:05 at91rm9200

Hi Bernd, the single assignment should indeed be enough (and is enough after the fix), the best being a single pmod.oenable := 8b00000001; since the := will hold the value at all times (unless overridden by the algorithm), allowing for a simpler logic. But even a single pmod.oenable = 8b00000001; at the top (without the :=) would do now.

The fact you needed multiple assignments was a side effect of the problem itself.

sylefeb avatar May 30 '22 18:05 sylefeb

Hello Sylvain,

thank you for the explanation. I have come across the strange "pmod behavior" several times. I'm glad you were able to fix it.

Regards, Bernd.

at91rm9200 avatar May 30 '22 18:05 at91rm9200

Thanks a lot Bernd for the tests + reports (inouts are used in fewer designs, and tristates can be a bit tricky...). In general feel free to reach out if anything seems suspicious, I am happy to check.

sylefeb avatar May 31 '22 07:05 sylefeb

Hello Sylvain,

I am having difficulties to get the dual ported BRAM working. The following design for the Icestick should use ICESTORM_RAM, but if I look into the log-file (next.log) I can see a utilization of 0 % for the RAM. I have the "simple_dualport_bram" working in another design, but this time I want to read from both ports.

unit main(output uint5 leds) {

   dualport_bram uint8 mem[128] = {0,1,2,3,4,5,6,7,8,9,pad(uninitialized)};
   uint7 ctr = 0;

   algorithm {

      while(1) {
         mem.wenable0 = 0;       // read port 0
         mem.addr0 = ctr;
         ++:
         leds = mem.rdata0;

         mem.wenable1 = 0;       // read port 1
         mem.addr1 = ctr + 1;
         ++:
         leds = mem.rdata1;

         ctr = ctr + 1;
      }
   }
}
Info: Device utilisation:
Info: 	         ICESTORM_LC:   170/ 1280    13%
Info: 	        ICESTORM_RAM:     0/   16     0%
Info: 	               SB_IO:     6/  112     5%
Info: 	               SB_GB:     3/    8    37%
Info: 	        ICESTORM_PLL:     0/    1     0%
Info: 	         SB_WARMBOOT:     0/    1     0%

I am not using the very latest version of yosys (0.20 ), nextpnr (0.3) and Silice.

It would be nice if you could check if I did something wrong.

Regards, Bernd.

at91rm9200 avatar Dec 30 '22 22:12 at91rm9200

Hi Bernd,

Just had a quick look, this is indeed odd. As a workaround comment mem.wenable0 = 0; and it will synthesize using brams again (tested on Yosys 0.17+76). As long as you only read on port 0 this is fine because the default (on fpga) will be 0 anyway. That actually means using the bram as a simple_dualport_bram that allows read/writes on port 1 and only reads on port 0.

I'll investigate further. Thanks for the report!

sylefeb avatar Dec 31 '22 10:12 sylefeb

Hello Sylvain,

thank you. This works. I can read now over port 0 and read/write over port 1. As far as I can see, using dualport_bram (vs. simple_dualport_bram) seems to double the resources in the FPGA concerning ICESTORM_RAM. This would halve the amount of BRAM. However.

Bonne année.

Regards, Bernd.

at91rm9200 avatar Dec 31 '22 12:12 at91rm9200

Hi Bernd, interesting, I'll have a closer look at how Yosys synthesizes the BRAM. From the documentation it is unclear whether a 'true' dualport_bram is supported. I'll investigate and adjust the framework (and/or Verilog template) accordingly.

Best wishes for 2023! Sylvain

sylefeb avatar Jan 02 '23 09:01 sylefeb

Hello Sylvain,

on the Gisselquist (zipcpu) blog, I found an article, which describes that the ICE40 does not contain dual ported BRAM with two separate read ports. As a "workaround" yosys uses two BRAMs with connected write ports.

https://zipcpu.com/dsp/2020/02/07/bad-histogram.html

This would at least explain the increase of ICESTORM_RAM.

Regards, Bernd.

at91rm9200 avatar Jan 04 '23 23:01 at91rm9200

Hi Bernd, great finding. It makes sense as yosys chooses to write the same to both and then read separately (I use a similar trick for registers in my riscv CPUs, but doing it 'manually' with two BRAMs). This can be seen in the build.json file, searching for "SB_RAM40_4K" you'll find two instances "__main.__mem__mem.buffer.1.0" and "__main.__mem__mem.buffer.0.0". Both are indeed receiving the same write signals (eg. "WADDR": [ 94, 95, 96, 97, 98, "0", "0", "0", 99, 100, "0" ]). Always amazed by yosys inference capabilities! So I think I will disable dualport_bram for the ice40s: the compiler will stop on an unsupported error, suggesting to use a simple_dualport_bram or registers instead -- I find this less confusing and more true to the hardware capability.

sylefeb avatar Jan 05 '23 07:01 sylefeb

Hello Sylvain,

in my case, the current Silice/yosys behavior solves my problem. Even if the RAM utilization doubles. As a FPGA-novice (using Silice exclusively since about one year), I would have a lot of trouble, to handle the BRAM manually, as you did with your RISC-V cpu.

Maybe you could introduce something like a restricted_dualport_bram for the ICE40 and give a hint in the documentation that this variant provides one read/write and one read port and is built out of two BRAMS.

Regards, Bernd.

at91rm9200 avatar Jan 05 '23 09:01 at91rm9200

Hello Sylvain,

Can an algorithm that was called asynchronously be called a second time (asynchronously) without being finished the first time?

For example, I have this timer unit:

unit timer24(input uint24 value, output! uint1 timeout)
{

   uint24 local_value = uninitialized;

   algorithm {
      timeout = 0;
      local_value = value;

      while (local_value != 0) {
         local_value = local_value - 1;

      }
      timeout = 1;
   }
}

I want to monitor a timeout as follows:

unit main()
{
   uint1 timer1_timeout = uninitialized;
   timer24 timer1(timeout :> timer1_timeout);

   algorithm
   {
      while (1) {
         timer1 <- (100000);
         ++:
         while(check_for_event1) {
            if (timer1.timeout == 1) {
               goto exit_error;
            }
         }

         // the timer is not expired here. The timer algorithm is still running.
         timer1 <- (100000); // What happens here?
         ++
         while(check_for_event2) {
            if (timer1.timeout == 1) {
               goto exit_error;
            }
         }

         // ..... normal processing
   
      }
      exit_error:
   }
}  

Can it be done this way? I doubt it.

Regards, Bernd.

at91rm9200 avatar Mar 17 '23 10:03 at91rm9200

Hi Bernd,

If an algorithm is called while already running, it will be forced to restart:

algorithm test()
{
  uint8 i=0;
  while (i<12) {
    __display("i = %d",i);
    i = i + 1;
  }
}

algorithm main(output uint8 leds)
{
  test t1;
  t1 <- ();
++: ++: ++: ++:
  t1 <- ();	
  while (!isdone(t1)) { }
}

Output is:

i =   0
i =   1
i =   2
i =   0 // forced restart
i =   1
i =   2
i =   3
i =   4
i =   5
i =   6
i =   7
i =   8
i =   9
i =  10
i =  11

If I understand correctly that would fit your use case? Another situation this is useful is when the algorithm first state contains a pipeline to be fed (feature still being documented)

Otherwise the algorithm has to be instanced multiple times (as many as the number that has to run in parallel).

sylefeb avatar Mar 17 '23 15:03 sylefeb