ShiftPWM icon indicating copy to clipboard operation
ShiftPWM copied to clipboard

Teensy 3.0

Open apmorton opened this issue 11 years ago • 9 comments

Hello,

I have spent some time making this work with teensy 3.0, but my efforts were mostly specific to my application, so I pretty much implemented only what I needed.

There are however some pointers I can give you from what I found.

It looks to me like what you do with pins_arduino_compile_time.h is already accomplished with the teensy 3.0, but I am not sure, so in my implementation I simply used digitalWrite for the latch pin. The clock speed is so much higher on the teensy 3.0 I don't think it matters much.

The most important part is of course the interrupt handler, which is mostly identical other than the add_one_pin_to_byte macro, which is as follows.

#define add_one_pin_to_byte(sendInt, thres, ptr) { \
    unsigned char val = *ptr; \
    asm volatile("cmp %0, %1" :: "r" (thres), "r" (val) :); \
    asm volatile("rrx %0, %1" : "=r" (sendInt) : "r" (sendInt) :); \
}

the ARM RRX function works on a 32bit register, so you must pass it an unsigned int for sendInt, and then shift it right 24bits after you add all the bits to send.

for the interrupt timer you use IntervalTimer

m_timer = new IntervalTimer();
period = 1000000.0 / (ledFrequency * maxBrightness);
m_timer.begin(myHandlerFunction, period);

with 3 registers at 100hz pwm frequency with 255 maximum brightness, these are my results.

Load of interrupt: 0.3640691
Clock cycles per interrupt: 685.31
Interrupt frequency: 25500.00 Hz
PWM frequency: 99.61 Hz

This is without the chips actually connected (by boards are being fabricated now), so I suspect much of that is waiting for SPI to timeout in the read loop?

If I don't actually send the values over SPI it drops to Load of interrupt: 0.1692283

If you would like to see my code I will gladly post it somewhere, but it is teensy 3.0 specific.

apmorton avatar Jun 07 '13 04:06 apmorton

The most important bit of code is the 2 assembly lines, which are highly architecture specific. On AVR, the compare result is put into the carry. If you do a shift over carry next, you can shift that 1 bit into a byte. Do it 8 times and you have one full byte of compare results.

You would have to check the ARM architecture to see whether this trick works on the teensy as well.

Another optimization I do is that I overlap SPI output with calculating the values for the next byte. The interrupt load on AVR for your setup is 0.27 (according to the calculator on my website). You are pretty close, so maybe the SPI speed is the limiting factor.

I look forward to hearing whether your code efforts produce blinking lights! If it works, commit it on GitHub. If you fork my repo and work from there, preferably with compile time switches between the platforms, I can pull in your results and give you credit.

elcojacobs avatar Jun 07 '13 08:06 elcojacobs

According to what I put in the calculator, it should be 0.12. 3 registers, 255 max, 100hz, 48mhz

I should have mentioned that I tested my code, sort of.

Instead of having it write to SPI, i have it store in an extern unsigned char, defined in the main sketch, and then print that in binary over serial when it changes.

with all 8 channels set at increments of 30 it goes something like this.

ShiftPWM.SetOne(0, 30);
ShiftPWM.SetOne(1, 60);
ShiftPWM.SetOne(2, 90);
...
11111111
11111110
11111100
11111000
11110000
11100000
11000000
10000000
0

and then repeats.

So the two instructions in ASM seem to be working properly, which is good. CMP + RRX do the same thing on arm as CP + ROR do on avr, with the difference being RRX operates on a 32bit register where ROR operates on an 8bit register, so you just have to shift it down after you put all the bits in.

My code is using the SPI module, so that probably attributes some of the extra load, as SPI becomes completely serial, waiting until the send finishes before calculating the next byte, although I only see this being a real issue with very long chains of registers.

I will see about contributing my code back with compile time switches, but it could get pretty messy the way its structured now.

It might be best to move all the arch specific things into precompiler defines and then into arm.h and avr.h, and then conditionally include those in ShiftPWM.h or something.

I am not entirely sure where to get the information for pins_arduino_compile_time.h, as it doesn't seem to line up with pins_arduino.h, but maybe I haven't looked long enough. For me I was using SPI anyway, so using digitalWrite on only the latch pin didn't seem like a huge deal.

perhaps you could take a look at the teensy 3.0 hardware definitions from teensyduino and construct the entries for pins_arduino_compile_time.h, and I will test them?

apmorton avatar Jun 07 '13 14:06 apmorton

I made that file myself, because Arduino normally looks up the port and bit in a program memory struct. DigitalWrite takes 53 clock cycles, while a bitset on a known port takes 2.

That file was a way to make the pin numbers known at compile time instead of looking them up at runtime. Maybe the digitalWriteFast library can replace it.

I have not worked on ShiftPWM for quite a while, because BrewPi takes up all my time.

Only the latch pin as digitalWrite() is not that bad, but I also have a non SPI version, which has a lot more pin writes. With my hack, the non SPI version takes 13 clock cycles per pin (5 for SPI).

It could be that Teensy is smarter about this, I have not looked into it.

Elco

On 7 June 2013 16:05, Juvenal1228 [email protected] wrote:

According to what I put in the calculator, it should be 0.12. 3 registers, 255 max, 100hz, 48mhz

I should have mentioned that I tested my code, sort of.

Instead of having it write to SPI, i have it store in an extern unsigned char, defined in the main sketch, and then print that in binary over serial when it changes.

with all 8 channels set at increments of 30 it goes something like this.

ShiftPWM.SetOne(0, 30); ShiftPWM.SetOne(1, 60); ShiftPWM.SetOne(2, 90); ...

11111111 11111110 11111100 11111000 11110000 11100000 11000000 10000000 0

and then repeats.

So the two instructions in ASM seem to be working properly, which is good. CMP + RRX do the same thing on arm as CP + ROR do on avr, with the difference being RRX operates on a 32bit register where ROR operates on an 8bit register, so you just have to shift it down after you put all the bits in.

My code is using the SPI module, so that probably attributes some of the extra load, as SPI becomes completely serial, waiting until the send finishes before calculating the next byte, although I only see this being a real issue with very long chains of registers.

I will see about contributing my code back with compile time switches, but it could get pretty messy the way its structured now.

It might be best to move all the arch specific things into precompiler defines and then into arm.h and avr.h, and then conditionally include those in ShiftPWM.h or something.

I am not entirely sure where to get the information for pins_arduino_compile_time.h, as it doesn't seem to line up with pins_arduino.h, but maybe I haven't looked long enough. For me I was using SPI anyway, so using digitalWrite on only the latch pin didn't seem like a huge deal.

perhaps you could take a look at the teensy 3.0 hardware definitions from teensyduino and construct the entries for pins_arduino_compile_time.h, and I will test them?

— Reply to this email directly or view it on GitHubhttps://github.com/elcojacobs/ShiftPWM/issues/1#issuecomment-19108799 .

elcojacobs avatar Jun 07 '13 16:06 elcojacobs

I just looked into it, and digitalWriteFast is included for teensy 3.0 as part of the standard library installed by teensyduino.

couldnt you also just cache the return of digitalPinToBitMask and digitalPinToPort for each port when starting up?

I guess this would be slightly more clock cycles when accessing, by a few.

I will try to throw together a PR for teensy 3.0 this weekend.

apmorton avatar Jun 07 '13 19:06 apmorton

The point is knowing the registers at compile time, because the compiler can then use a bitSet instruction instead of using the mask and port register.

elcojacobs avatar Jun 08 '13 17:06 elcojacobs

It is likely I will not have the time to integrate my solution into ShiftPWM, so I have decided to just release my crap as-is, and hopefully someone with enough time will be able to integrate it.

https://github.com/Juvenal1228/T3ShiftPWM

For my application, all of the logic is in my own code, so I didn't really implement any of the convenience functions your library has to set RGB values etc.

my code just operates directly on m_values

apmorton avatar Jun 16 '13 03:06 apmorton

If anyone's still watching this old thread, I recently ported ShiftPWM to Teensy 3.0 & 3.1.

https://github.com/PaulStoffregen/ShiftPWM

PaulStoffregen avatar Jun 12 '14 02:06 PaulStoffregen

@PaulStoffregen I only looked at your code quickly, so forgive me if I missed something.

It doesn't look like you implemented add_one_pin_to_byte for ARM, which means your implementation cannot use the hardware SPI method.

This is the macro which works with the teensy arm assembly, taken from my T3ShiftPWM repo.

#define add_one_pin_to_byte(sendInt, thres, ptr) { \
    unsigned char val = *ptr; \
    asm volatile("cmp %0, %1" :: "r" (thres), "r" (val) :); \
    asm volatile("rrx %0, %1" : "=r" (sendInt) : "r" (sendInt) :); \
}

https://github.com/apmorton/T3ShiftPWM/blob/master/T3ShiftPWM.h#L14

The performance difference for my application was definitely noticeable between the two methods.

apmorton avatar Jun 30 '14 17:06 apmorton

I'm currently working on an API for SPI transactions. Details can be found on these 2 conversations:

http://forum.pjrc.com/threads/25582-SPI-sharing-between-interrupts-and-main-program/page2

https://groups.google.com/a/arduino.cc/forum/#!topic/developers/TuZLfjeZjDI

SPI transaction support is particularly relevant for libraries like ShiftPWM, which use the hardware SPI port from interrupt context. If a non-interrupt library like Ethernet or SD is using the SPI port at the moment ShiftPWM gets an interrupt, ShiftPWM will end up using the SPI port while that other library still has its chip select active and some other device listening. That's a very bad result.

This same problem happens on all Arduino boards. It's not unique to Teensy.

I'm working on a revised SPI library to solve this problem. Details are in those lengthy conversations.

At the moment, I'm intentionally stalling work on SPI-based libraries, until this SPI transaction stuff is working.

PaulStoffregen avatar Jun 30 '14 23:06 PaulStoffregen