mksocfpga
hm2 read(), write() functions need significant time
observed running irqtest.hal on a terasic de0, times are in ns:
halcmd: show funct
Exported Functions:
Comp Inst CodeAddr Arg Users Type Name Runtime Maxtime
66 00000000b6d3fe1d 0000000000000000 0 user delinst 0 0
75 76 00000000b53c06d9 0000000000041bc8 0 xthread hm2_de0n.0.pet_watchdog 0 0
75 76 00000000b53a9e39 0000000000041bc8 1 xthread hm2_de0n.0.read 59193 107665
75 76 00000000b53a9fc1 0000000000041bc8 1 xthread hm2_de0n.0.read_gpio 11070 34502
75 76 00000000b53c3291 0000000000041bc8 1 xthread hm2_de0n.0.waitirq 861718 999784
75 76 00000000b53a9ef1 0000000000041bc8 1 xthread hm2_de0n.0.write 57493 133106
75 76 00000000b53a9ff5 0000000000041bc8 1 xthread hm2_de0n.0.write_gpio 21101 53182
66 00000000b6d3fcfd 0000000000000000 0 user newinst 0 0
1099 1100 00000000b53858d1 00000000b6546820 1 xthread rate 1310 23361
1110 1111 00000000b5374891 00000000b6546848 1 xthread servo-period 970 25661
1118 1119 00000000b5363969 00000000b6546858 1 xthread sp-stats 1420 26741
1085 00000000b5396be1 00000000b65467e0 1 thread test.update 2830 24781
read() and write() are almost 60uS each - together that's about 120uS, or 12% of the 1ms cycle budget at 1kHz; the waitirq() time basically shows the budget left for motion, kins etc.
at 5kHz we have only about 50uS left for actual work:
halcmd: show funct
Exported Functions:
Comp Inst CodeAddr Arg Users Type Name Runtime Maxtime
66 00000000b6d3fe1d 0000000000000000 0 user delinst 0 0
75 76 00000000b53c06d9 0000000000041bc8 0 xthread hm2_de0n.0.pet_watchdog 0 0
75 76 00000000b53a9e39 0000000000041bc8 1 xthread hm2_de0n.0.read 55762 107665
75 76 00000000b53a9fc1 0000000000041bc8 1 xthread hm2_de0n.0.read_gpio 10461 41402
75 76 00000000b53c3291 0000000000041bc8 1 xthread hm2_de0n.0.waitirq 49953 999784
75 76 00000000b53a9ef1 0000000000041bc8 1 xthread hm2_de0n.0.write 53602 133106
75 76 00000000b53a9ff5 0000000000041bc8 1 xthread hm2_de0n.0.write_gpio 20501 56482
66 00000000b6d3fcfd 0000000000000000 0 user newinst 0 0
1099 1100 00000000b53858d1 00000000b6546820 1 xthread rate 990 27521
1110 1111 00000000b5374891 00000000b6546848 1 xthread servo-period 950 32741
1118 1119 00000000b5363969 00000000b6546858 1 xthread sp-stats 1250 26741
1085 00000000b5396be1 00000000b65467e0 1 thread test.update 2321 33982
this suggests looking into where the time is spent
hm, the above samples were from the RIP build, which is compiled with -O0 so gdb doesn't stumble over reordered code
with -O2 (as with the package build) things look a bit better (at 5kHz):
halcmd: show funct
Exported Functions:
Comp Inst CodeAddr Arg Users Type Name Runtime Maxtime
66 00000000b66eeba9 0000000000000000 0 user delinst 0 0
75 76 00000000b562f369 000000000003bbc8 0 xthread hm2_de0n.0.pet_watchdog 0 0
75 76 00000000b5620db9 000000000003bbc8 1 xthread hm2_de0n.0.read 36661 99374
75 76 00000000b5620ced 000000000003bbc8 1 xthread hm2_de0n.0.read_gpio 4301 47512
75 76 00000000b5630f41 000000000003bbc8 1 xthread hm2_de0n.0.waitirq 125696 1041507
75 76 00000000b5620d09 000000000003bbc8 1 xthread hm2_de0n.0.write 33001 114825
75 76 00000000b5620cc5 000000000003bbc8 1 xthread hm2_de0n.0.write_gpio 7861 52882
66 00000000b66eedc5 0000000000000000 0 user newinst 0 0
1099 1100 00000000b54dd651 00000000b6668820 1 xthread rate 770 42592
1110 1111 00000000b54cc651 00000000b6668848 1 xthread servo-period 850 28551
1118 1119 00000000b54bb6c1 00000000b6668858 1 xthread sp-stats 1220 43932
1085 00000000b54ee8d9 00000000b66687e0 1 thread test.update 2050 42902
played with perf top:
67.44% [kernel] [k] arch_cpu_idle
3.85% [kernel] [k] _raw_spin_unlock_irq
2.16% hm2_soc_ol.so [.] hm2_soc_read
2.07% hostmot2.so [.] hm2_tram_read
1.71% [kernel] [k] _raw_spin_unlock_irqrestore
1.29% hm2_soc_ol.so [.] hm2_soc_write
1.27% hostmot2.so [.] hm2_ioport_update
1.13% hostmot2.so [.] hm2_tram_write
0.80% hostmot2.so [.] dummy_queue_read
0.71% hostmot2.so [.] hm2_ioport_gpio_read
0.70% hostmot2.so [.] hm2_encoder_process_tram_read
0.66% hostmot2.so [.] hm2_ioport_gpio_prepare_tram_write
0.64% hostmot2.so [.] hm2_stepgen_prepare_tram_write
0.50% [kernel] [k] __apbt_read_clocksource
0.50% hostmot2.so [.] hm2_ioport_gpio_write
0.49% hostmot2.so [.] hm2_watchdog_write
0.44% hostmot2.so [.] hm2_sserial_read_pins
0.43% hostmot2.so [.] hm2_sserial_prepare_tram_write
0.37% hostmot2.so [.] hm2_ioport_gpio_process_tram_read
0.37% hostmot2.so [.] dummy_queue_write
0.31% hostmot2.so [.] hm2_stepgen_write
0.30% [kernel] [k] ktime_get_ts64
I'll have a look at the hm2_soc_read and hm2_soc_write functions tomorrow. I think this byte-wise reading is not necessary.
If you want to burst read we need to make sure the firmware can handle it. For example, I don't think the AXI lite interface I implemented can do burst reading out of the box.
At the driver level they implement a translation RAM, which I think is an attempt to avoid calling write repeatedly at the driver level. We could expose a function that reads the entire range specified by the tram instead of doing individual reads and writes, if you think the problem is function call overhead from hostmot2 down to the hm2 llio.
One other thing to note:
The AXI bus and avalon bus are clocked much slower than the processor. For instance, I can't go any faster than 100MHz on the bus without timing closure issues. That's probably what is causing the hold up. I'm not sure what the avalon bus is clocked at, but I imagine it is something similar.
batching might help, yes - not sure how invasive that would be
for now I'm concentrating on dumbing things down - replacing these supersmart loops with memcpy (which is super-optimized anyway) shaves another 10-20% off hm2_soc_read/hm2_soc_write..
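roughly, the dumbed-down version boils down to this (a sketch only, not the actual commit - hm2_soc_t, its base pointer and the board-struct lookup are assumptions about how hm2_soc_ol keeps the mapped register window):

#include <string.h>

/* sketch: replace the word-by-word copy loops with plain memcpy over the
   mmap'ed register window; hm2_soc_t / ->base are assumed names */
static int hm2_soc_read(hm2_lowlevel_io_t *this, u32 addr, void *buffer, int size)
{
    hm2_soc_t *brd = (hm2_soc_t *)this;      /* however the driver recovers its board struct */
    memcpy(buffer, brd->base + addr, size);  /* base: byte pointer to the mapped hm2 registers */
    return 1;                                /* hm2 llio convention: 1 = success */
}

static int hm2_soc_write(hm2_lowlevel_io_t *this, u32 addr, void *buffer, int size)
{
    hm2_soc_t *brd = (hm2_soc_t *)this;
    memcpy(brd->base + addr, buffer, size);
    return 1;
}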
so with that we're in the 70uS ballpark for all four functs - that's ok for now
Wow, that's a lot. With compiler optimizations turned on? The loops were a guard of some kind to protect against misaligned reads, but I don't think they're entirely necessary. I'm a fan of the memcpy replacement.
yes, -O2 (default)
perf top shows the candidate list to look at - yes, streamlining hm2_tram_* looks promising:
2.00% hostmot2.so [.] hm2_tram_read
1.32% hostmot2.so [.] hm2_ioport_update
1.24% hostmot2.so [.] hm2_tram_write
0.70% hostmot2.so [.] hm2_encoder_process_tram_read
0.70% hostmot2.so [.] hm2_ioport_gpio_read
0.65% hostmot2.so [.] hm2_stepgen_prepare_tram_write
perf is a great tool, recommended (I use linux-perf-4.6 from jessie-backports on the Altera)
Is it necessary to call both ioport_update and tram_read / write? I think that just duplicates the same access to the hm2 firmware.
This is what I'm driving at https://github.com/machinekit/machinekit/blob/master/src/hal/drivers/mesa-hostmot2/hostmot2.c#L78. The read/write functions automatically pull/push in the GPIO status.
Is this in your config?
addf hm2_de0n.0.read servo
addf hm2_de0n.0.read_gpio servo
addf test.update servo
addf rate servo
addf hm2_de0n.0.write servo
addf hm2_de0n.0.write_gpio servo
Would that be a duplicate GPIO read/write per cycle to the firmware?
Yeah, I'm willing to bet that's a duplicate interaction. I don't call the gpio directly in the configs because they are updated automatically. I think the gpio update functions are there if you don't need all of the other stuff hm2 provides and want to save the read time because of the slower bus speeds I mentioned (or even worse, bus speed across slow ethernet).
Maybe try removing those two calls. The other thing I do is remove any extra modules from the config firmware, which reduces the amount of data passed back and forth across the slower buses. E.g., instead of 10 stepgens, only instantiate 4, 5, or 6.
well, our website has the answer..
hm2_<BoardType>.<BoardNum>.read_gpio
Read the GPIO input pins. Note that the effect of this function is a subset of the effect of the .read() function described above. Normally only .read() is used. The only reason to call this function is if you want to do GPIO things in a faster-than-servo thread. (This function is not available on the 7i43 due to limitations of the EPP bus.)
hm2_<BoardType>.<BoardNum>.write_gpio
Write the GPIO control registers and output pins. Note that the effect of this function is a subset of the effect of the .write() function described above. Normally only .write() is used. The only reason to call this function is if you want to do GPIO things in a faster-than-servo thread. (This function is not available on the 7i43 due to limitations of the EPP bus.)
summary: if one calls hm2....read, hm2...read_gpio is not needed, same for write/write_gpio
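so for the config above that means dropping the two gpio functs and keeping just:
addf hm2_de0n.0.read servo
addf test.update servo
addf rate servo
addf hm2_de0n.0.write servo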
it's interesting to compare the IRQ- and non-IRQ versions of perf output
perf_4.6 top -d 10, 5kHz: current hm2 funct usage is at ca. 60uS, so doable I'd say
nevertheless a couple of things can be read out of the below stats:
- there is significant overhead through the uio driver: I think the _raw_spin_unlock_irq and _raw_spin_unlock_irqrestore stem from uio_read/uio_write functions - if one could waste a core for an RT thread, a polling solution would incur less overhead
- hm2_tram_read/write are the prime candidates - it looks like they are O(n) with the number of pins+instances so cutting down on unneeded I/O would help
- the good news is that using rtapi_get_time() in thread_task has no significant impact on the budget (_rtapi_get_time + sys_clock_gettime + ktime_get_ts64 + __apbt_read_clocksource)
IRQ version:
Samples: 41K of event 'cpu-clock', Event count (approx.): 10370343750
Overhead Shared Object Symbol
51.85% [kernel] [k] arch_cpu_idle
5.89% [kernel] [k] _raw_spin_unlock_irq
4.10% hostmot2.so [.] hm2_tram_read
3.77% [kernel] [k] _raw_spin_unlock_irqrestore
2.87% hostmot2.so [.] hm2_tram_write
1.38% hostmot2.so [.] hm2_ioport_update
1.27% hostmot2.so [.] hm2_ioport_gpio_prepare_tram_write
1.20% libc-2.19.so [.] 0x0005f0ec
1.14% hostmot2.so [.] hm2_encoder_process_tram_read
1.13% libc-2.19.so [.] 0x0005f0e4
1.08% hm2_soc_ol.so [.] hm2_soc_read
1.06% hm2_soc_ol.so [.] hm2_soc_write
0.90% hostmot2.so [.] hm2_stepgen_prepare_tram_write
0.89% [kernel] [k] __apbt_read_clocksource
0.86% hostmot2.so [.] hm2_sserial_prepare_tram_write
0.80% hostmot2.so [.] hm2_ioport_gpio_process_tram_read
0.79% libc-2.19.so [.] 0x0005f0d4
0.75% hostmot2.so [.] hm2_stepgen_write
0.67% hostmot2.so [.] hm2_sserial_read_pins
0.53% [kernel] [k] ktime_get_ts64
0.52% hostmot2.so [.] hm2_encoder_update_control_register.isra.0
0.50% hostmot2.so [.] hm2_stepgen_process_tram_read
0.47% [kernel] [k] preempt_count_sub
0.45% [kernel] [k] vector_swi
0.45% [kernel] [k] preempt_count_add
0.42% hal_lib.so [.] thread_task
0.30% libc-2.19.so [.] 0x0005f0c8
0.30% [kernel] [k] migrate_enable
0.29% [kernel] [k] sys_clock_gettime
0.29% [kernel] [k] check_preemption_disabled
0.29% hostmot2.so [.] hm2_encoder_instance_update_position.isra.1
0.28% hostmot2.so [.] hm2_write
0.27% [kernel] [k] rcu_note_context_switch
0.27% hostmot2.so [.] hm2_dpll_write
0.23% hostmot2.so [.] dummy_queue_read
0.23% rtapi.so [.] _rtapi_get_time
0.22% [kernel] [k] rt_spin_lock
0.21% [kernel] [k] schedule
0.21% libc-2.19.so [.] 0x000178e6
0.19% hostmot2.so [.] hm2_encoder_write
0.19% [uio] [k] uio_read
0.19% [kernel] [k] migrate_disable
0.19% [kernel] [k] rcu_idle_exit
For a higher level overview, try: perf top --sort comm,dso
for comparison, without IRQ - task at 5kHz:
Samples: 725K of event 'cpu-clock', Event count (approx.): 112470974030
Overhead Shared Object Symbol
84.83% [kernel] [k] arch_cpu_idle
6.69% hostmot2.so [.] hm2_encoder_process_tram_read
1.92% hostmot2.so [.] hm2_encoder_instance_update_position.isra.1
1.67% hostmot2.so [.] hm2_stepgen_process_tram_read
0.99% hostmot2.so [.] hm2_resolver_process_tram_read
0.36% hostmot2.so [.] hm2_read
0.28% hostmot2.so [.] 0x000017e4
0.21% librtapi_math.so.0 [.] rtapi_fabs
0.18% hostmot2.so [.] 0x000017e8
0.10% libc-2.19.so [.] strcmp
0.10% hostmot2.so [.] __floatdidf
0.08% hostmot2.so [.] __aeabi_dadd
0.08% libc-2.19.so [.] memset
0.05% libpthread-2.19.so [.] __pthread_rwlock_unlock
0.05% perf_4.6 [.] 0x00090708
0.05% perf_4.6 [.] 0x000b7f2c
0.04% perf_4.6 [.] 0x00093f00
0.03% perf_4.6 [.] 0x00093f02
0.03% [kernel] [k] _raw_spin_unlock_irq
0.03% perf_4.6 [.] 0x000918fc
0.03% libpthread-2.19.so [.] __pthread_rwlock_wrlock
0.03% perf_4.6 [.] 0x00091d2c
0.03% libpthread-2.19.so [.] pthread_mutex_trylock
0.03% libpthread-2.19.so [.] __pthread_rwlock_rdlock
0.03% perf_4.6 [.] 0x00096a6e
0.03% perf_4.6 [.] 0x00006a5c
0.02% [kernel] [k] stmmac_mdio_read
0.02% perf_4.6 [.] 0x00096a88
0.02% libpthread-2.19.so [.] pthread_mutex_lock
0.02% perf_4.6 [.] 0x00093efc
0.02% perf_4.6 [.] 0x00096a52
0.02% perf_4.6 [.] 0x00096a6a
0.02% perf_4.6 [.] 0x00093ef0
0.02% perf_4.6 [.] 0x00096a62
0.02% hostmot2.so [.] hm2_ioport_gpio_process_tram_read
0.02% perf_4.6 [.] 0x00096a74
0.02% perf_4.6 [.] 0x0001d35e
0.02% libslang.so.2.3.0 [.] SLsmg_write_chars
0.01% libavahi-common.so.3.5.3 [.] avahi_unescape_label
0.01% perf_4.6 [.] 0x00091d72
0.01% perf_4.6 [.] 0x00091d78
0.01% libpthread-2.19.so [.] __pthread_mutex_unlock_usercnt
0.01% perf_4.6 [.] 0x00096a50
hm2_tram_read/write are the prime candidates - it looks like they are O(n) with the number of pins+instances so cutting down on unneeded I/O would help
This aligns with what I've seen in the firmware and the driver code. It makes sense intuitively: the more pins/modules you add, the more data you need to swap back and forth, in a linearly increasing fashion.
there is significant overhead through the uio driver: I think the _raw_spin_unlock_irq and _raw_spin_unlock_irqrestore stem from uio_read/uio_write functions - if one could waste a core for an RT thread, a polling solution would incur less overhead
I wonder if this is from the driver or from some interaction with the interrupt controller. For instance, all of these interrupts are shared interrupts. If you have more than one interrupt registered, based on that document you linked from Linus, all registered interrupt handlers get notified, and they check if it is theirs to handle. Perhaps by turning on our uio interrupt, it's showing the additional overhead of that notification framework along with the overhead of the uio driver. That leads me to wonder:
What are the debug options in your kernel config? Lots of debugging turned on?
From your testing and my testing, it looks like we sacrifice some processing time for less interrupt jitter - a fair trade in my mind.
@dkhughes - that is the stock cv kernel - I think perf should work just the same on the zynq
the tradeoff is ok - I'm just exploring options
assume we were to do an alternative polling solution for external timing - let's think through if/how that would be possible. Naive approach:
- do NOT enable the IRQ in the DTBO
- IRQStatus: set Bit1 "Irq mask" to 1
- ClearIRQ: write to clear the IRQ bit
- in the wait function, loop until ClearIRQ becomes non-zero, then write to ClearIRQ to reset
does that make sense? would we cause an unclaimed IRQ making the kernel complain ".. nobody cared" ?
No, that should be totally fine, just don't register the IRQ with the controller. If the interrupt isn't registered it shouldn't cause any trouble. That being said, we probably need to go back to the polling version of the hm2_uio driver that was dkms-built, just to be safe.
Then, we need to poll the IRQ status register bit '0' (hm2's status register, not the ARM interrupt controller) in waitirq in a tight spin loop. When it goes true (or times out) we break out, and write clear-interrupt to the hm2 firmware just like we are doing now. It will eat an entire thread, but with the 2 cores I think we're fine. We might have our cake and eat it too in that case: since it would skip the uio_irq portion entirely, we just deal directly with the firmware.
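a rough sketch of what that polling waitirq could look like (purely illustrative: HM2_IRQ_STATUS_ADDR, HM2_CLEAR_IRQ_ADDR and the bit layout are placeholders for whatever the firmware's register map actually defines; llio->read/write are the usual hm2 low-level accessors):

/* polling replacement for waitirq -- spin on the firmware's IRQ status bit,
   then acknowledge it, just like the IRQ-driven path does today */
static int hm2_soc_waitirq_poll(hm2_lowlevel_io_t *llio, long long timeout_ns)
{
    u32 status = 0, ack = 1;
    long long deadline = rtapi_get_time() + timeout_ns;

    do {
        llio->read(llio, HM2_IRQ_STATUS_ADDR, &status, sizeof(status));
        if (status & 0x1)                        /* bit 0: timer IRQ pending */
            break;
    } while (rtapi_get_time() < deadline);

    llio->write(llio, HM2_CLEAR_IRQ_ADDR, &ack, sizeof(ack));  /* clear in firmware */
    return (status & 0x1) ? 0 : -1;              /* -1 = timed out */
}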
CONFIG_DEBUG_INFO=y
Might be worth turning down the debug level, and maybe turning on the quiet option.
The picture is a bit different on the zynq; not sure I see a common pattern yet which warrants action..
enable jessie-backports and install linux-perf
irqtest.hal/5kHz, perf_4.6 top --cpu=0 -d 10
Samples: 1M of event 'cycles:ppp', Event count (approx.): 27839070423
Overhead Shared Object Symbol
19.99% [kernel] [k] cpuidle_enter_state
15.57% [kernel] [k] _raw_spin_unlock_irq
2.27% hm2_soc_ol.so [.] hm2_soc_write
2.16% [kernel] [k] tick_nohz_idle_enter
1.75% [kernel] [k] _raw_spin_unlock_irqrestore
1.62% hostmot2.so [.] hm2_stepgen_instance_prepare_tram_write
1.52% hostmot2.so [.] hm2_ioport_gpio_prepare_tram_write
1.51% hostmot2.so [.] hm2_ioport_gpio_process_tram_read
1.45% hostmot2.so [.] hm2_ioport_update
1.15% hm2_soc_ol.so [.] hm2_soc_read
1.09% hostmot2.so [.] hm2_stepgen_instance_position_control
1.07% [kernel] [k] run_ksoftirqd
1.07% [kernel] [k] vector_swi
1.02% [kernel] [k] walk_stackframe
0.93% [kernel] [k] preempt_count_add
0.87% [kernel] [k] sys_clock_gettime
0.79% [kernel] [k] check_preemption_disabled
0.76% hostmot2.so [.] hm2_stepgen_process_tram_read
0.76% [kernel] [k] unwind_frame
0.75% hostmot2.so [.] hm2_stepgen_prepare_tram_write
0.72% [kernel] [k] schedule
0.71% [kernel] [k] preempt_count_sub
0.70% [kernel] [k] ktime_get_ts64
That aligns with what I mentioned before about the slow bus speed at the processor -> fpga interface. The zynq side is configured with a 100MHz axi interface. I'd have to double check, but I think the cv interface is only clocked at 50MHz, which would double your transfer time before any (if there is any) extra overhead incurred by an avalon->axi merge.
re the tight spin loop.. starvation is always an issue with an RT thread and spinning, and the cycles are lost for good
one way could be to adopt an old idea of PCW and do the following:
- after executing the last funct, calculate how much of the cycle budget is left - we do have the current timestamps already (for example's sake: leftover=300uS)
- use a safe upper limit for latency, (for example margin=100uS)
- execute a wait for (leftover - margin = 200uS) - that would be cycles gained for other threads
- when the wait returns, start spinning for IRQbit set
- if the margin is set to -1, never wait (always spin) - so we can test how effective this is
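a minimal sketch of that wait-then-spin idea, assuming a POSIX RT thread (all names are made up for illustration; hm2_soc_waitirq_poll is the spin helper sketched further up, and clock_nanosleep stands in for whatever sleep primitive the thread flavor provides):

#include <time.h>

/* sketch: sleep away most of the leftover budget, then spin for the IRQ bit;
   margin_ns == -1 means "never wait, always spin" so the effect can be measured */
static void wait_for_timer_irq(hm2_lowlevel_io_t *llio,
                               long long cycle_start_ns,
                               long long period_ns, long long margin_ns)
{
    long long elapsed  = rtapi_get_time() - cycle_start_ns;
    long long leftover = period_ns - elapsed;            /* e.g. ~300uS left */

    if (margin_ns >= 0 && leftover > margin_ns) {
        /* give (leftover - margin) back to other threads instead of spinning */
        struct timespec ts = { 0, (long)(leftover - margin_ns) };
        clock_nanosleep(CLOCK_MONOTONIC, 0, &ts, NULL);
    }
    /* then spin on the firmware IRQ status bit until it is set (or we time out) */
    hm2_soc_waitirq_poll(llio, margin_ns >= 0 ? 2 * margin_ns : period_ns);
}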
On 8/22/2016 10:23 AM, Michael Haberler wrote:
re the tight spin loop.. starvation is always an issue whith an RT thread and spinning, and the cycles are lost for good
The way to make things fast is to avoid reads.
If it seriously becomes an issue to read the required data from the FPGA (which is not a big stretch, considering how fast processors run these days, and that the processor is pretty much stalled for the duration of the required reads), the hardware needs to initiate writes instead of the CPU doing reads:
- Periodic hardware timer tick happens
- Hardware latches all required values at this point in time
- Hardware performs bus-mastering DMA into main memory, typically via a circular buffer or similar mechanism
- Hardware generates an interrupt, letting the CPU and real-time thread know there is new work to be done
The overhead of doing bus-mastering DMA is a pain, but it provides much better performance. Writes are pipelined throughout modern interconnect fabrics, can happen in large bursts, and generally help keep system interconnect throughput high. Reads generally bring everything to a screeching halt for the full bus turn-around time (read requested to data returned).
There are some memory semantics you need to be careful with when doing DMA, but nothing much more than what we already have to do to use circular buffers in HAL on multi-core machines.
For now, switching to memcpy seems like the low-hanging fruit. :)
Charles Steinkuehler
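to make the memory-semantics point above concrete, the RT-side consumer of such a DMA ring could look roughly like this (purely illustrative: the record layout, the ring size and the assumption that the hardware writes seq last are all made up for the sketch):

#include <stdint.h>
#include <string.h>

#define RING_SLOTS 16

/* one latched snapshot per timer tick; the hardware is assumed to write
   regs[] first and seq last, so a changed seq means the slot is complete */
struct hm2_sample {
    uint32_t seq;
    uint32_t regs[64];
};

static volatile struct hm2_sample *ring;   /* coherent memory the FPGA masters into */
static uint32_t last_seen;

/* copy out the newest complete sample, seqlock-style; returns 1 if one was found */
static int take_latest(struct hm2_sample *out)
{
    int got = 0;
    for (int i = 0; i < RING_SLOTS; i++) {
        uint32_t s = ring[i].seq;
        if ((int32_t)(s - last_seen) <= 0)
            continue;                                    /* nothing newer in this slot */
        __sync_synchronize();                            /* read seq before regs[] */
        memcpy(out, (const struct hm2_sample *)&ring[i], sizeof(*out));
        __sync_synchronize();                            /* ...and re-check it afterwards */
        if (ring[i].seq == s) {                          /* slot wasn't overwritten mid-copy */
            last_seen = s;
            got = 1;
        }
    }
    return got;
}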
I don't need the extra performance yet, but I have to say, full-blown DMA between the host and the fpga sounds like fun. I'd have to read the zynq TRM a lot more first...
Maybe after some of the other wishlist things get ticked off...like allowing aux gpio when cutter comp is turned on...