trezor-firmware icon indicating copy to clipboard operation
trezor-firmware copied to clipboard

Input methods for model R

Open grdddj opened this issue 3 years ago • 23 comments
trafficstars

Connected with https://github.com/trezor/trezor-firmware/issues/2267:

  • passphrase entry for model R

Trying it locally:

  • TREZOR_MODEL=R make build_unix
  • ./emu.py - does not need any erasing nor loading
  • trezorctl crypto get-entropy 1111 - I could not find a way how to spawn passphrase dialogue via trezorctl, so I have modified this one to do it

image image image image image

grdddj avatar May 19 '22 16:05 grdddj

I changed the base to trezor_r_ui_tests and closed the other PRs PIN screenshots: #2233 Seed backup screenshots: #2264 Seed recovery screenshots: #2268

matejcik avatar May 27 '22 14:05 matejcik

Running trezorctl device recover crashes after confirming the number of words to recover. Not reproducible on the emulator. Additionally, no exception is thrown in /dev/tty*.

sime avatar May 30 '22 14:05 sime

Running trezorctl device recover crashes

Could be the device running out of memory. The current code makes multiple copies of the BIP-39 wordlist. If that's the case, it is going to be fixed by switching to code in #2312

matejcik avatar May 31 '22 11:05 matejcik

Force-pushed to reflect the latest changes in master - most significantly using the bip39 Wordlist

@sime - this PR branch should currently support all your needs

grdddj avatar Jun 02 '22 08:06 grdddj

Working for me.

sime avatar Jun 02 '22 09:06 sime

The last two commits are focused only on code improvements, there are no new features - @sime, no need to build this for the UI/UX testing.

I tried to address most of the comments above, what is still remaining is the ButtonController

grdddj avatar Jun 23 '22 10:06 grdddj

Force-pushed after rebase on master and on https://github.com/trezor/trezor-firmware/pull/2339 - looking forward on that being merged! (I have included that PR as a single squashed commit)

NOTE on https://github.com/trezor/trezor-firmware/pull/2339: Here there are some changes to the HoldToConfirm and Loader code, making both components general with T: AsRef<str>, so they could be used with ButtonController and ButtonContainer, which are also T: AsRef<str>. I think it does not affect that PR, so it could be merged as-is and extended here.

Thanks @TychoVrahe for help with the screen flickering in the last commit, now it seems stable enough. We are currently re-drawing the buttons only when at least one of them changes (I was trying to change them individually, but failed, even with pad on each button and clearing it).

The next step is using the ButtonController in all user-facing UI components, not only in current input methods (PIN, passphrase etc.)

grdddj avatar Jul 08 '22 13:07 grdddj

Force-pushed after rebase on master to have the MacOS nix-shell fix

grdddj avatar Jul 14 '22 10:07 grdddj

Last two commits aim on printing strings from Rust and for more information on panic messages.

Printing works both on emulator and hardware device (in debug mode), but the file and line info about panic (FATAL ERROR) are not included on hardware, as we would overflow flash space (it takes around 10 kb).

As these things can be beneficial even outside of model R, I made a separate PR for them - https://github.com/trezor/trezor-firmware/pull/2391

grdddj avatar Jul 18 '22 15:07 grdddj

Force-pushing to add the recent PR with printing.

Last two commits include the new Unifont font and some changes needed because of this. There are still quite a lot of spaces where the new fonts are creating some trouble, as its size has changed.

As new fonts are also slightly larger in byte-size, currently we are overflowing flash on hardware in debug mode - for 500 bytes it says. Therefore, please build the firmware with BITCOIN_ONLY=1

Some screenshots with the new fonts: image image image

grdddj avatar Jul 21 '22 15:07 grdddj

As new fonts are also slightly larger in byte-size, currently we are overflowing flash on hardware in debug mode - for 500 bytes it says. Therefore, please build the firmware with BITCOIN_ONLY=1

I have an inherit fear that with the fonts will blow away our flash space, please keep an eye on this if it becomes reality.

sime avatar Jul 22 '22 14:07 sime

After the recent commits (dealing mostly with font magnification and some pixel-shiftings), my hardware device is experiencing Internal error (HF) during early stages of recovery. I have not encountered that in any other input methods.

I am investigating it.

grdddj avatar Jul 28 '22 07:07 grdddj

97e8180 addresses the Hard fault during the bip39 recovery process.

A more detailed description of the HF conditions:

  • happened when micropython was requesting multiple words from Rust in a loop (for 12 words it needed to call the request_word_bip39 12 times in a row)
  • first and (mostly) second requests were fine, but the third triggered that HF (almost without exception)
  • trying to isolate the issue uncovered that the main factor in "having HF/not" is the number of items in ChoicePage - with lower numbers, it was possible to call the function repeatedly, with higher amounts it was failing even in the second request

I altered the process of retrieving the bip39 words - all of them are retrieved in just one request, so all the Rust layout things need to be instantiated only once. It means that at least for now the model R and model T have different Rust API (it needs branching the caller code in micropython). Also, it might be fine to do it for BIP39, but we need to support SLIP39 as well and there are some on-the-fly checks for slip39 after each word - https://github.com/trezor/trezor-firmware/blob/426eae4dfcb853041419030bbb104cc2bbf6629f/core/src/apps/management/recovery_device/layout.py#L61, so these things would need to be transferred into Rust if we would also collect all slip39 words at once.


It probably does not address the root issue, but I did not succeed in finding out where exactly is the issue (with some memory profiling etc.).

One thing I could do is to tell exactly which commit made it stop working (it worked reliably before and not worked at all after) - cfc9e2e68 - which adds support for icons as buttons, and therefore probably increases the memory footprint of the ChoiceItems that ChoicePage needs to allocate for.


What I still cannot fully understand is how it can work once or twice and fail the third time of the exact same call. The premise is that the micropython environment is still the same (as all the calls to Rust happen in one loop) - that would signal to me there is no cleaning of the "old" layouts when calling "new" one.

Small experiment:

  • replacing the call to request_word with request_passphrase_on_device also failed at the 2nd/3rd attempt
  • on its own request_passphrase_on_device in a while True triggered HF always after 5 attempts
  • calling it one-by-one being triggered with trezorctl command did not throw HF even after much more attempts
  • that would conclude to me that the homescreen is somehow "resetting/cleaning" the state

grdddj avatar Jul 29 '22 16:07 grdddj

02295a2 contains a functional POC of one component that can satisfy theoretically any step of actions to be done on the screen - and can be defined in a "Rusty" way of chaining the method calls on one object instead of that awkward format string.

There is actually not a lot of new functionality, the biggest differences between this and the previous "format" attempt are that it does not reuse FormattedText, does not use format string, and the Ops are stored directly in the component.

A lot of code in flow_pages_poc_helpers.rs is almost fully copypasted from ui/component/text - so that I could make some small changes there for this POC.

The calling code in layout.rs looks like this:

FlowPageMaker::new::<theme::TRDefaultText>(btn_layout, btn_actions)
    .icon_label_text(
        theme::ICON_USER,
        "Recipient".into(),
        truncated_address.as_ref().into(),
    )
    .newline()
    .icon_label_text(theme::ICON_AMOUNT, "Amount".into(), amount.as_ref().into())


to create two common patterns of icon, offset, normal-text label, newline and bold-text text

or

FlowPageMaker::new::<theme::TRDefaultText>(screen.2.clone(), screen.3.clone())
    .text_bold(screen.0.into())
    .newline()
    .text_normal(screen.1.into())

to just have some normal text in combination with bold text

grdddj avatar Aug 08 '22 21:08 grdddj

9542554 just makes it running on the hardware device. By previously allocating the same fixed amount (in Vec<Op<String<100>>, 30>) for each FlowPage, it failed to allocate space for device tutorial, which has 8 flow pages (a lot).

Temporarily solved by each Flow setting the suitable numbers for itself - a page with a small number of operations but longer text can choose for example Vec<Op<String<120>>, 8> and a page with shorter texts but more operations can define Vec<Op<String<25>>, 25>.

It creates little bit of a mess with those const X: usize, as Flow currently looks like Flow<T, const N: usize, const M: usize, const S: usize>, as it also needs to choose the amount of the FlowPages.

That is definitely not the cleanest and may be dangerous for rendering some outside-world strings, where somebody (some new cryptocurrency) would for example send 101 characters long address and we would only allocate String<100> (we could check the length before making a String out of it theoretically, but still not best).

It is also not very nice that the high-level code in layout.rs must be choosing these constants. We could create some components/helpers/primitives that would handle the allocation for the common tasks that we have - for example pairs of icon+label+text, but then we would need to support one pair, two pairs, three pairs...

I guess I am starting to almost dislike this one component for all screens, mostly because of the needed allocation issues, which can be really dangerous. I just cannot see another way right now how to generally support the use case "I want to place icons, I want to draw text and I do not know if it will fit one screen, therefore I need pagination".

PS: why do I use Op::Text(String<N>) instead of something else like Op::Text(StrBuffer), which would be much more memory-efficient? (Am I right Imagining that in that case, the Op enum would not need almost any allocation, as StrBuffer is just a pointer to already existing data?) Currently for handling the page-break we need to manipulate the boundary text - when starting a new page, we need to cut out the already shown part and preserve only the new text-to-be-rendered. But maybe the component can store all the operations with StrBuffer and only create String Ops (or some other objects) for the sake of rendering the current page - here the concept of Line would be nice...

grdddj avatar Aug 09 '22 07:08 grdddj

Replacing String<N> by StrBuffer in 5f80757 makes me much more comfortable with it. Instead of modifying the Op::Text on the page break, I am setting the length parameter of how much is left to display from that StrBuffer - and it means the text content itself stays the same all the time, and we can still display slices of it of arbitrary length.

grdddj avatar Aug 09 '22 08:08 grdddj

In 5ff1e44 I am adding the possibility to get all the information from Rust layouts in micropython - via sending the Trace and processing it into useful data.

This is a starting point to implementing the automatic pagination and also will be useful for automatic UI tests.

Below is the example of the data that we have access to in micropython and can act upon it - the page indexes, which button goes next, screen text, button actions etc.

image

grdddj avatar Aug 11 '22 15:08 grdddj

Current status regarding flash space:

  • non-debug build is overflowing .flash for 23 kb
  • debug build overflows for 41 kb

Such a big difference was probably caused by the recent impl Trace, which is ui_debug-only. We are probably still fine, however, as long as we manage to split the code into .flash2, where we should have enough space.

Estimate of free .flash2 space (for model R):

  • currently there are 50kb free
  • there is roughly 10-20 kb of micropython UI code that still gets included, so we can get rid of these (granted we have the functionality already implemented in Rust)
  • if we get rid of the ETH definitions, that could save 50-70 kb

Considering this, even the missing 41kb for debug build do not look so bad.

grdddj avatar Aug 11 '22 17:08 grdddj

19e49d7 adds two CI dashboards for quickly checking all unique screens and all testcases in one place. These two dashboards are also linked to each other by clicking the specific images.

grdddj avatar Aug 15 '22 17:08 grdddj

Force-pushed after rebase on master so that I have the latest success animation changes there. Also changed the base branch to master, effectively merging https://github.com/trezor/trezor-firmware/pull/2252 into this PR as well.

The rebasing was not straightforward, so there is a possibility of some bug being introduced in it. I have tested some basic workflows on a hardware device and they were fine. For the user-testing, @sime, if there would be any issues on this "new" rebased version, the last "old" CI is this - so you can get the firmware from there, just in case (it already includes the double-PIN screen).

grdddj avatar Aug 16 '22 10:08 grdddj

Force-pushed after rebase on master to include some latest changes.

What is unfortunate now is that on device almost all more advanced workflows (apart from seed backup) are throwing Internal error (SO) - in both the debug and nondebug mode. I am looking into it...

grdddj avatar Sep 02 '22 08:09 grdddj

Stack overflow issues solved by lazy-loading the pages - ChoicePage requesting only what is needed right now, so just the current page needs to be stored on the stack.

Using

pub trait ChoiceFactory {
    fn get(&self, choice_index: u8) -> ChoiceItem;
    fn count(&self) -> u8;
}

to transmit information about all the screens - each specific input (PIN , bip39 etc.) implements this and returns the specific ChoiceItem according to the current choice_index .

Similar was done for the Flow pages - in layout.rs we are defining a closure function specifying what Page corresponds to a specific index - without creating a heapless::Vec which caused the SO.


5125666 is introducing a way to get object sizes on the fly using the debug build and gdb.

It was created so that we can verify how much memory (stack) space all the layouts (and Rust code in general) are occupying, so we can avoid the SO issues.

From the results below it is apparent that the lazy-loading approach to page rendering works, the object sizes are much smaller than with the long heapless::Vecs with instantiated objects.

Long results with object/layout sizes

(some things like FormattedText or Paragraphs depends quite a lot on the input data - but others are fairly constant)

ChoicePage: 1079
ChoicePage.choices: 1
ChoicePage.pad: 20
ChoicePage.buttons: 1049
ChoicePage.page_counter: 1
ChoicePage.is_carousel: 1

ButtonPage: 1828
ButtonPage.content: 392
ButtonPage.scrollbar: 40
ButtonPage.pad: 20
ButtonPage.cancel_btn_details: 79
ButtonPage.confirm_btn_details: 79
ButtonPage.back_btn_details: 79
ButtonPage.next_btn_details: 79
ButtonPage.buttons: 1049

PassphraseEntry: 1147
PassphraseEntry.choice_page: 1079
PassphraseEntry.show_plain_passphrase: 1
PassphraseEntry.textbox: 64
PassphraseEntry.current_category: 1
PassphraseEntry.menu_position: 1

Paragraphs: 392
Paragraphs.area: 16
Paragraphs.list: 344
Paragraphs.placement: 8
Paragraphs.offset: 16
Paragraphs.visible: 8

FormattedText: 480
FormattedText.layout: 40
FormattedText.fonts: 16
FormattedText.format: 16
FormattedText.args: 200
FormattedText.icon_args: 200
FormattedText.char_offset: 8

SimpleChoice: 1254
SimpleChoice.choices: 88
SimpleChoice.choice_page: 1166

PassphraseEntry: 1147
PassphraseEntry.choice_page: 1079
PassphraseEntry.show_plain_passphrase: 1
PassphraseEntry.textbox: 64
PassphraseEntry.current_category: 1
PassphraseEntry.menu_position: 1

PinEntry: 1161
PinEntry.show_real_pin: 1
PinEntry.textbox: 64
PinEntry.choice_page: 1094

Bip39Entry: 1950
Bip39Entry.choice_page: 1374
Bip39Entry.letter_choices: 112
Bip39Entry.textbox: 32
Bip39Entry.pad: 20
Bip39Entry.offer_words: 1
Bip39Entry.bip39_words_list: 16
Bip39Entry.words: 392
Bip39Entry.word_count: 1

Flow: 2102
Flow.pages: 1
Flow.page_counter: 1
Flow.pad: 20
Flow.common_title: 24
Flow.current_page: 984
Flow.buttons: 1049

Page: 984
Page.ops: 608
Page.layout: 40
Page.btn_layout: 239
Page.btn_actions: 72
Page.current_page: 8
Page.page_count: 8
Page.char_offset: 8

ButtonController: 1046
ButtonController.pad: 20
ButtonController.left_btn: 331
ButtonController.middle_btn: 331
ButtonController.right_btn: 331
ButtonController.state: 2
ButtonController.button_area: 16

ButtonContainer: 331
ButtonContainer.button: 121
ButtonContainer.hold_to_confirm: 113
ButtonContainer.pos: 1
ButtonContainer.button_type: 1
ButtonContainer.btn_details: 79
ButtonContainer.htc_triggered: 1

grdddj avatar Sep 04 '22 19:09 grdddj

Updated flash sizes for Rust here

It currently has 77 kb in debug mode, which is not little.

Quite a lot (almost 15 kb) is occupied by Trace and its implementations from all objects.

Some increases could also be seen due to SO mitigation. Instead of creating the heapless::Vec with objects on stack, the logic to create pages on demand is offloaded into functions living in flash space (obvious side-effect).

grdddj avatar Sep 05 '22 08:09 grdddj

Replaced by https://github.com/trezor/trezor-firmware/pull/2610

grdddj avatar Feb 03 '23 15:02 grdddj