authenticator-rs icon indicating copy to clipboard operation
authenticator-rs copied to clipboard

[WIP] Add basic NFC support

Open Crote opened this issue 4 years ago • 20 comments

Recently I bought a fancy ACR122U NFC reader because I got tired of plugging in my Yubikey all the time. To my great surprise, Firefox didn't support Webauthn via NFC yet. So I took it upon myself to try implementing this myself as #51 isn't really showing a lot of progress :wink: .

About the implementation: The old library was tightly coupled to wrapping all APDU command in a HID layer. This has been reworked in a backend-agnostic variant, making it possible to add new communication methods. The NFC implementation is done using pcsc-rust, which should provide bindings on Windows, Linux, and MacOS. To use NFC, use --features nfc. This will also disable USB support for now. This is my first time writing Rust, so there's probably a lot of improvement to be made style-wise.

Work remaining:

  • The implementation currently provides an interface similar to the OS-agnostic one, as StateMachine only allows for a single Transaction backend at the moment. This means that enabling NFC currently disables USB, so StateMachine should be reworked.
  • The backend-agnostic interface has only been implemented for Linux at the moment. The other platforms must also be converted. This should be relatively easy, but I don't have access to those platforms so I left that for a later point in time.
  • This breaks a lot of tests. I'm deferring a rewrite of them as the current code structure is just a first draft.

Open issues:

  • It seems that there is no way to easily distinguish between a USB-attached smartcard and an NFC reader. This means that we will also try to connect to a USB-attached U2F device as if it is attached via NFC. This will, of course, not work. It seems to be harmless, but perhaps it could interfere with the proper HID implementation, so we should probably try to prevent this from happening.

Crote avatar Jun 27 '20 14:06 Crote

Thank you for this, @Crote! I'm happy to start reviewing this for inclusion into Firefox when you think it ready. I'm expecting to start work in earnest on authenticator-rs improvements here in late July myself, for reference.

jcjones avatar Jun 29 '20 15:06 jcjones

I'm doing a re-layout right now, and probably going to end up splitting out platform code into code for USB HID vs other transports. I can certainly take on helping to rebase this patch if you need the assistance, but apologies in advance!

jcjones avatar Jul 27 '20 17:07 jcjones

Actually, that's the very reason why it stalled: I made an attempt to support multiple transports at the same time, but it quickly turned into a complete rewrite of basically the whole architecture. As a novice Rust programmer, that turned out to be way above my pay grade.

I'll look into it some more in the coming week!

Crote avatar Aug 02 '20 14:08 Crote

Probably won't be ready this week. Hopefully next week though! :)

jcjones avatar Aug 02 '20 21:08 jcjones

https://github.com/mozilla/authenticator-rs/pull/123 is the sketch of how I think this is going to work. Note that I also want to rework the directory hierarchy (#124), but I don't want to do that until my set of re-arch changes in #121, #122, #123 are merged.

jcjones avatar Aug 05 '20 22:08 jcjones

I've merged the auth_service branch, so if you'd like to look at what this might look like as its own subfolder, feel free! I still want to do the mass rename of the u2f code, which will necessarily move the HID stuff away from just src, but for now you could just assume it's there and we refactor accordingly later.

jcjones avatar Aug 19 '20 19:08 jcjones

There are still a few issues left to be worked out, but most of the big stuff is done. The crucial part for now is the refactoring of the code which previously was USB-only. I've tried to avoid altering the existing code too much, but some pretty dramatic changes were required to make some of that code reusable.

Could you give your thoughts on the overall structure so far? I've tried to separate the refactoring into logical commits for easy reviewing.

Crote avatar Aug 26 '20 12:08 Crote

I will review today! Thanks!

jcjones avatar Aug 26 '20 16:08 jcjones

(oops, hit enter too soon)

I haven't done the nitty-gritty review here, but I'm pleased with the refactoring around the device info trait, and the rework of how the state machines start and the APDU movement. I just want to run through those with the proverbial fine comb.

Also I am not sure what the best testing strategy for this will be. Do you know of any virtual NFC devices we could fake pcsc to use? Or are we similarly stuck there like we are with USB?

jcjones avatar Aug 28 '20 22:08 jcjones

Yeah, it's turning into a bit of a mess. The term "U2F" is starting to lose all meaning to me. It feels like there's a lot of cleanup possible with manager.rs / monitor.rs / transaction.rs, but I haven't looked too deep into it.

There's still an issue in the way the NFCManager works though, as it seems that the old transaction is never actually cancelled. There seems to be some sort of race condition there or something.

I'm not that familiar with testing this. We wouldn't even need a virtual NFC device, any virtual smart card reader would be fine. pcsc-lite has a pscs-spy tool allowing you view the communication with the reader, but that would still require something to actually talk to. Then again, my entire knowledge of this stuff comes from doing research to make this PR in the first place, so perhaps I've just missed it.

Crote avatar Aug 29 '20 19:08 Crote

@jcjones I've been working on it again after some time off.

Basically, the first five commits are done and should be merge-able.


I've got NFC just about working, but I'm running into some bigger architectural issues. Mainly, the way the callbacks are structured.

Right now, there's a status callback, and the StateCallback. The latter is quite finicky to work with due to the whole observer stuff. The observer only fires on the original, so any clone won't result in a transport cancel. Besides not being documented at all, it doesn't make sense to me behaviour-wise.

Ideally, I'd say there are three kinds of callbacks:

  1. A success. The obvious one, should result in a cancel on all transports. This should only happen once, of course.
  2. Initiation errors. When a request is malformed, or no transport is available. This should result in a termination and no further callbacks will happen.
  3. Timeouts. They are currently considered fatal errors. For example, the Linux USB backend returns a U2FTokenError when this happens. This should just be a, well, timeout. Nothing more than a "you didn't get a response in time, so you'll never get one" and definitely not an error.
  4. Temporary errors. This is the challenging one. What if a device returns a corrupt message? Or initiating a single transport results in an error? Or some other weird transient error because, well, we're dealing with hardware. This should just result in a warning. Sure, they're errors, but they might just go away if the user tries again. Currently they are basically considered fatal errors, yet they are anything but. Ideally you'd want to just handle them as some kind of warning status message, but the current infrastructure makes that quite hard.

So basically, the current division between status- and state-callback make it impossible to properly handle the type of events which actually happen, and they make the proper cancelling of transports extremely difficult.

What are your thoughts on this? Should I rework it to use a single callback and overhaul all the error types? Is there something I'm missing?

Crote avatar Oct 24 '20 19:10 Crote

Thank you for continuing to work on this! I'm taking holiday this week, but I didn't want to leave this with no reply, so here's a shorter-ish reply (I've been thinking this through all weekend).

I think you're right that we need multiple callbacks. I'm not totally clear on your suggestion of what the third one to add would be, though.

We currently have a "completion" callback that indicates success, timeout, or critical error. We also have a status callback that indicates devices coming and going. I think I understand you're suggesting adding a third callback for transient errors that don't halt.

If I'm correct in that, it seems like that could be just another condition type added to the status callback, at least from the Rust side, since the callback enum objects can be made to contain all the necesssary data. I agree, however, that the C API would require a separate callback.

Assuming the above to all be correct, I'm fine with either plumbing a third channel as a non-critical error message channel, or adding to the existing status channel type.

I was pondering this exact situation with the CTAP2 PIN protocol, actually, and imagining adding to the status channel, something like:

pub enum StatusUpdate {
    DeviceAvailable { dev_info: u2ftypes::U2FDeviceInfo },
    DeviceUnavailable { dev_info: u2ftypes::U2FDeviceInfo },
    Success { dev_info: u2ftypes::U2FDeviceInfo },
+  Condition { dev_info: u2ftypes::U2FDeviceInfo, details: TemporaryCondition },
}

+ pub trait TemporaryCondition {
+    // ... stuff
+ }

I think that would end up being easier than overhauling the error types, but I haven't figured out what traits such an object would have. Then again, maybe it's better to just have more StatusUpdate types, like one for TransientError and one for PINInvalid and one for PINUnset and so on.

Gotta run to do holiday things... thank you again @Crote, I am really looking forward to merging this into Firefox!

jcjones avatar Oct 26 '20 17:10 jcjones

That's not quite what I meant. The main issue I ran into, is that there are multiple callbacks to begin with. Instead, I'd suggest merging them into a single callback (at least on the Rust side), with a more comprehensive callback object.

Something like this makes the most sense to me:

enum StatusEvent {
  DeviceAvailable { ... },    // New device found
  DeviceUnavailable { ... },  // Connection to old device lost
}

enum ResultEvent {
  Success { ... },            // Everything went well!
  Timeout,                    // Too much time elapsed, so we aborted
  Unable,                     // Turns out we have no viable transports,
                              // so a successful result is impossible
}

enum ErrorEvent {
  // All but the last case could be transient
  // The user should probably be prompted to
  // retry the device which caused the error
  U2FError { U2FTokenError }, // We successfully talked to a device using U2F,
                              // but we used the wrong command in this state or something
  ProtocolError { ... },      // We tried talking to a device, but we got garbage we can't decode
  CommunicationError { ... }, // The OS gave us a device, but we got an error trying to use it.
                              // Maybe a write to a file descriptor failed or something? 
  TransportError { ... },     // We tried using a transport, but it gave an error.
                              // Maybe we got a permissions error trying to enumerate devices or something?
                              // In any case, this transport is unusable now.
}

enum AuthEvent {
  Update { StatusEvent },     // Something changed, but we can safely ignore this
  Warning { ErrorEvent },     // Something went wrong, we should probably tell the user
  Finished { ResultEvent },   // We're done, this is the last callback which will happen
}

And probably a kind of internal error somewhere, but that's not that important right now.

The callback would simply accept an AuthEvent, and pattern matching it would be trivial: the AuthenticatorService can simply cancel all transports on a Finished, all StatusEvents are probably only logged for debugging, ErrorEvents are displayed to the user to tell them to try again, the ResultEvent is a natural result from the whole process - and is the only thing sent into the channel as is currently done in main.rs.

As to the interface on the C side, I don't know enough about it to form a meaningful opinion. But combining two C-side callbacks into a single Rust-side callback should be trivial, right?

My main intention is to have a single, clear, and unambiguous way to tell the caller what is happening. I'm not sure if this approach is the right one, but something in this direction is probably needed.

Enjoy your holiday!

Crote avatar Oct 26 '20 19:10 Crote

Oh, I like that very much. I'll leave more for later, but I would happily handle that patch.

Now to your earlier point - you said the first 5 commits of 7 here are ready for review? I can at least start taking a look in evenings this week...

jcjones avatar Oct 26 '20 22:10 jcjones

Yeah, the first five are basically just refactoring to make it possible to sanely implement an NFC transport. This branch should pass all tests and linters after every single commit and I've tried to make them all self-containing, so the first five should be completely safe to merge already.

Some parts (especially the fifth commit) do make clear that quite a lot of the existing code should probably be cleaned up afterwards, but I wanted to avoid doing basically a complete rewrite without knowing for sure that this is the right approach to use.

The names used could probably really use a second pair of eyes, as I seem to be horrible at naming things. But there's no need to hurry. Why not just read a book in your evenings or something? It's your holiday, after all!

Crote avatar Oct 27 '20 01:10 Crote

For context on my drive-by commentary above, I've been having a look at smart card support in another FIDO2 library, which depends on this library. That other library has errors in it's ISO 7816 implementation as well, which I'm in the process of fixing up. Given the complexity of the standards, I thought I should have a peek at what's happening here as well.

FYI: some tokens don't actually implement ISO 7816 correctly, which can cause your local tests to "unexpectedly pass" but fail with other tokens. The FIDO2 NFC specs aren't very easy to read either, and they gloss over some important details that you'll only find in the ISO 7816-3/-4 specs.

micolous avatar Aug 12 '22 08:08 micolous

Is anyone actively working on this or planning to work on it in the near, foreseeable future?

The branch currently doesn't merge cleanly because the following changes conflict:

  • https://github.com/mozilla/authenticator-rs/commit/47d69204c56d04f17f4918ad4802c4fb6193016f -- Rename and document U2FAPDUHeader and fix Nc=0. (https://github.com/mozilla/authenticator-rs/pull/191)
  • https://github.com/mozilla/authenticator-rs/commit/f9f1655248ae13ce2da985a1debed0cf25bf81bf -- Detangle APDU code from HID implementation

If I merge this branch instead with 660a701b45977cb5d184380061dfc8cec4f696be, the parent of the commit 47d69204c56d04f17f4918ad4802c4fb6193016f which introduced the conflicts, the merge is clean and the code compiles, but the example program fails here:

https://github.com/Crote/authenticator-rs/blob/0de730f112e07908b481f38842b1c43682f6724d/src/nfc.rs#L108-L112

This happens because, as implemented, a single NFCManager can only do one request in its lifetime -- there's no mechanism to queue multiple requests the way U2FManager does -- and the example program tries to reuse it both to make a credential and to authenticate with it.

If I insert manager.cancel() between the manager.register(...) logic and the manager.sign(...) logic, I can verify that the code works to make and authenticate credentials with a real fido card, both via contact smartcard and via NFC. (I can also verify that all automatic tests pass.)

But I think that's the wrong approach -- should do it like U2FManager in manager.rs to allow multiple queued operations, and perhaps even factor the runloop and queue management logic out of U2FManager and NFCManager.

Happy to draft a patch to do this, and happy to do a bit more work to resolve the merge conflict, but if the work is already in progress, don't want to step on anyone's toes.

riastradh avatar Mar 17 '23 20:03 riastradh

I'm not working on this. @msirringhaus are you?

Active development is happening on the ctap2-2021 branch. We're about a month away from being able to cut that over to main. We'll be removing a lot of duplicate / unnecessary code at the same time. So if you're willing to wait, it's going to be a lot easier to work on then.

Feel free to start now if you'd prefer. I can take a closer look at the patch attached to this PR on Monday and let you know what I think needs to be changed.

jschanck avatar Mar 17 '23 23:03 jschanck

I took a closer look at this branch than just by blindly merging and testing, and I took a look at the ctap2-2021 branch too in order to estimate what work would be involved in bringing NFC support into it.

This nfc branch retrofits the NFC/smartcard support into two different sets of StateMachine methods: register/sign, which takes an existing U2FDevice (renamed APDUDevice) and does the u2f_register/sign operations; and _register/_sign (new names for what register/sign do in main and ctap2-2021), which tries to discover devices using an OS-dependent transaction and calls register/sign.

  • NFCManager is an AuthenticatedTransport that uses a single PCSC context to discover cards on the fly and call the StateMachine's register/sign methods.
  • U2FManager is an AuthenticatedTransport that calls the StateMachine's _register/_sign methods to discover HIDs on the fly and call the StateMachine's register/sign methods.

In other words, the NFCManager inverts the discovery vs operation logic from U2FManager: NFCManager discovers cards first, then uses StateMachine just for operation; U2FManager uses StateMachine to do discovery and operation, as in main and ctap2-2021. But the two kinds of manager do the same operation under the hood -- the same CTAP1 command/response protocol -- just on a different underlying protocol transport (PCSC library calls vs OS HID I/O).

This is a little awkward, and the ctap2-2021 branch looks like it will make this approach a little more difficult because there are two different state machines, one for CTAP1 and one for CTAP2, which would both need to be adapted for the retrofit.

It seems to me it would make more sense if either:

  1. Transaction could enumerate both OS HIDs and PCSC cards, although that might require something like making each OS's DeviceBuildParameters be a disjoint union of OS HIDs and PCSC cards (if PCSC is relevant -- not on macOS, for instance, I think) and dispatching the sum type in each OS's Monitor and Device::new; or
  2. StateMachine were parametrized by different types of transactions, say OSTransaction and PCSCTransaction, which would probably require making Transaction/Device/DeviceBuildParameters as used by StateMachine be traits/parametric rather than structs -- requiring mostly mechanical changes to the OS interfaces and a little surgery to the types in StateMachine and src/transport.

It looks like some renovation might still be under way in the organization of src/transport, judging by all the TODO comments, so maybe I should hold off on trying to take any of this up in the ctap2-2021 branch until that renovation has settled further.

riastradh avatar Mar 18 '23 23:03 riastradh

I am not (yet) actively working on this, but it is on my (ever-growing) TODO-list.

This is a little awkward, and the ctap2-2021 branch looks like it will make this approach a little more difficult because there are two different state machines, one for CTAP1 and one for CTAP2, which would both need to be adapted for the retrofit.

That is one of the things that will go away, once we merge over to main. We have currently a lot of code-duplication, because we needed a way to switch back and forth between old and new code in case a critical bug is encountered. Once we are satisfied that the new code works, we'll remove the old completely.

In general, I'd say it will be easiest to 1.) wait for the merge and correspondig code cleanup 2.) abandon this PR completely and just copy&paste some bits from it to a new PR, as many things have changed since this PR was issued.

I would need to experiment a bit with this myself, in order to figure out, how exactly this should be integrated into the existing code.

msirringhaus avatar Mar 20 '23 08:03 msirringhaus