formats
formats copied to clipboard
der: string support deficiencies
Taking this out from #690 , #691 and few practical thoughts from my side:
The following string tags are unsupported:
- [ ] BMPString
- [ ] UniversalString
- [ ] VisibleString
- [ ] GraphicString
Additionally:
- [ ] TeletexString lacks support for
chars >= 0x80
Additionally current API lacks easy to use unified string API (at least for AnyRef). Currently when parsing the string from AnyRef (e.g. in RDN) one has to manually match all the string tags and to use corresponding foo_string() calls. Instead it might be useful to add the is_string() -> bool and to support generic string wrapper around ::printable_string(), ::utf8_string(), ::ia5_string(), etc.
support generic string wrapper around
::printable_string(),::utf8_string(),::ia5_string(), etc.
Yes, I mentioned as much in #692. I definitely want to address this before cutting another release of der as there is currently massive amount of duplication and copy-pasted code among the various string types.
There's an internal StrSlice type it might make sense to expose publicly as a common representation of a string. It's a newtype wrapper for &str with an associated [Length].
With this as a common internal representation, a lot of the duplication between e.g. Ia5StringRef, TeletexStringRef, and VideotexStringRef could be eliminated, possibly via Deref coercions to replace the existing (same-shaped) inherent methods with a common undelrying type, with the core differences between the various types being largely relegated to the constructor.
It would be good to figure all of this out before trying to add any additional string types.
@tarcieri the major issue with the StrSlice that I faced while trying to implement TeletexString in a more correct way (handling byte >= 0x80 as if it was the win-1252), is that it can not be easily updated to support different as_bytes() implementation (or I could not come up with the way to achieve this). So all the strings currently work because either they are ASCII strings or Utf8Strings. For BMPString we'd need UTF-16BE encoding.
@lumag yeah, StrSlice could only work for cases that are subsets of ASCII/UTF-8, but I think it would still make sense to use it for those.
Notably I think Ia5StringRef could be changed to Deref to a StrSlice (replacing the existing inherent methods) without it being a breaking change, particularly as StrSlice isn't currently part of the public API so we can make changes there if necessary.
Being a minor party here, I'm fine with either approach as long as StrSlice doesn't become the public API.
If it were the Deref target for ASCII-like encodings it would have to be
Can it be replaced with the trait to provide the calling interface?
A trait would be an alternative option. You'd still need to retain the inherent methods on Ia5StringRef for now in order to avoid a breaking change, however.
As a second thought, let's maybe rename StrSlice to AbstractString and use Deref as you suggested? In the end, it might be possible to extend it later to suppor custom byte representations.
This crate generally uses the *Ref naming convention for references, leaving *String for the owned types, ala std::string::String.
StrSlice is used all over the place in an AsRef<str> capacity, so it's not going to be extensible to something which doesn't have a straightforward infallible conversion to &str.
AsRef<str> would stay as is. It's the AsRef<[u8]>::as_ref(&self), who
would require it to be changed.
... if different byte encodings are to be supported.
I really think the types which represent encodings which aren't subsets of ASCII/UTF-8 should be distinct from each other.
The CHOICE types that operate over them can map to enums, and you can have a method which returns a Cow<'a, str> which transcodes if necessary.
@tarcieri I have to admit, that this is out of my level of Rust knowledge.
Cow abstracts over the owned and borrowed forms, so the encodings which are subsets of ASCII/UTF-8 can be represented using a simple reference conversion, and UTF-16 transcoded.
Separately I'm curious if applying RFC6818 deprecations could reduce the total number of string types that need to be implemented to support X.509.
There is a usual problem with deprecations. It is typical to have a set of data generated with the previous standards in mind, e.g. older digitally signed and/or encrypted documents. So, I've always assumed that if the data/type/tag was allowed to be present at some point in the past, the software should still be able to accept it.
I opened a proposal for generic string support in #1061.
For BMPString to work with a type which is String internally we'd have to transcode UCS-2(?) to UTF-8 when decoding and transcode back to UCS-2 when encoding.
Or we could store the raw UCS-2, but that wouldn't be able to leverage a generic implementation as easily. I think the transcoding penalty is worth the convenience of being able to have a common generic string type.
Historically I think it was impossible, because all Strings were referenced. Now with the addition of owned structures, it should be possible to transcode UCS-2 <-> UTF-8. In the worst case, we might end up with transcoding Owned implementation and raw-UCS-2 Ref implementation.
I've added BmpString which is the most pressing.
Let's track the others in #1061.