pydantic-core
pydantic-core copied to clipboard
`EmailValidator` type
Using mail_parser I guess.
Rationale:
- remove the
email-validatoroptional dependency from pydantic, and its descendant dependencies -dnspythonandidna - should be faster
- the
NameEmail"My Name <[email protected]>"parsing can be improved, see https://github.com/pydantic/pydantic/issues/3173 - rust is much better for this kind of parsing than python (actually looks likemail_parsersupports this out of the box)
We just need to make sure that adding mail_parser dependency doesn't significantly increase binary size - any one working on this. I encourage you to do a very simple POC implementation, create a PR and ask us to add the "Full Build" lable to see how much binary size has increased before completing the PR, in case we have to abandon the idea due to this.
To be clear we need two validators EmailValidator and NameEmailValidator which will be represented by EmailStr and NameEmail in pydantic.
What do you think of using or having something like: https://github.com/Keats/validator for pydantic? It includes email and credit card and some others
@samuelcolvin I think we are trying to solve two issues into one.
I think it is better to do it as a combination of
validation = validate(quotedform)
quotedform = validate(quotedpart) + SPACE + validate(email)
Like the Email Validator validates an email. The quoted form is not an email.
We can add another QuotedEmailValidator
What do you think of using or having something like: https://github.com/Keats/validator for pydantic? It includes email and credit card and some others
Not really required, we have our own validator system which is designed to be used from python code - remember we don't know the exact schema for validation until runtime.
@Abdur-rahmaanJ I agree that NameEmailValidator should use the same logic internally to validate the actual email address.
I'll be trying to create something basic, will update
I took a look at smtp-proto - and it seems that we may use it for addr-spec,
But AFAICT, the whole validation code is around 100 LOC, and it requires a \n at the end of the string.
So if this is all the code that is needed (I expected it to be more complex) it may be worth implementing this on our own
[package]
name = "mail-testing"
version = "0.1.0"
edition = "2021"
[dependencies]
smtp-proto = "0.1.1"
use smtp_proto::request::parser::Rfc5321Parser;
fn main() {
let mut input = "[email protected]\n".as_bytes().into_iter();
let mut parser = Rfc5321Parser::new(&mut input);
println!("{:?}", parser.address());
}
If it's only 100 LOC I would be in favor of just vendoring it, which I take to be your suggestion. If it appears to be thorough/correct, I don't expect it will be much of a maintenance burden on ourselves, it's not like the spec is changing. I imagine the small amount of effort it would take to integrate a fix if a bug was found would probably be worth the elimination of the dependency. Open to disagreement.
Just getting stuck into rust, curious why something like https://docs.rs/email_address/latest/email_address/ wouldnt suit?
I think it looks great. Not many stars, but that doesn't matter much necessarily.
The first two things to check are:
- Good much do the binaries increase in size.
- How many of
email-validator's (the python package) tests pass with this library
If it's good on those two, it's golden.
Moving into here as more relevant for our discussion...
use smtp_proto::request::parser::Rfc5321Parser;
fn main() {
let email: &str = "Art Vandelay <[email protected]>\n";
let email2: &str = "[email protected]\n";
let addr = Rfc5321Parser::new(&mut email.as_bytes().iter()).address();
let addr2 = Rfc5321Parser::new(&mut email2.as_bytes().iter()).address();
println!("{:?}", addr);
println!("{:?}", addr2);
}
Outputs
Ok(None)
Ok(Some("[email protected]"))
Looks like we might need a blended approach as neither lib has exactly what we want, assuming we want to:
- validate the email
- Extract sections from the email
- Display Name:
Art Vandelay - email:
[email protected] - local_part:
art - domain:
vandelay.com
- Display Name:
Added a PR to the email_address repo, see if its accepted, then we could have the above working https://github.com/johnstonskj/rust-email_address/pull/15
Using the above...
use email_address::EmailAddress;
fn main() {
let email = "[email protected]";
let addr = EmailAddress::from_str(email).unwrap();
println!("Existing example");
println!("-----------------");
println!("{}", addr);
println!("");
let email2 = "Art Vandelay <[email protected]>";
let addr2 = EmailAddress::from_str(email2).unwrap();
println!("New example");
println!("-----------------");
println!("{}", addr2);
println!("{}", addr2.email());
println!("{} <{}@{}>", addr2.display_part(), addr2.local_part(), addr2.domain());
}
Outputs
Existing example
-----------------
[email protected]
New example
-----------------
Art Vandelay <[email protected]>
[email protected]
Per https://github.com/pydantic/pydantic-core/pull/413#issuecomment-1450065258 it looks like you may be sticking with the Python library for email validation?
Indeed, given the above, seems like we'll probably stick with email-validator. I'll leave this open as a potential performance enhancement, though it's certainly not high priority