nom
Is there a good way?
enum Tokens<'a> {
    Words(&'a str),
    Spaces(usize),
    Return,
    NewLine,
}
let input = "\"H\\u{65}llo \\u{20} rust\\n\"";
I want this result:
Vec[
    Tokens::Words("Hello"),
    Tokens::Spaces(3),
    Tokens::Words("rust"),
    Tokens::NewLine
]
Is this feasible? Here is the simple code:
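// Imports assumed for nom 7.x (function-style combinators).
use nom::branch::alt;
use nom::bytes::complete::take_while_m_n;
use nom::character::complete::{alpha1, char};
use nom::combinator::{map, map_opt, map_res, value};
use nom::error::{FromExternalError, ParseError};
use nom::multi::many1;
use nom::sequence::{delimited, preceded};
use nom::IResult;

pub fn parse_token(input: &str) -> IResult<&str, Vec<Tokens>> {
    many1(get_token)(input)
}

fn get_token(input: &str) -> IResult<&str, Tokens> {
    alt((
        // This only returns Tokens::Words("H").
        map(alpha1, Tokens::Words),
        // This only returns a char, which does not fit a `&str`.
        map(parse_escaped_char, Tokens::CJKString),
    ))(input)
}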
pub fn parse_escaped_char<'a, E>(input: &'a str) -> IResult<&'a str, char, E>
where
    E: ParseError<&'a str> + FromExternalError<&'a str, std::num::ParseIntError>,
{
    preceded(
        char('\\'),
        alt((
            parse_unicode,
            value('\n', char('n')),
        )),
    )(input)
}

fn parse_unicode<'a, E>(input: &'a str) -> IResult<&'a str, char, E>
where
    E: ParseError<&'a str> + FromExternalError<&'a str, std::num::ParseIntError>,
{
    let parse_hex = take_while_m_n(1, 6, |c: char| c.is_ascii_hexdigit());
    let parse_delimited_hex = preceded(
        char('u'),
        delimited(char('{'), parse_hex, char('}')),
    );
    let parse_u32 = map_res(parse_delimited_hex, move |hex| u32::from_str_radix(hex, 16));
    map_opt(parse_u32, |value| std::char::from_u32(value))(input)
}
I'll close it as soon as possible, thx!!!
I think the main problem is that your `Tokens::Words` contains a `&str`, which means it references a direct slice of the input. That's not what you want, though: you want to apply transformations to the input (unescaping Unicode escapes), so you'll have to copy the data into a `String`.
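To make the point concrete: the text "Hello" never appears contiguously in the input, so no `&str` slice of the input can represent it, and the unescaped result has to be built in a new `String`. Below is a minimal std-only sketch of that copy step. `unescape_unicode` is a hypothetical helper written for illustration, not part of nom, and it only handles `\u{...}` escapes.

```rust
/// Hypothetical helper (not part of nom): decodes `\u{XXXX}` escapes into a
/// newly allocated String. Returns None if an escape is malformed.
/// Any other backslash sequence is copied through unchanged.
fn unescape_unicode(input: &str) -> Option<String> {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(pos) = rest.find("\\u{") {
        // Copy the literal text that precedes the escape.
        out.push_str(&rest[..pos]);
        // Skip the three bytes of `\u{` and locate the closing brace.
        let after = &rest[pos + 3..];
        let end = after.find('}')?;
        // Parse the hex digits and convert them to a char.
        let code = u32::from_str_radix(&after[..end], 16).ok()?;
        out.push(char::from_u32(code)?);
        rest = &after[end + 1..];
    }
    // Copy whatever follows the last escape.
    out.push_str(rest);
    Some(out)
}

fn main() {
    let word = unescape_unicode("H\\u{65}llo").expect("valid escape");
    assert_eq!(word, "Hello");
    println!("{word}");
}
```

The returned `String` owns its data, which is why a `Tokens::Words(String)` variant can hold it while a `Tokens::Words(&str)` variant cannot.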
@Xiretza Even if I don't use `&str`, the parser returns a `char`. When the input contains escaped Unicode, the result cannot become one continuous string.
When `let input = "\"H\\u{65}llo \\u{20} rust\\n\"";`
I want to get:
Vec[
    Tokens::Words(String::from("Hello")),
    Tokens::Spaces(3),
    Tokens::Words(String::from("rust")),
    Tokens::NewLine
]
but not:
Vec[
    Tokens::Words(String::from("H")),
    Tokens::Words(String::from("e")),
    Tokens::Words(String::from("llo")),
    Tokens::Spaces(1),
    Tokens::Spaces(1),
    Tokens::Spaces(1),
    Tokens::Words(String::from("rust")),
    Tokens::NewLine
]
You can do post-parsing transformations on it, for instance
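A minimal sketch of one such post-parsing pass, assuming the parser yields the fragmented token list shown earlier and that `Tokens::Words` holds a `String`. `merge_adjacent` is a hypothetical helper name, not something nom provides. It folds neighbouring `Words` into one word and neighbouring `Spaces` into one run.

```rust
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Tokens {
    Words(String),
    Spaces(usize),
    Return,
    NewLine,
}

/// Hypothetical post-parsing pass: merges adjacent `Words` tokens into one
/// word and adjacent `Spaces` tokens into one run. Other tokens pass through.
fn merge_adjacent(tokens: Vec<Tokens>) -> Vec<Tokens> {
    let mut merged: Vec<Tokens> = Vec::with_capacity(tokens.len());
    for token in tokens {
        match token {
            Tokens::Words(next) => {
                // Extend the previous word if there is one, else start a new word.
                if let Some(Tokens::Words(prev)) = merged.last_mut() {
                    prev.push_str(&next);
                } else {
                    merged.push(Tokens::Words(next));
                }
            }
            Tokens::Spaces(n) => {
                // Add to the previous run of spaces if there is one.
                if let Some(Tokens::Spaces(prev)) = merged.last_mut() {
                    *prev += n;
                } else {
                    merged.push(Tokens::Spaces(n));
                }
            }
            other => merged.push(other),
        }
    }
    merged
}

fn main() {
    let fragmented = vec![
        Tokens::Words(String::from("H")),
        Tokens::Words(String::from("e")),
        Tokens::Words(String::from("llo")),
        Tokens::Spaces(1),
        Tokens::Spaces(1),
        Tokens::Spaces(1),
        Tokens::Words(String::from("rust")),
        Tokens::NewLine,
    ];
    let merged = merge_adjacent(fragmented);
    assert_eq!(
        merged,
        vec![
            Tokens::Words(String::from("Hello")),
            Tokens::Spaces(3),
            Tokens::Words(String::from("rust")),
            Tokens::NewLine,
        ]
    );
    println!("{merged:?}");
}
```

This keeps the nom parser simple (one token per fragment) and moves the joining into a single linear pass over the result. Doing the accumulation inside the parser itself is an alternative design that this sketch does not cover.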