graphql-ruby
Escape sequences in string arguments - unicode and others
Describe the bug
Handling unicode escape sequences has some bugs and weird behaviors:
- Parsing unicode with doubled escape characters: "\\uXXXX" is unescaped as if it were "\uXXXX", which makes it impossible to pass a string containing the literal characters \uXXXX.
- Any escape sequence, not only unicode: strings within triple quotes (block strings) are still parsed the same as single-quoted strings (possibly Ruby-related?).
Versions
graphql version: 2.0.19
rails (or other framework): 6.0.4.7
other applicable versions (graphql-batch, etc): none
GraphQL schema
In both the examples below I used the following simple resolver. Note that the "result" shown is the response rendered to the client, so escape characters are added there. I also printed the string passed to the resolver character by character, to make sure I'm not making mistakes:
class Resolvers::GetString < GraphQL::Schema::Resolver
  argument :string, String, "the name to return"
  type String, null: false

  def resolve(**arguments)
    puts "***"
    puts "string.length: #{arguments[:string].length}"
    puts "string value: \"#{arguments[:string]}\""
    # printing the chars one by one
    i = 0
    arguments[:string].each_char do |c|
      puts "string[#{i}] = \"#{c}\""
      i += 1
    end
    puts "***"
    arguments[:string]
  end
end

class Types::QueryType < Types::BaseObject
  field :get_string, resolver: Resolvers::GetString, description: "Just return the given string"
end

class ApplicationSchema < GraphQL::Schema
  query Types::QueryType
end
GraphQL query
Example GraphQL query and response (if query execution is involved)
Example for bug 1:
query bug1 {
  example1: get_string(string: "\u0064")    # correct: should return "d"
  example2: get_string(string: "\\u0064")   # incorrect: should return "\\u0064"
  # example3: get_string(string: "\u006")   # correct: validation error
  example4: get_string(string: "\\u006")    # correct: should return "\\u006"
}
the log:
***
string.length: 1
string value: "d"
string[0] = "d"
***
***
string.length: 1
string value: "d"
string[0] = "d"
***
***
string.length: 5
string value: "\u006"
string[0] = "\"
string[1] = "u"
string[2] = "0"
string[3] = "0"
string[4] = "6"
***
the result:
{
  "data": {
    "example1": "d",
    "example2": "d",
    "example4": "\\u006"
  }
}
Example for bug 2:
query bug2 {
  # example1: get_string(string: """\a""")    # incorrect (fails): should be "\\a"
  # example2: get_string(string: """\u006""") # incorrect (fails): should be "\\u006"
  example3: get_string(string: """\n""")      # incorrect: should be "\\n"
  example4: get_string(string: """\u0064""")  # incorrect: should be "\\u0064"
}
the log:
***
string.length: 1
string value: "
"
string[0] = "
"
***
***
string.length: 1
string value: "d"
string[0] = "d"
***
the result:
{
  "data": {
    "example3": "\n",
    "example4": "d"
  }
}
Steps to reproduce
Steps to reproduce the behavior
Expected behavior
parsing string arguments:
"\\u0064" => "\u0064"
"""\u0064""" => "\u0064"
Actual behavior
parsing string arguments:
"\\u0064" => "d"
"""\u0064""" => "d"
Exception backtrace
n/a
Additional context
I saw an old PR changing the behavior of \\u back and forth; it states the change was reverted because it broke another parser's test suite:
https://github.com/rmosolgo/graphql-ruby/pull/372
It mentions a test suite in graphql-js at this URL (the file moved, this is the updated link).
Note: in those tests, the strings as written are first parsed by the host language (JS). So, for example, the literal '\u{1234}' is actually passed to the parser as ሴ, while '\\u{1234}' is passed as the characters \, u, {, etc.
The tests there behave as the expected behavior I described. Examples:
- In this test the parser is given the characters "unicode \u1234\u5678\u90AB\uCDEF" and it returns the characters: unicode ሴ噸邫췯
expect(lexOne('"unicode \\u1234\\u5678\\u90AB\\uCDEF"')).to.contain({
  kind: TokenKind.STRING,
  start: 0,
  end: 34,
  value: 'unicode \u1234\u5678\u90AB\uCDEF',
});
This works the same in GraphQL-Ruby.
- In this test the parser is given the characters "escaped \n\r\b\t\f" and returns "escaped " followed by newline, carriage return, etc.
expect(lexOne('"escaped \\n\\r\\b\\t\\f"')).to.contain({
  kind: TokenKind.STRING,
  start: 0,
  end: 20,
  value: 'escaped \n\r\b\t\f',
});
This works the same in GraphQL-Ruby.
- In this test the parser is given the characters """unescaped \n\r\b\t\f\u1234""" and returns the characters: unescaped \n\r\b\t\f\u1234
expect(lexOne('"""unescaped \\n\\r\\b\\t\\f\\u1234"""')).to.contain({
  kind: TokenKind.BLOCK_STRING,
  start: 0,
  end: 32,
  line: 1,
  column: 1,
  value: 'unescaped \\n\\r\\b\\t\\f\\u1234',
});
In GraphQL-Ruby the characters are converted to special characters, the same as in test 2 above, which is a bug.
- To check the escape sequences for unicode, I cloned the repo and added another local test. In the test I passed the parser the characters "escaped unicode \\u{1234}\\u{5678}\\u{90AB}\\u{CDE" and the result is the characters: escaped unicode \u{1234}\u{5678}\u{90AB}\u{CDE. The unicode is not parsed, and the escape of the \ before the u works.
expect(
  lexOne('"escaped unicode \\\\u{1234}\\\\u{5678}\\\\u{90AB}\\\\u{CDE"'),
).to.contain({
  kind: TokenKind.STRING,
  start: 0,
  end: 52,
  value: 'escaped unicode \\u{1234}\\u{5678}\\u{90AB}\\u{CDE',
});
In GraphQL-Ruby the first 3 unicode escapes are parsed and the last one is not; the escape of the \ before the u does not work.
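For what it's worth, a similar check can be run directly against GraphQL-Ruby's parser without building a schema. This is my own sketch (not from the issue), relying on the fact that a string literal's value appears as a plain Ruby String on the AST argument node:

require "graphql"

# My own sketch: parse a document whose raw text contains the characters
# "escaped \\u0064" and inspect the argument value on the AST.
doc = GraphQL.parse('{ f(arg: "escaped \\\\u0064") }')   # raw GraphQL text: { f(arg: "escaped \\u0064") }
value = doc.definitions.first.selections.first.arguments.first.value
# Spec-compliant value: "escaped \\u0064" (a literal backslash, then u0064), 14 characters.
# With the bug described above, it comes back as "escaped d" (9 characters) instead.
p value
p value.length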
Hey! Thanks for the detailed writeup. Yes, as you noticed, there have been many attempts to match the spec on this 😅
I've created a replication script for this bug:
Replicate unicode escape bugs
require "bundler/inline"
gemfile do
gem "graphql", "2.1.7"
end
class MySchema < GraphQL::Schema
class QueryType < GraphQL::Schema::Object
field :get_string, String do
argument :string, String
end
def get_string(string:)
puts "***"
puts "string.length: #{string.length}"
puts "string value: \"#{string}\""
string.each_char.each_with_index do |c, i|
puts "string[#{i}] = \"#{c}\""
end
puts "***"
string
end
end
query(QueryType)
end
query_str = File.read("./example1.graphql")
puts query_str
pp MySchema.execute(query_str).to_h
query_str = File.read("./example2.graphql")
puts query_str
pp MySchema.execute(query_str).to_h
# example1.graphql
{
  example1: getString(string: "\u0064")    # correct: should return "d"
  example2: getString(string: "\\u0064")   # incorrect: should return "\\u0064"
  # example3: getString(string: "\u006")   # correct: validation error
  example4: getString(string: "\\u006")    # correct: should return "\\u006"
}
# example2.graphql
query bug2 {
  # example1: getString(string: """\a""")    # incorrect (fails): should be "\\a"
  # example2: getString(string: """\u006""") # incorrect (fails): should be "\\u006"
  example3: getString(string: """\n""")      # incorrect: should be "\\n"
  example4: getString(string: """\u0064""")  # incorrect: should be "\\u0064"
}
It seems like it boils down to two things:
- In block strings, GraphQL-Ruby is interpreting \... as escapes, but it shouldn't.
- In single-quoted strings, GraphQL-Ruby is interpreting \\u just like \u, but it shouldn't -- it should be a literal \u instead.
I've got an open PR to rewrite the lexer (#4718) and I'll dive in on that when I'm done there!
I've got a fix in the works over at #4824. The most interesting part of the bug was basically that these two .gsub! calls resulted in a double-replacing of escaped characters:
https://github.com/rmosolgo/graphql-ruby/blob/18f3deda9944febcbd97a84a4943362f68c8a373/lib/graphql/language/lexer.rb#L302-L303
They had to be merged into a single gsub! call.
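For illustration, here is a simplified sketch of my own (not the actual lexer regexes) of how two sequential gsub! passes can double-process an escape, and how a single combined pass avoids it:

# Simplified illustration, not the real lexer code: when escaped backslashes
# and \u escapes are replaced in two separate passes, the backslash produced
# by the first pass gets re-read by the second one.
raw = 'a\\\\u0064b'   # the characters: a \ \ u 0 0 6 4 b

two_pass = raw.dup
two_pass.gsub!(/\\\\/) { "\\" }                               # collapse \\ to \
two_pass.gsub!(/\\u([0-9A-Fa-f]{4})/) { [$1.hex].pack("U") }  # now also sees \u0064
p two_pass   # => "adb" -- the escaped backslash was consumed twice

# A single pass with alternation decides, per match, whether it saw \\ or
# \uXXXX, so nothing produced by a replacement is ever re-scanned.
one_pass = raw.gsub(/\\\\|\\u([0-9A-Fa-f]{4})/) do |m|
  m == "\\\\" ? "\\" : [$1.hex].pack("U")
end
p one_pass   # => "a\\u0064b" -- the characters a \ u 0 0 6 4 b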