Problem with persisting multiline texts with trailing backslashes
First i must say i'm not sure where exactly the error happens. Maybe it's the adapter or maybe it's ActiveRecord's serializer logic.
I've encountered this error on Rails 4.2 as well as in Rails 5.0.
Having a table with a text/nvarchar(max) column, let's call the column 'data' that's supposed to be serialized.
Migration for replication:
class CreateTestings < ActiveRecord::Migration[5.0]
def change
create_table :testings do |t|
t.text :data
end
end
end
Model
class Testing < ApplicationRecord
serialize :data
end
When having a Hash with a String value that ends in a backslash and at the same time more data after that line in the serialized YAML this will result in an exception making the record pretty much broken.
x = Testing.create(data:{ foo:'foo\\', bar:'bar' })
# SQL (0.0ms) BEGIN TRANSACTION
# SQL (0.0ms) EXEC sp_executesql N'INSERT INTO [testings] ([data]) OUTPUT INSERTED.[id] #VALUES (@0)', N'@0 nvarchar(max)', @0 = N'---
#:foo: foo\
#:bar: bar
#' [["data", "---\n:foo: foo\\\n:bar: bar\n"]]
# SQL (15.7ms) COMMIT TRANSACTION
#=> #<Testing id: 1, data: {:foo=>"foo\\", :bar=>"bar"}>
x.reload
Psych::SyntaxError: (<unknown>): mapping values are not allowed in this context at line 2 column 14
Looking into the database i see this was stored:
---\n:foo: foo:bar: bar\n
but actually there should be this:
---\n:foo: foo\\\n:bar: bar\n
Further tries
# will not trigger an exception will just lose the backslash
x = Testing.create(data:{ foo:'foo\\' })
# will not trigger an exception will just lose the last backslash, the first one will stay
x = Testing.create(data:{ foo:'foo\\\\' })
Any thoughts?
Sorry for the late reply! This sounds like a bug in the adapter's quoting to use sp_excutesql. Really surprised this has not come up before too. Let me take a quick look...
So just playing around with some debugging on the YAML example. That new literal format was new to me.
{foo:'foo', bar:'bar'}
=> {:foo=>"foo", :bar=>"bar"}
YAML.load "---\n:foo: foo\n:bar: bar\n"
=> {:foo=>"foo", :bar=>"bar"}
YAML.dump({foo:'foo', bar:'bar'})
=> "---\n" + ":foo: foo\n" + ":bar: bar\n"
Now playing with reproducing your issue using the scratchpad test we have and a model with a nvarchar(max) column.
SSTestDatatypeMigration.columns_hash['text_col'].sql_type
=> "nvarchar(max)"
o = SSTestDatatypeMigration.create! text_col: "---\n:foo: foo\n:bar: bar\n"
YAML.load(o.reload.text_col)
=> {:foo=>"foo", :bar=>"bar"}
SQL (1.8ms) EXEC sp_executesql N'INSERT INTO [sst_datatypes_migration] ([text_col]) OUTPUT INSERTED.[id] VALUES (@0)', N'@0 nvarchar(max)', @0 = N'---
:foo: foo
:bar: bar
' [["text_col", nil]]
So everything looks good here. We are passing the ActiveRecord tests for serialized data too. So why were are getting different results? It can't be that composite primary keys gem again can it ;)
Definitely not the composite keys gem, i kicked that for good.
But in your examples that work and would here too i assume you don't have the problematic trailing backslash in a value that's not the last one.
This is what was causing the problem. I had user data where someone (accidentally) appended a backslash.
Pleas try with this change:
o = SSTestDatatypeMigration.create! text_col: "---\n:foo: foo\\\n:bar: bar\n"
YAML.load(o.reload.text_col)
the value of foo is supposed to be foo\
Real world scenario could be having a Windows-style directory like
{ folder: 'c:\\data\\foo\\', file: 'example.yml' }
Hmm... this is ODD. Can you tell me how to insert SQL like this more low level? I did a quick raw test and saw these results. Does that look odd? I wonder what SMS does?
sql = "INSERT INTO [sst_datatypes_migration] ([text_col]) VALUES (N'---\n:foo: foo\\\n:bar: bar\n')"
SSTestDatatypeMigration.delete_all
ActiveRecord::Base.connection.execute(sql)
SSTestDatatypeMigration.first.text_col # => "---\n:foo: foo:bar: bar\n"
INSERT INTO [sst_datatypes_migration] ([text_col]) VALUES (N'---
:foo: foo\
:bar: bar
')
This is really weird.
INSERT INTO [testings] ([data]) OUTPUT INSERTED.[id] VALUES (N'---
:folder: c:\data\foo\
:file: example.yml
')
I'm seeing exactly the same problem when executing this sample in SMS and also in the macOS client SQLPro Studio which is using FreeTDS afaik.
in both SMS and SQLPro Studio i can manually edit an existing record by copy and pasting the YAML string which result in the following UPDATE statement which i grabbed from SQL Server Profiler
BEGIN
SET QUOTED_IDENTIFIER OFF
UPDATE
dbo.testings
SET
data = '---
:folder: c:\data\foo\
:file: example.yml
'
WHERE
id = 2
END
The formatting of the string looks absolutely identical, yet the result is correct.
We then presumed it's the line SET QUOTED_IDENTIFIER OFF that has something to do with it. But an INSERT with that line in front still does not work.
What works is manually adding a blank space after the trailing backslash. Or wrapping the path into quotation marks thereby modifying the YAML formatting.
It's just as if this YAML is plain incompatible to be persisted in SQL.
I guess i've found the root of this problem.
This is a special thing that SQL Server is doing.
A backslash at an end of a line is treated as
\ breaks a long string constant into two or more lines for readability.
https://docs.microsoft.com/en-us/sql/t-sql/language-elements/sql-server-utilities-statements-backslash
This means it's not a YAML specific issue. This would happen with any multi-line string that has a trailing backslash in a line.
INSERT INTO [testings] ([data])
VALUES (N'There is a file in the folder c:\data\random-texts\
with Lorem Ipsum inside.
This note is supposed to be 3 lines long.
')
This will result in a 2 liner instead of the 3 lines when done from SMS.
And same when attempted in Ruby
note = "There is a file in the folder c:\\data\\random-texts\\\nwith Lorem Ipsum inside.\nThis note is supposed to be 3 lines long."
irb(main):013:0> puts note
There is a file in the folder c:\data\random-texts\
with Lorem Ipsum inside.
This note is supposed to be 3 lines long.
x = Testing.create(data: note)
SQL (0.0ms) BEGIN TRANSACTION
SQL (15.3ms) EXEC sp_executesql N'INSERT INTO [testings] ([data]) OUTPUT INSERTED.[id] VALUES (@0)', N'@0 nvarchar(max)', @0 = N'--- |-
There is a file in the folder c:\data\random-texts\
with Lorem Ipsum inside.
This note is supposed to be 3 lines long.
' [["data", nil]]
SQL (0.0ms) COMMIT TRANSACTION
irb(main):016:0> puts x.data
There is a file in the folder c:\data\random-texts\
with Lorem Ipsum inside.
This note is supposed to be 3 lines long.
x.reload
# Testing Load (0.0ms) EXEC sp_executesql N'SELECT [testings].* FROM [testings] WHERE [testings].[id] = @0 ORDER BY [testings].[id] ASC OFFSET 0 ROWS FETCH NEXT @1 ROWS ONLY', N'@0 int, @1 int', @0 = 6, @1 = 1 [["id", nil], ["LIMIT", nil]]
irb(main):018:0> puts x.data
There is a file in the folder c:\data\random-texts with Lorem Ipsum inside.
This note is supposed to be 3 lines long.
sounds like the same to me: https://spin.atomicobject.com/2008/03/18/trailing-backslash-problem-and-fix-for-rails-and-sqlserver-2005/
Damn! Incredible sleuthing. I'm guessing there is nothing the adapter needs to do with this. Perhaps you can use JSON instead or a custom YAML serializer for this column/attribute. I did a quick search it seems it is possible to enforce quoting if needed. Maybe there is a simpler way. https://github.com/ruby/psych/issues/322
Eh? No it actually has nothing to do with YAML. It‘s an adapter thing and so far the only solution i can rhink of is to Regex parse any strings to handle the special way SQL Server treats backslashes at EOL.
Like the old Monkey patch from 2008 does.
I'm not so sure we should. Ideally we should act just like SMS does and what you showed me is that this is the case right? Supporting patches for any string, YAML or otherwise, looks like a game that we cant keep up with. So the hacks should be contextual on the end user. IE, if you want to support YAML with possible strings that end in , then you will have to ensure that. Make sense?
Again: this has nothing to do with YAML. Any textarea can suffer from thus. Just enter a text with a lne that ends with a backslash and a following linebreak will be removed alongside the backslash.
It‘s a braindead feature for creating more readable SQL that affects user input in programmatically generated SQL.
In any text to persist losing that backslash and the linebreak sure is just 2 bytes lost. In YAML you will notice quick because that breaks all following data.
Again: this has nothing to do with YAML
I KNOW. Seriously :) I knew that all along. This is the point I am making tho. I only used YAML as an example of what an end user would have to deal with.