activerecord-sqlserver-adapter icon indicating copy to clipboard operation
activerecord-sqlserver-adapter copied to clipboard

Problem with persisting multiline texts with trailing backslashes

Open neongrau opened this issue 8 years ago • 13 comments

First i must say i'm not sure where exactly the error happens. Maybe it's the adapter or maybe it's ActiveRecord's serializer logic.

I've encountered this error on Rails 4.2 as well as in Rails 5.0.

Having a table with a text/nvarchar(max) column, let's call the column 'data' that's supposed to be serialized.

Migration for replication:

class CreateTestings < ActiveRecord::Migration[5.0]
  def change
    create_table :testings do |t|
      t.text :data
    end
  end
end

Model

class Testing < ApplicationRecord
  serialize :data
end

When having a Hash with a String value that ends in a backslash and at the same time more data after that line in the serialized YAML this will result in an exception making the record pretty much broken.

x = Testing.create(data:{ foo:'foo\\', bar:'bar' })
#  SQL (0.0ms)  BEGIN TRANSACTION
#  SQL (0.0ms)  EXEC sp_executesql N'INSERT INTO [testings] ([data]) OUTPUT INSERTED.[id] #VALUES (@0)', N'@0 nvarchar(max)', @0 = N'---
#:foo: foo\
#:bar: bar
#'  [["data", "---\n:foo: foo\\\n:bar: bar\n"]]
#  SQL (15.7ms)  COMMIT TRANSACTION
#=> #<Testing id: 1, data: {:foo=>"foo\\", :bar=>"bar"}>
x.reload
Psych::SyntaxError: (<unknown>): mapping values are not allowed in this context at line 2 column 14

Looking into the database i see this was stored:

---\n:foo: foo:bar: bar\n

but actually there should be this:

---\n:foo: foo\\\n:bar: bar\n

Further tries

# will not trigger an exception will just lose the backslash
x = Testing.create(data:{ foo:'foo\\' })
# will not trigger an exception will just lose the last backslash, the first one will stay
x = Testing.create(data:{ foo:'foo\\\\' })

Any thoughts?

neongrau avatar Jun 23 '17 11:06 neongrau

Sorry for the late reply! This sounds like a bug in the adapter's quoting to use sp_excutesql. Really surprised this has not come up before too. Let me take a quick look...

metaskills avatar Sep 07 '17 01:09 metaskills

So just playing around with some debugging on the YAML example. That new literal format was new to me.

{foo:'foo', bar:'bar'}
=> {:foo=>"foo", :bar=>"bar"}
YAML.load "---\n:foo: foo\n:bar: bar\n"
=> {:foo=>"foo", :bar=>"bar"}
YAML.dump({foo:'foo', bar:'bar'})
=> "---\n" + ":foo: foo\n" + ":bar: bar\n"

Now playing with reproducing your issue using the scratchpad test we have and a model with a nvarchar(max) column.

SSTestDatatypeMigration.columns_hash['text_col'].sql_type
=> "nvarchar(max)"
o = SSTestDatatypeMigration.create! text_col: "---\n:foo: foo\n:bar: bar\n"
YAML.load(o.reload.text_col)
=> {:foo=>"foo", :bar=>"bar"}
  SQL (1.8ms)  EXEC sp_executesql N'INSERT INTO [sst_datatypes_migration] ([text_col]) OUTPUT INSERTED.[id] VALUES (@0)', N'@0 nvarchar(max)', @0 = N'---
:foo: foo
:bar: bar
'  [["text_col", nil]]

So everything looks good here. We are passing the ActiveRecord tests for serialized data too. So why were are getting different results? It can't be that composite primary keys gem again can it ;)

metaskills avatar Sep 07 '17 01:09 metaskills

Definitely not the composite keys gem, i kicked that for good.

But in your examples that work and would here too i assume you don't have the problematic trailing backslash in a value that's not the last one.

This is what was causing the problem. I had user data where someone (accidentally) appended a backslash.

Pleas try with this change:

o = SSTestDatatypeMigration.create! text_col: "---\n:foo: foo\\\n:bar: bar\n"
YAML.load(o.reload.text_col)

the value of foo is supposed to be foo\

Real world scenario could be having a Windows-style directory like

{ folder: 'c:\\data\\foo\\', file: 'example.yml' }

neongrau avatar Sep 07 '17 06:09 neongrau

Hmm... this is ODD. Can you tell me how to insert SQL like this more low level? I did a quick raw test and saw these results. Does that look odd? I wonder what SMS does?

sql = "INSERT INTO [sst_datatypes_migration] ([text_col]) VALUES (N'---\n:foo: foo\\\n:bar: bar\n')"
SSTestDatatypeMigration.delete_all
ActiveRecord::Base.connection.execute(sql)
SSTestDatatypeMigration.first.text_col # => "---\n:foo: foo:bar: bar\n"
INSERT INTO [sst_datatypes_migration] ([text_col]) VALUES (N'---
:foo: foo\
:bar: bar
')

metaskills avatar Oct 17 '17 02:10 metaskills

This is really weird.

INSERT INTO [testings] ([data]) OUTPUT INSERTED.[id] VALUES (N'---
:folder: c:\data\foo\
:file: example.yml
')

I'm seeing exactly the same problem when executing this sample in SMS and also in the macOS client SQLPro Studio which is using FreeTDS afaik.

in both SMS and SQLPro Studio i can manually edit an existing record by copy and pasting the YAML string which result in the following UPDATE statement which i grabbed from SQL Server Profiler

BEGIN

SET QUOTED_IDENTIFIER OFF
UPDATE
	dbo.testings
SET
	data = '---
:folder: c:\data\foo\
:file: example.yml
'
WHERE
	id = 2

END

The formatting of the string looks absolutely identical, yet the result is correct. We then presumed it's the line SET QUOTED_IDENTIFIER OFF that has something to do with it. But an INSERT with that line in front still does not work.

What works is manually adding a blank space after the trailing backslash. Or wrapping the path into quotation marks thereby modifying the YAML formatting.

It's just as if this YAML is plain incompatible to be persisted in SQL.

neongrau avatar Oct 18 '17 15:10 neongrau

I guess i've found the root of this problem.

This is a special thing that SQL Server is doing. A backslash at an end of a line is treated as
\ breaks a long string constant into two or more lines for readability.

https://docs.microsoft.com/en-us/sql/t-sql/language-elements/sql-server-utilities-statements-backslash

This means it's not a YAML specific issue. This would happen with any multi-line string that has a trailing backslash in a line.

INSERT INTO [testings] ([data]) 
VALUES (N'There is a file in the folder c:\data\random-texts\
with Lorem Ipsum inside.
This note is supposed to be 3 lines long.
')

This will result in a 2 liner instead of the 3 lines when done from SMS.

And same when attempted in Ruby

note = "There is a file in the folder c:\\data\\random-texts\\\nwith Lorem Ipsum inside.\nThis note is supposed to be 3 lines long."

irb(main):013:0> puts note
There is a file in the folder c:\data\random-texts\
with Lorem Ipsum inside.
This note is supposed to be 3 lines long.

x = Testing.create(data: note)
  SQL (0.0ms)  BEGIN TRANSACTION
  SQL (15.3ms)  EXEC sp_executesql N'INSERT INTO [testings] ([data]) OUTPUT INSERTED.[id] VALUES (@0)', N'@0 nvarchar(max)', @0 = N'--- |-
  There is a file in the folder c:\data\random-texts\
  with Lorem Ipsum inside.
  This note is supposed to be 3 lines long.
'  [["data", nil]]
  SQL (0.0ms)  COMMIT TRANSACTION

irb(main):016:0> puts x.data
There is a file in the folder c:\data\random-texts\
with Lorem Ipsum inside.
This note is supposed to be 3 lines long.

x.reload

#  Testing Load (0.0ms)  EXEC sp_executesql N'SELECT  [testings].* FROM [testings] WHERE [testings].[id] = @0  ORDER BY [testings].[id] ASC OFFSET 0 ROWS FETCH NEXT @1 ROWS ONLY', N'@0 int, @1 int', @0 = 6, @1 = 1  [["id", nil], ["LIMIT", nil]]
irb(main):018:0> puts x.data
There is a file in the folder c:\data\random-texts  with Lorem Ipsum inside.
This note is supposed to be 3 lines long.

neongrau avatar Oct 19 '17 06:10 neongrau

sounds like the same to me: https://spin.atomicobject.com/2008/03/18/trailing-backslash-problem-and-fix-for-rails-and-sqlserver-2005/

neongrau avatar Oct 19 '17 07:10 neongrau

Damn! Incredible sleuthing. I'm guessing there is nothing the adapter needs to do with this. Perhaps you can use JSON instead or a custom YAML serializer for this column/attribute. I did a quick search it seems it is possible to enforce quoting if needed. Maybe there is a simpler way. https://github.com/ruby/psych/issues/322

metaskills avatar Oct 21 '17 01:10 metaskills

Eh? No it actually has nothing to do with YAML. It‘s an adapter thing and so far the only solution i can rhink of is to Regex parse any strings to handle the special way SQL Server treats backslashes at EOL.

Like the old Monkey patch from 2008 does.

neongrau avatar Oct 21 '17 02:10 neongrau

I'm not so sure we should. Ideally we should act just like SMS does and what you showed me is that this is the case right? Supporting patches for any string, YAML or otherwise, looks like a game that we cant keep up with. So the hacks should be contextual on the end user. IE, if you want to support YAML with possible strings that end in , then you will have to ensure that. Make sense?

metaskills avatar Oct 21 '17 16:10 metaskills

Again: this has nothing to do with YAML. Any textarea can suffer from thus. Just enter a text with a lne that ends with a backslash and a following linebreak will be removed alongside the backslash.

It‘s a braindead feature for creating more readable SQL that affects user input in programmatically generated SQL.

neongrau avatar Oct 21 '17 17:10 neongrau

In any text to persist losing that backslash and the linebreak sure is just 2 bytes lost. In YAML you will notice quick because that breaks all following data.

neongrau avatar Oct 21 '17 17:10 neongrau

Again: this has nothing to do with YAML

I KNOW. Seriously :) I knew that all along. This is the point I am making tho. I only used YAML as an example of what an end user would have to deal with.

metaskills avatar Oct 21 '17 18:10 metaskills