mssql_ecto icon indicating copy to clipboard operation
mssql_ecto copied to clipboard

Text type restricted to maximum length of 4000

Open toddharding opened this issue 7 years ago • 13 comments

Expected Behavior

Text types defined in a migration should be able to store strings of a length greater than 4000.

Current Behavior

When data of greater than length 4000 is inserted the statement is terminated.

Possible Solution

Perhaps a look at the erlang ODBC driver?

Steps to Reproduce (for bugs)

test "insert lots" do
    string = 1..1200 |> Enum.reduce("", fn x, acc -> acc <> to_string(x) end)
    # :text is of unrestricted size type
    post = TestRepo.insert!(%Post{text: string})
    assert post.text == string
end

Your Environment

  • MssqlEcto version: 0.1
  • Ecto version: 2.1
  • Elixir version: 1.4
  • Erlang version: 19
  • Microsoft SQL Server version: 2016
  • Operating System and version: Ubuntu 16.04

toddharding avatar Mar 23 '17 11:03 toddharding

This is definitely a limitation of / bug in the erlang ODBC driver. We tried to support unsized character fields but the erlang ODBC driver mangles the return from them, so we were forced to restrict size to the maximum sized type, wvarchar(4000).

jbachhardie avatar Apr 07 '17 11:04 jbachhardie

This should be the relevant issue for erlang-odbc. https://bugs.erlang.org/browse/ERL-132

Is there a workaround for retrieving an nvarchar(max) RTF string or other technical solution? This is currently a showstopper for me. Thanks!

alexandercarls avatar Aug 01 '17 19:08 alexandercarls

If the data is reliably under 4000 bytes a CAST(column AS nvarchar(4000)) on the select should work. If it's not I'm afraid you're out of luck until that issue in Erlang is resolved. I don't think the driver has any way of extracting large data that's working since the bug affects all variable length fields ☹️

We don't have any immediate plans to patch Erlang but if you'd like to give it a try I believe @toddharding from our team had given it a shot and identified at least the broad strokes of what the problem is.

jbachhardie avatar Aug 02 '17 08:08 jbachhardie

Thank you for the quick reply. Unfortunately that doesn't work for me. Those RTF strings are huge!

alexandercarls avatar Aug 02 '17 15:08 alexandercarls

The odbc application bug was fixed in a fork https://github.com/arcusfelis/eodbc odbc module became eodbc, and application name was also changed.

So, you can try to run with it. From minuses: eodbc requires unixodbc to compile C code part. While odbc is shipped with Erlang/OTP already compiled.

arcusfelis avatar Jun 01 '18 23:06 arcusfelis

Nice work! We'll upgrade and hopefully the unixodbc dependency isn't too much trouble for people.

jbachhardie avatar Jun 02 '18 10:06 jbachhardie

Has this been considered as a possible solution?

http://erlang-odbc-tips.colefichter.ca/

See "Support for NVARCHAR(MAX)" section.

cpursley avatar Sep 18 '18 21:09 cpursley

it uses standard odbc application, which has that length bug.

On Tue, 18 Sep 2018 at 22:14, Cháse Pursłey [email protected] wrote:

Has this been considered as a possible solution?

http://erlang-odbc-tips.colefichter.ca/

See "Support for NVARCHAR(MAX)" section.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/findmypast-oss/mssql_ecto/issues/2#issuecomment-422557552, or mute the thread https://github.com/notifications/unsubscribe-auth/AAnDNOI7AuRvmN4gI5O2Q6vH1t6OpKMTks5ucWIegaJpZM4Mmltp .

-- С уважением, Уваров Михаил. Best regards, Uvarov Michael

arcusfelis avatar Sep 19 '18 09:09 arcusfelis

Thanks @arcusfelis

How would I go about installing and using your eodbc fork (debian system)?

And I assume I'd have to fork mssql_ecto to use eodbc instead of erlang-odbc?

cpursley avatar Sep 19 '18 13:09 cpursley

  • You would need to install unixodbc build deps (basically, what you need to install to build Erlang with odbc).

  • Add eodbc as a dep into rebar.config.

  • Use eodbc module instead of odbc.

arcusfelis avatar Sep 20 '18 21:09 arcusfelis

Thanks @arcusfelis

I'm still not clear how I could get eodbc up and running with mssql_ecto / mssqlex.

I've forked mssqlex and replaced :odbc with :eodbc where (I beleive) appropriate.

Additionally, I'm installing eodbc in my Pheonix app as follows:

defp deps do
    [
      {:phoenix, "~> 1.3.4"},
      {:phoenix_pubsub, "~> 1.1.0"},
      {:phoenix_ecto, "~> 3.4.0"},
      {:mssql_ecto, git: "git://github.com/cpursley/mssql_ecto.git", branch: "c/eodbc", override: true},
      ...

Any other suggestions?

cpursley avatar Sep 24 '18 21:09 cpursley

Ok, I was able to get eodbc loaded by installing it via mix and swapping "odbc" for "eodbc" where appropriate in the mssqlex library. That approach works and it's pretty handy that you can install erlang rebar libraries easily via hex.

However, there are some esoteric errors coming up when running queries where the column type is either varchar(max) or nvarchar(max).

In iex:

{:ok, conn} = Mssqlex.ODBC.init(conn_str)

This works for nvarchar(max). However, the columns are returned as binary.

> :eodbc.sql_query(conn, 'SELECT * FROM dbo.tblSomeTable AS st WHERE st.ID = 1')
=> {:selected,
	 ['ID', 'SomeColumn'],
	 [[1, <<72, 0, 52, 0, 115, 0, 73, 0, 65, 0, 65, 0, 65, 0, 65, 0, 65, 0, 65, 0, 65, 0, 65, 0, 65, 0, 50, 0, 49, 0, 89, 0, 87, ...>>]
      ]}

This does not work when varchar(max):

> :eodbc.sql_query(conn, 'SELECT * FROM dbo.tblSomeTable AS st WHERE st.ID = 1')
=> [error] GenServer #PID<0.472.0> terminating
** (stop) {:port_exit, :killed}
Last message: {#Port<0.102>, {:exit_status, 139}}
State: {:state, #Port<0.102>, {#PID<0.389.0>, #Reference<0.425679474.514588673.206201>}, #PID<0.389.0>, :undefined, :off, true, true, :on, :connected, :undefined, 0, [#Port<0.100>, #Port<0.101>], #Port<0.103>, #Port<0.104>}

And via Ecto/Mssqlex, neither varchar(max) nor nvarchar(max) work.

Error for nvarchar(max):

> SomeTable |> Repo.get(1)
=> [error] [79, 68, 66, 67, 58, 32, 114, 101, 99, 101, 105, 118, 101, 100, 32, 117, 110, 101, 120, 112, 101, 99, 116, 101, 100, 32, 105, 110, 102, 111, 58, 32, [123, ['tcp_closed', 44, '#Port<0.62>'], 125], 10]
[error] GenServer #PID<0.378.0> terminating
** (stop) {:port_exit, :could_not_bind_data_buffers}
Last message: {#Port<0.58>, {:exit_status, 22}}
State: {:state, #Port<0.58>, {#PID<0.359.0>, #Reference<0.1246316755.1851523073.245020>}, #PID<0.359.0>, :undefined, :off, true, true, :on, :connected, :undefined, 0, [#Port<0.54>, #Port<0.55>], #Port<0.61>, #Port<0.62>}
  • The :could_not_bind_data_buffers error appears to be similar to this issue: https://github.com/yandex/clickhouse-odbc/issues/27
  • The corresponding exit code in eobcd: https://github.com/arcusfelis/eodbc/blob/748d613846731c3d776c8f22600d8b6d89c26435/src/eodbc_internal.hrl#L161

Error for varchar(max):

SomeTable |> Repo.get(1)

[error] [79, 68, 66, 67, 58, 32, 114, 101, 99, 101, 105, 118, 101, 100, 32, 117, 110, 101, 120, 112, 101, 99, 116, 101, 100, 32, 105, 110, 102, 111, 58, 32, [123, ['tcp_closed', 44, '#Port<0.57>'], 125], 10]
[error] GenServer #PID<0.374.0> terminating
** (stop) {:port_exit, :killed}
Last message: {#Port<0.53>, {:exit_status, 139}}
State: {:state, #Port<0.53>, {#PID<0.358.0>, #Reference<0.2075734545.4001366017.31751>}, #PID<0.358.0>, :undefined, :off, true, true, :on, :connected, :undefined, 0, [#Port<0.49>, #Port<0.50>], #Port<0.56>, #Port<0.57>}

It looks like there's two things that need to happen to get at least nvarchar(max) working:

  • Get odbc:param_query to submitt params correclty
  • Decoding binary type (is this an erlang encoding?)

Does this sound right @arcusfelis & @jbachhardie?

cpursley avatar Sep 27 '18 18:09 cpursley

Hmm, I haven't tested the library with binary_strings option off. Generally, you really want to use binaries everywhere for Garbage Collector efficiency.

In MongooseIM we use these connection options:

eodbc:connect(Settings, [{scrollable_cursors, off},{binary_strings, on},{return_types, on}]) https://github.com/esl/MongooseIM/blob/295181f7e9fc5c42b58fdb212ee12be8237cf447/src/rdbms/mongoose_rdbms_odbc.erl

binary_strings=on - return char, nvarchar, binary, nbinary as binaries. Actually, the otp version of odbc library has the option too. But than we can't seamlessly do some processing of unicode fields because both binaries and unicode are now encoded using the same erlang type, binary. We use utf8 binaries as strings in MongooseIM, but MSSQL uses utf16, so some reencoding is required.

So, basically we have added the return_types option and now each query returns not only field names, but also field types. And our code can guess which transformations we should do based on ODBC field types. https://github.com/esl/MongooseIM/blob/295181f7e9fc5c42b58fdb212ee12be8237cf447/src/rdbms/mongoose_rdbms_odbc.erl#L105

On Thu, 27 Sep 2018 at 20:15, Cháse Pursłey [email protected] wrote:

Ok, I was able to get eodbc loaded by installing it via mix and swapping "odbc" for "eodbc" where appropriate in the mssqlex library. That approach works and it's pretty handy that you can install erlang rebar libraries easily via hex.

However, there are some esoteric errors coming up when running queries where the column type is either varchar(max) or nvarchar(max).

In iex:

{:ok, conn} = Mssqlex.ODBC.init(conn_str)

This works for nvarchar(max). However, the columns are returned as binary.

:eodbc.sql_query(conn, 'SELECT * FROM dbo.tblSomeTable AS st WHERE st.ID = 1') => {:selected, ['ID', 'SomeColumn'], [[1, <<72, 0, 52, 0, 115, 0, 73, 0, 65, 0, 65, 0, 65, 0, 65, 0, 65, 0, 65, 0, 65, 0, 65, 0, 65, 0, 50, 0, 49, 0, 89, 0, 87, ...>>] ]}

This does not work when varchar(max):

:eodbc.sql_query(conn, 'SELECT * FROM dbo.tblSomeTable AS st WHERE st.ID = 1') => [error] GenServer #PID<0.472.0> terminating ** (stop) {:port_exit, :killed} Last message: {#Port<0.102>, {:exit_status, 139}} State: {:state, #Port<0.102>, {#PID<0.389.0>, #Reference<0.425679474.514588673.206201>}, #PID<0.389.0>, :undefined, :off, true, true, :on, :connected, :undefined, 0, [#Port<0.100>, #Port<0.101>], #Port<0.103>, #Port<0.104>}

And via Ecto/Mssqlex, neither varchar(max) nor nvarchar(max) work.

Error for nvarchar(max):

SomeTable |> Repo.get(1) => [error] [79, 68, 66, 67, 58, 32, 114, 101, 99, 101, 105, 118, 101, 100, 32, 117, 110, 101, 120, 112, 101, 99, 116, 101, 100, 32, 105, 110, 102, 111, 58, 32, [123, ['tcp_closed', 44, '#Port<0.62>'], 125], 10] [error] GenServer #PID<0.378.0> terminating ** (stop) {:port_exit, :could_not_bind_data_buffers} Last message: {#Port<0.58>, {:exit_status, 22}} State: {:state, #Port<0.58>, {#PID<0.359.0>, #Reference<0.1246316755.1851523073.245020>}, #PID<0.359.0>, :undefined, :off, true, true, :on, :connected, :undefined, 0, [#Port<0.54>, #Port<0.55>], #Port<0.61>, #Port<0.62>}

  • The :could_not_bind_data_buffers error appears to be similar to this issue: yandex/clickhouse-odbc#27 https://github.com/yandex/clickhouse-odbc/issues/27
  • The corresponding exit code in eobcd: https://github.com/arcusfelis/eodbc/blob/748d613846731c3d776c8f22600d8b6d89c26435/src/eodbc_internal.hrl#L161

However, using Repo.query directly works for _nvarchar(max)) (but not varchar):

{:ok, result} = Repo.query("SELECT * FROM dbo.tblSomeTable AS st WHERE st.ID = 1")

Error for varchar(max):

SomeTable |> Repo.get(1)

[error] [79, 68, 66, 67, 58, 32, 114, 101, 99, 101, 105, 118, 101, 100, 32, 117, 110, 101, 120, 112, 101, 99, 116, 101, 100, 32, 105, 110, 102, 111, 58, 32, [123, ['tcp_closed', 44, '#Port<0.57>'], 125], 10] [error] GenServer #PID<0.374.0> terminating ** (stop) {:port_exit, :killed} Last message: {#Port<0.53>, {:exit_status, 139}} State: {:state, #Port<0.53>, {#PID<0.358.0>, #Reference<0.2075734545.4001366017.31751>}, #PID<0.358.0>, :undefined, :off, true, true, :on, :connected, :undefined, 0, [#Port<0.49>, #Port<0.50>], #Port<0.56>, #Port<0.57>}

It looks like there's two things that need to happen to get at least nvarchar(max) working:

  • Get odbc:param_query to submitt params correclty
  • Decoding binary type

Does this sound right @arcusfelis https://github.com/arcusfelis & @jbachhardie https://github.com/jbachhardie?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/findmypast-oss/mssql_ecto/issues/2#issuecomment-425192423, or mute the thread https://github.com/notifications/unsubscribe-auth/AAnDNOiSoOUDnUeH455RgFTq7a3WksOvks5ufRW_gaJpZM4Mmltp .

-- С уважением, Уваров Михаил. Best regards, Uvarov Michael

arcusfelis avatar Sep 28 '18 11:09 arcusfelis