tds_fdw icon indicating copy to clipboard operation
tds_fdw copied to clipboard

DISTINCT doesn't work for 2.0.0-alpha1

Open Loriowar opened this issue 7 years ago • 15 comments

Query SELECT DISTINCT "column_name" FROM foreign_table; doesn't work. This return duplicate entries in case of usage tds-fdw from branch 2.0.0-alpha1. Branch 1.0.7 works as expected for the same query and the same environment.

My env:

  • Ubuntu 14.04
  • PostgreSQL 9.5
  • MSSQL 2012
  • freetds-common 0.91-5

Loriowar avatar Sep 25 '16 06:09 Loriowar

The 2.0.0-alpha1 release is a few commits behind master right now, so it is missing some fixes. Does master have this problem for you? It seems to work as expected for me:

postgres=# select * from mssql_table;
 id | col
----+-----
  1 |   1
  2 |   1
  3 |   1
(3 rows)

postgres=# select distinct col from mssql_table;
WARNING:  Table definition mismatch: Could not match local column id with column from foreign table
 col
-----
   1
(1 row)

GeoffMontee avatar Sep 26 '16 20:09 GeoffMontee

Same problem in master branch of tds_fdw (commit 3a803c5840b8cf69e1a725fb4377d0f95a53a4e9) with latest FreeTDS version (1.00).

I have foreign table:

CREATE FOREIGN TABLE wrc_ctrs (
    "WMSLOCATIONID" varchar,
    ...)
SERVER mssql_server
OPTIONS (
    query 'SELECT [WRKCTRTABLE].[WMSLOCATIONID],
                  ...
           FROM   [WRKCTRTABLE]',
    row_estimate_method 'execute');

And quite simple query:

SELECT DISTINCT "WMSLOCATIONID" AS wmslocationid
FROM  wrc_ctrs
WHERE "WMSLOCATIONID" != '';

As result I receive:

wmslocationid 
---------------
 Механ
 Новый цех
 Механ
 Новый цех
 Механ
 Гравировка
 Механ
 Гравировка
 ...

(126 rows)

Here is lots of duplications. But, resulted amount of rows is less than simple count on same query.

Moreover, for COUNT with DISTINCT all works as expected:

SELECT COUNT(DISTINCT "WMSLOCATIONID") AS wmslocationid
FROM  wrc_ctrs
WHERE "WMSLOCATIONID" != '';

wmslocationid 
---------------
            18

Total count is much more:

SELECT COUNT("WMSLOCATIONID") AS wmslocationid
FROM  wrc_ctrs
WHERE "WMSLOCATIONID" != '';

wmslocationid 
---------------
          2755

For branch 1.0.7 all above queries works correct.

Is this a "magic" or somehow related with locale? May you help me in this situation or problem can be somewhere else? I can provide logs and any other information about env if needed.

In all described cased env is absolutely identical (same PG and MSSQL, same tables, same data, same initialize options and configs of the FDW and so on). Differences only in tds_fdw and FreeTDS versions.

Loriowar avatar Sep 29 '16 09:09 Loriowar

Hi @Loriowar,

I'm not entirely sure what is causing this problem. WHERE and column pushdowns aren't even performed when using the query foreign table option.

Would you be willing to provide a data set and table definitions that can reproduce this problem? That would help a lot.

GeoffMontee avatar Nov 02 '16 17:11 GeoffMontee

Sorry for a huge delay. I can't provide a full data set, only a table definition:

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[WRKCTRTABLE](
  [WRKCTRGROUPID] [nvarchar](10) NOT NULL,
  [WRKCTRID] [nvarchar](10) NOT NULL,
  [NAME] [nvarchar](140) NOT NULL,
  [WRKCTRTYPE] [int] NOT NULL,
  [WRKCTRNUMOF] [numeric](28, 12) NOT NULL,
  [EFFECTIVITYPCT] [numeric](28, 12) NOT NULL,
  [OPERATIONSCHEDPCT] [numeric](28, 12) NOT NULL,
  [CAPACITY] [numeric](28, 12) NOT NULL,
  [CAPUNIT] [int] NOT NULL,
  [VENDID] [nvarchar](20) NOT NULL,
  [ACCOUNTWRKCTRISSUE] [nvarchar](20) NOT NULL,
  [ACCOUNTWIPVALUATION] [nvarchar](20) NOT NULL,
  [CREATED] [datetime] NOT NULL,
  [QUEUETIMEBEFORE] [numeric](28, 12) NOT NULL,
  [SETUPTIME] [numeric](28, 12) NOT NULL,
  [PROCESSTIME] [numeric](28, 12) NOT NULL,
  [PROCESSPERQTY] [numeric](28, 12) NOT NULL,
  [TRANSPTIME] [numeric](28, 12) NOT NULL,
  [QUEUETIMEAFTER] [numeric](28, 12) NOT NULL,
  [TRANSFERBATCH] [numeric](28, 12) NOT NULL,
  [TOHOURS] [numeric](28, 12) NOT NULL,
  [ERRORPCT] [numeric](28, 12) NOT NULL,
  [SETUPCATEGORYID] [nvarchar](10) NOT NULL,
  [PROCESSCATEGORYID] [nvarchar](10) NOT NULL,
  [DIMENSION] [nvarchar](10) NOT NULL,
  [DIMENSION2_] [nvarchar](10) NOT NULL,
  [DIMENSION3_] [nvarchar](10) NOT NULL,
  [ISGROUP] [int] NOT NULL,
  [ACCOUNTWIPISSUE] [nvarchar](20) NOT NULL,
  [PRODUNITID] [nvarchar](10) NOT NULL,
  [SITEID] [nvarchar](10) NOT NULL,
  [BOTTLENECKRESOURCE] [int] NOT NULL,
  [WMSLOCATIONID] [nvarchar](10) NOT NULL,
  [EMPLID] [nvarchar](20) NOT NULL,
  [CALENDARID] [nvarchar](10) NOT NULL,
  [CAPLIMITED] [int] NOT NULL,
  [INVENTLOCATIONID] [nvarchar](10) NOT NULL,
  [CAPACITYBATCH] [numeric](28, 12) NOT NULL,
  [EXCLUSIVE] [int] NOT NULL,
  [QTYCATEGORYID] [nvarchar](10) NOT NULL,
  [ROUTEGROUPID] [nvarchar](10) NOT NULL,
  [ACCOUNTWRKCTRISSUEOFFSET] [nvarchar](20) NOT NULL,
  [DATAAREAID] [nvarchar](4) NOT NULL,
  [RECVERSION] [int] NOT NULL,
  [RECID] [bigint] NOT NULL,
  [TASKGROUPID] [nvarchar](10) NOT NULL,
  [WRKCTRTASKDEMAND] [numeric](28, 12) NOT NULL,
  [PROPERTYLIMITED] [int] NOT NULL,
 CONSTRAINT [I_266WRKCTRIDX] PRIMARY KEY NONCLUSTERED 
(
  [DATAAREAID] ASC,
  [WRKCTRID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [WRKCTRGROUPID]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [WRKCTRID]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [NAME]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [WRKCTRTYPE]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [WRKCTRNUMOF]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [EFFECTIVITYPCT]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [OPERATIONSCHEDPCT]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [CAPACITY]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [CAPUNIT]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [VENDID]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [ACCOUNTWRKCTRISSUE]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [ACCOUNTWIPVALUATION]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('1900-01-01 00:00:00.000') FOR [CREATED]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [QUEUETIMEBEFORE]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [SETUPTIME]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [PROCESSTIME]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [PROCESSPERQTY]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [TRANSPTIME]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [QUEUETIMEAFTER]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [TRANSFERBATCH]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [TOHOURS]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [ERRORPCT]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [SETUPCATEGORYID]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [PROCESSCATEGORYID]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [DIMENSION]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [DIMENSION2_]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [DIMENSION3_]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [ISGROUP]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [ACCOUNTWIPISSUE]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [PRODUNITID]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [SITEID]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [BOTTLENECKRESOURCE]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [WMSLOCATIONID]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [EMPLID]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [CALENDARID]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [CAPLIMITED]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [INVENTLOCATIONID]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [CAPACITYBATCH]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [EXCLUSIVE]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [QTYCATEGORYID]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [ROUTEGROUPID]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [ACCOUNTWRKCTRISSUEOFFSET]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('dat') FOR [DATAAREAID]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((1)) FOR [RECVERSION]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [TASKGROUPID]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [WRKCTRTASKDEMAND]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [PROPERTYLIMITED]
GO

ALTER TABLE [dbo].[WRKCTRTABLE]  WITH CHECK ADD CHECK  (([RECID]<>(0)))
GO

Loriowar avatar Dec 28 '16 12:12 Loriowar

Do you get a correct result if you rewrite the query in this way?

SELECT DISTINCT ON ("WMSLOCATIONID") COUNT(*) AS wmslocationid
FROM  wrc_ctrs
WHERE "WMSLOCATIONID" != '';

Smtgr14 avatar Dec 29 '16 15:12 Smtgr14

This query still return duplications

SELECT DISTINCT ON ("WMSLOCATIONID") "WMSLOCATIONID" AS wmslocationid
FROM  wrc_ctrs
WHERE "WMSLOCATIONID" != '';

Your variant has incorrect syntax for PG. I don't know how can work DISTINCT ON and COUNT in this query.

Loriowar avatar Jan 25 '17 13:01 Loriowar

Apologies, I forgot the GROUP BY "WMSLOCATIONID" at the end. Should be:

SELECT DISTINCT ON ("WMSLOCATIONID") "WMSLOCATIONID" AS wmslocationid
FROM wrc_ctrs
WHERE "WMSLOCATIONID" != ''
GROUP BY "WMSLOCATIONID";

Smtgr14 avatar Jan 26 '17 00:01 Smtgr14

This query doesn't work. It return duplications too.

Loriowar avatar Jan 26 '17 14:01 Loriowar

Since pushdowns are not working, try this query:

SELECT DISTINCT ON ("wmslocationid") FROM
  (SELECT "WMSLOCATIONID" AS wmslocationid
  FROM wrc_ctrs
  WHERE "WMSLOCATIONID" != '') a
GROUP BY "WMSLOCATIONID";

Smtgr14 avatar Jan 27 '17 01:01 Smtgr14

Little changes due to postgres syntax:

SELECT DISTINCT ON ("wmslocationid") "wmslocationid" FROM
   (SELECT "WMSLOCATIONID" AS wmslocationid
    FROM wrc_ctrs
    WHERE "WMSLOCATIONID" != '') a
GROUP BY "wmslocationid";

And I see a duplications in the result again.

Loriowar avatar Jan 27 '17 06:01 Loriowar

An important question: must I update tds_fdw to the latest master commit? For now, all queries runs on tds_fdw compiled in the moment of commit 3a803c5840b8cf69e1a725fb4377d0f95a53a4e9.

Loriowar avatar Jan 27 '17 06:01 Loriowar

When wrapping the query in a from clause to perform the SELECT DISTINCT, the handling of DISTINCT is no longer on tds_fdw or MSSQL, it's pure PostgreSQL at this point. if there are duplicates, check to make sure there isn't something about them that you can't see that makes them distinct, like spaces of other invisible characters. A last stitch effort is to use the query to fill a temporary table and run SELECT DISTINCT against it. If there are duplicates it's not tds_fdw, it's definitely something else. Try creating the table locally, fill it using a query for the raw data from mssql and then query this new table with distinct.

Smtgr14 avatar Jan 27 '17 16:01 Smtgr14

For now, I use cascade of materialized views. The first view is identical to the foreign table (wrc_ctrs) and it store data one to one without any modification. Second view store only distinct records from the first view. In this case all works perfect and there is no duplications. Other variant: I can install stable version of tds_fdw (version 1.0.7) and problem with distinct disappear. I can check your last variants of queries on tds_fdw with versions 1.0.8 and master. May this can help?

In a first time I thought about whitespace or something else in the column, but there is nothing. Other question: may this appear due to different encoding between databases or due to russian characters in column value? If so, how I can check this? And finally, is this a problem of tds_fdw or this is a problem of my local installation?

Loriowar avatar Jan 27 '17 18:01 Loriowar

The russian characters might be the cause. @GeoffMontee would be able to explain better. I'm no good at encoding logic.

Smtgr14 avatar Jan 27 '17 21:01 Smtgr14

I'm having this same issue with the current head. In my case, it seems related to case-sensitivity.

I am using tds_version 7.4, freetds 1.00.26, SQL Server 2008 R2, and postgresql 9.3. The SQL Server database is using collation SQL_Latin1_General_CP1_CI_AS. The postgres database is using en_us.UTF-8 for LC_COLLATE and LC_CTYPE. The encoding is UTF8. I don't believe any of these settings are particularly exotic.

 CREATE FOREIGN TABLE test_tds_fdw (
    UserID int,
    FirstName text,
    LastName text,
    UserName text,
    Organization text,
    Email text,
    Type int,
    Password text
)
SERVER test_server
OPTIONS (
    table 'Users',
    match_column_names 'true'
);

=> select distinct organization from test_tds_fdw where organization = 'self';
 organization 
--------------
 Self
 self
 Self
 self
 Self
(5 rows)

=> select distinct organization from 
    (select distinct organization from test_tds_fdw where organization = 'self') tds_sub;
 organization 
--------------
 self
 Self
(2 rows)

=> select distinct organization from test_tds_fdw where organization ilike 'self';
ERROR:  HV00L: DB-Library error: DB #: 20018, DB Msg: General SQL Server error: Check messages from the SQL Server, OS #: -1, OS Msg: , Level: 15
LOCATION:  tds_err_handler, tds_fdw.c:3856

Please let me know if there's any other information I can provide. Unfortunately, I only have read access to the SQL Server, so there's a limited amount I can do on that end.

smithje avatar Mar 02 '17 14:03 smithje