tds_fdw
tds_fdw copied to clipboard
DISTINCT doesn't work for 2.0.0-alpha1
Query SELECT DISTINCT "column_name" FROM foreign_table;
doesn't work. This return duplicate entries in case of usage tds-fdw from branch 2.0.0-alpha1
. Branch 1.0.7
works as expected for the same query and the same environment.
My env:
- Ubuntu 14.04
- PostgreSQL 9.5
- MSSQL 2012
- freetds-common 0.91-5
The 2.0.0-alpha1
release is a few commits behind master
right now, so it is missing some fixes. Does master
have this problem for you? It seems to work as expected for me:
postgres=# select * from mssql_table;
id | col
----+-----
1 | 1
2 | 1
3 | 1
(3 rows)
postgres=# select distinct col from mssql_table;
WARNING: Table definition mismatch: Could not match local column id with column from foreign table
col
-----
1
(1 row)
Same problem in master
branch of tds_fdw (commit 3a803c5840b8cf69e1a725fb4377d0f95a53a4e9) with latest FreeTDS version (1.00).
I have foreign table:
CREATE FOREIGN TABLE wrc_ctrs (
"WMSLOCATIONID" varchar,
...)
SERVER mssql_server
OPTIONS (
query 'SELECT [WRKCTRTABLE].[WMSLOCATIONID],
...
FROM [WRKCTRTABLE]',
row_estimate_method 'execute');
And quite simple query:
SELECT DISTINCT "WMSLOCATIONID" AS wmslocationid
FROM wrc_ctrs
WHERE "WMSLOCATIONID" != '';
As result I receive:
wmslocationid
---------------
Механ
Новый цех
Механ
Новый цех
Механ
Гравировка
Механ
Гравировка
...
(126 rows)
Here is lots of duplications. But, resulted amount of rows is less than simple count on same query.
Moreover, for COUNT with DISTINCT all works as expected:
SELECT COUNT(DISTINCT "WMSLOCATIONID") AS wmslocationid
FROM wrc_ctrs
WHERE "WMSLOCATIONID" != '';
wmslocationid
---------------
18
Total count is much more:
SELECT COUNT("WMSLOCATIONID") AS wmslocationid
FROM wrc_ctrs
WHERE "WMSLOCATIONID" != '';
wmslocationid
---------------
2755
For branch 1.0.7
all above queries works correct.
Is this a "magic" or somehow related with locale? May you help me in this situation or problem can be somewhere else? I can provide logs and any other information about env if needed.
In all described cased env is absolutely identical (same PG and MSSQL, same tables, same data, same initialize options and configs of the FDW and so on). Differences only in tds_fdw and FreeTDS versions.
Hi @Loriowar,
I'm not entirely sure what is causing this problem. WHERE and column pushdowns aren't even performed when using the query
foreign table option.
Would you be willing to provide a data set and table definitions that can reproduce this problem? That would help a lot.
Sorry for a huge delay. I can't provide a full data set, only a table definition:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[WRKCTRTABLE](
[WRKCTRGROUPID] [nvarchar](10) NOT NULL,
[WRKCTRID] [nvarchar](10) NOT NULL,
[NAME] [nvarchar](140) NOT NULL,
[WRKCTRTYPE] [int] NOT NULL,
[WRKCTRNUMOF] [numeric](28, 12) NOT NULL,
[EFFECTIVITYPCT] [numeric](28, 12) NOT NULL,
[OPERATIONSCHEDPCT] [numeric](28, 12) NOT NULL,
[CAPACITY] [numeric](28, 12) NOT NULL,
[CAPUNIT] [int] NOT NULL,
[VENDID] [nvarchar](20) NOT NULL,
[ACCOUNTWRKCTRISSUE] [nvarchar](20) NOT NULL,
[ACCOUNTWIPVALUATION] [nvarchar](20) NOT NULL,
[CREATED] [datetime] NOT NULL,
[QUEUETIMEBEFORE] [numeric](28, 12) NOT NULL,
[SETUPTIME] [numeric](28, 12) NOT NULL,
[PROCESSTIME] [numeric](28, 12) NOT NULL,
[PROCESSPERQTY] [numeric](28, 12) NOT NULL,
[TRANSPTIME] [numeric](28, 12) NOT NULL,
[QUEUETIMEAFTER] [numeric](28, 12) NOT NULL,
[TRANSFERBATCH] [numeric](28, 12) NOT NULL,
[TOHOURS] [numeric](28, 12) NOT NULL,
[ERRORPCT] [numeric](28, 12) NOT NULL,
[SETUPCATEGORYID] [nvarchar](10) NOT NULL,
[PROCESSCATEGORYID] [nvarchar](10) NOT NULL,
[DIMENSION] [nvarchar](10) NOT NULL,
[DIMENSION2_] [nvarchar](10) NOT NULL,
[DIMENSION3_] [nvarchar](10) NOT NULL,
[ISGROUP] [int] NOT NULL,
[ACCOUNTWIPISSUE] [nvarchar](20) NOT NULL,
[PRODUNITID] [nvarchar](10) NOT NULL,
[SITEID] [nvarchar](10) NOT NULL,
[BOTTLENECKRESOURCE] [int] NOT NULL,
[WMSLOCATIONID] [nvarchar](10) NOT NULL,
[EMPLID] [nvarchar](20) NOT NULL,
[CALENDARID] [nvarchar](10) NOT NULL,
[CAPLIMITED] [int] NOT NULL,
[INVENTLOCATIONID] [nvarchar](10) NOT NULL,
[CAPACITYBATCH] [numeric](28, 12) NOT NULL,
[EXCLUSIVE] [int] NOT NULL,
[QTYCATEGORYID] [nvarchar](10) NOT NULL,
[ROUTEGROUPID] [nvarchar](10) NOT NULL,
[ACCOUNTWRKCTRISSUEOFFSET] [nvarchar](20) NOT NULL,
[DATAAREAID] [nvarchar](4) NOT NULL,
[RECVERSION] [int] NOT NULL,
[RECID] [bigint] NOT NULL,
[TASKGROUPID] [nvarchar](10) NOT NULL,
[WRKCTRTASKDEMAND] [numeric](28, 12) NOT NULL,
[PROPERTYLIMITED] [int] NOT NULL,
CONSTRAINT [I_266WRKCTRIDX] PRIMARY KEY NONCLUSTERED
(
[DATAAREAID] ASC,
[WRKCTRID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ('') FOR [WRKCTRGROUPID]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ('') FOR [WRKCTRID]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ('') FOR [NAME]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ((0)) FOR [WRKCTRTYPE]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ((0)) FOR [WRKCTRNUMOF]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ((0)) FOR [EFFECTIVITYPCT]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ((0)) FOR [OPERATIONSCHEDPCT]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ((0)) FOR [CAPACITY]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ((0)) FOR [CAPUNIT]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ('') FOR [VENDID]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ('') FOR [ACCOUNTWRKCTRISSUE]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ('') FOR [ACCOUNTWIPVALUATION]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ('1900-01-01 00:00:00.000') FOR [CREATED]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ((0)) FOR [QUEUETIMEBEFORE]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ((0)) FOR [SETUPTIME]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ((0)) FOR [PROCESSTIME]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ((0)) FOR [PROCESSPERQTY]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ((0)) FOR [TRANSPTIME]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ((0)) FOR [QUEUETIMEAFTER]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ((0)) FOR [TRANSFERBATCH]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ((0)) FOR [TOHOURS]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ((0)) FOR [ERRORPCT]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ('') FOR [SETUPCATEGORYID]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ('') FOR [PROCESSCATEGORYID]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ('') FOR [DIMENSION]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ('') FOR [DIMENSION2_]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ('') FOR [DIMENSION3_]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ((0)) FOR [ISGROUP]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ('') FOR [ACCOUNTWIPISSUE]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ('') FOR [PRODUNITID]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ('') FOR [SITEID]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ((0)) FOR [BOTTLENECKRESOURCE]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ('') FOR [WMSLOCATIONID]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ('') FOR [EMPLID]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ('') FOR [CALENDARID]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ((0)) FOR [CAPLIMITED]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ('') FOR [INVENTLOCATIONID]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ((0)) FOR [CAPACITYBATCH]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ((0)) FOR [EXCLUSIVE]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ('') FOR [QTYCATEGORYID]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ('') FOR [ROUTEGROUPID]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ('') FOR [ACCOUNTWRKCTRISSUEOFFSET]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ('dat') FOR [DATAAREAID]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ((1)) FOR [RECVERSION]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ('') FOR [TASKGROUPID]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ((0)) FOR [WRKCTRTASKDEMAND]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] ADD DEFAULT ((0)) FOR [PROPERTYLIMITED]
GO
ALTER TABLE [dbo].[WRKCTRTABLE] WITH CHECK ADD CHECK (([RECID]<>(0)))
GO
Do you get a correct result if you rewrite the query in this way?
SELECT DISTINCT ON ("WMSLOCATIONID") COUNT(*) AS wmslocationid
FROM wrc_ctrs
WHERE "WMSLOCATIONID" != '';
This query still return duplications
SELECT DISTINCT ON ("WMSLOCATIONID") "WMSLOCATIONID" AS wmslocationid
FROM wrc_ctrs
WHERE "WMSLOCATIONID" != '';
Your variant has incorrect syntax for PG. I don't know how can work DISTINCT ON
and COUNT
in this query.
Apologies, I forgot the GROUP BY "WMSLOCATIONID"
at the end.
Should be:
SELECT DISTINCT ON ("WMSLOCATIONID") "WMSLOCATIONID" AS wmslocationid
FROM wrc_ctrs
WHERE "WMSLOCATIONID" != ''
GROUP BY "WMSLOCATIONID";
This query doesn't work. It return duplications too.
Since pushdowns are not working, try this query:
SELECT DISTINCT ON ("wmslocationid") FROM
(SELECT "WMSLOCATIONID" AS wmslocationid
FROM wrc_ctrs
WHERE "WMSLOCATIONID" != '') a
GROUP BY "WMSLOCATIONID";
Little changes due to postgres syntax:
SELECT DISTINCT ON ("wmslocationid") "wmslocationid" FROM
(SELECT "WMSLOCATIONID" AS wmslocationid
FROM wrc_ctrs
WHERE "WMSLOCATIONID" != '') a
GROUP BY "wmslocationid";
And I see a duplications in the result again.
An important question: must I update tds_fdw to the latest master commit? For now, all queries runs on tds_fdw compiled in the moment of commit 3a803c5840b8cf69e1a725fb4377d0f95a53a4e9.
When wrapping the query in a from clause to perform the SELECT DISTINCT, the handling of DISTINCT is no longer on tds_fdw or MSSQL, it's pure PostgreSQL at this point. if there are duplicates, check to make sure there isn't something about them that you can't see that makes them distinct, like spaces of other invisible characters. A last stitch effort is to use the query to fill a temporary table and run SELECT DISTINCT against it. If there are duplicates it's not tds_fdw, it's definitely something else. Try creating the table locally, fill it using a query for the raw data from mssql and then query this new table with distinct.
For now, I use cascade of materialized views. The first view is identical to the foreign table (wrc_ctrs) and it store data one to one without any modification. Second view store only distinct records from the first view. In this case all works perfect and there is no duplications. Other variant: I can install stable version of tds_fdw (version 1.0.7) and problem with distinct disappear. I can check your last variants of queries on tds_fdw with versions 1.0.8 and master. May this can help?
In a first time I thought about whitespace or something else in the column, but there is nothing. Other question: may this appear due to different encoding between databases or due to russian characters in column value? If so, how I can check this? And finally, is this a problem of tds_fdw or this is a problem of my local installation?
The russian characters might be the cause. @GeoffMontee would be able to explain better. I'm no good at encoding logic.
I'm having this same issue with the current head. In my case, it seems related to case-sensitivity.
I am using tds_version 7.4, freetds 1.00.26, SQL Server 2008 R2, and postgresql 9.3. The SQL Server database is using collation SQL_Latin1_General_CP1_CI_AS. The postgres database is using en_us.UTF-8 for LC_COLLATE and LC_CTYPE. The encoding is UTF8. I don't believe any of these settings are particularly exotic.
CREATE FOREIGN TABLE test_tds_fdw (
UserID int,
FirstName text,
LastName text,
UserName text,
Organization text,
Email text,
Type int,
Password text
)
SERVER test_server
OPTIONS (
table 'Users',
match_column_names 'true'
);
=> select distinct organization from test_tds_fdw where organization = 'self';
organization
--------------
Self
self
Self
self
Self
(5 rows)
=> select distinct organization from
(select distinct organization from test_tds_fdw where organization = 'self') tds_sub;
organization
--------------
self
Self
(2 rows)
=> select distinct organization from test_tds_fdw where organization ilike 'self';
ERROR: HV00L: DB-Library error: DB #: 20018, DB Msg: General SQL Server error: Check messages from the SQL Server, OS #: -1, OS Msg: , Level: 15
LOCATION: tds_err_handler, tds_fdw.c:3856
Please let me know if there's any other information I can provide. Unfortunately, I only have read access to the SQL Server, so there's a limited amount I can do on that end.