tds_fdw
tds_fdw copied to clipboard
Gentle suggestions on option defaults:
This could purely be construed as a purely documentation issue, but there are a couple of default behaviors that could arguably be changed:
- server msg_handler changed to 'notice'
- table row_estimate_method changed to 'showplan_all'
#2 is IMO is kind of a big deal, it's not obvious from the description that the server side query is executed twice, and there are many scenarios where this would not be the expectation IMO. If you did leave the default, it would be nice to have the underlying action be explained, perhaps in the debug NOTICE output and the docs. My 0.02$.
These are not complaints; your library really saved my bacon; the previous way I was doing cross database querying (pl/sh to sqsh) is not stable, and so for this library is working great. ty!
Hi @merlinm,
Thanks for the suggestions!
- server msg_handler changed to 'notice'
The notice
behavior used to be the default behavior. As far as I could tell, the number of generated messages was excessive when everything was running properly. A user flagged this as a problem in issue #22, so I disabled it by default. I figured that people could set it to notice
if they needed to do some debugging. Maybe the default behavior should be changed back to notice
. I would be interested to know if anyone else has any thoughts on this. @juliogonzalez, @jcarnu , and @TheRevenantStar , do you guys have any thoughts?
- table row_estimate_method changed to 'showplan_all'
This is doable, but it is slightly complicated more complicated than you might think.
The showplan_all
value for row_estimate_method
is only valid for Microsoft SQL Server, but tds_fdw
also supports Sybase servers.
tds_fdw
does not currently have an efficient row_estimate_method
for Sybase servers. That feature request is currently being tracked by issue #47.
It is technically possible for tds_fdw
to only use showplan_all
by default for Microsoft SQL Server, but how would tds_fdw
know whether it is speaking to Microsoft SQL Server or Sybase? tds_fdw
's implementation of IMPORT FOREIGN SCHEMA
has some code that automatically determines the server type:
https://github.com/tds-fdw/tds_fdw/blob/26f1a0cc3538f1df54d47e82115be2c4d83a9629/src/tds_fdw.c#L3817
That code could be reused for this too. This automatic check adds an additional lag to the query's execution time in the form of at least 1 RTT between the remote server and the local PostgreSQL Server. This is probably fine for IMPORT FOREIGN SCHEMA
, which is only executed occasionally. For SELECT
statements that are executed much more frequently, this behavior would not be ideal , but it would probably still be better than executing the query twice by default on Microsoft SQL Server.
It may be possible to only perform the check occasionally and cache the server type somewhere, but that would complicate the implementation.
I will probably not get a chance to work on this, but if anyone else wants to take a stab at it and submit a PR, please feel free. The above check would probably have to go in tdsGetRowCount()
:
https://github.com/tds-fdw/tds_fdw/blob/26f1a0cc3538f1df54d47e82115be2c4d83a9629/src/tds_fdw.c#L1029
The notice behavior used to be the default behavior.
I see. I can understand the tradeoff here. I can't help but wonder if there's some way to reliably pair the SQL error message with the TDS raised error. This is essentially what is wanted. If that could be done, there would be no reason to change this setting for normal reasons IMO and it could then be positioned as a debug setting.
is is doable, but it is slightly complicated more complicated than you might think.
Maybe so, but the default behavior of running the query twice unreasonable; it can lead to poor superficial evaluations of tds_fdw performance (I almost overlooked the library until I started digging) and I wonder if there are scenarios where volatile type evaluations might be surprisingly employed.
Anything that does this ought be both non-default and documented IMO. As to what the default behavior should be, I can see that there currently no obvious options. Maybe constructing a new option, 'simple', which estimates 1000 rows could be the default, with other options advertised describing the various tradeoffs (extra round trips, query executions, etc) that you pay in exchange for better estimates.
In lieu of any technical change, better docs describing the behavior might be a good idea.