tds_fdw icon indicating copy to clipboard operation
tds_fdw copied to clipboard

Gentle suggestions on option defaults:

Open merlinm opened this issue 4 years ago • 2 comments

This could purely be construed as a purely documentation issue, but there are a couple of default behaviors that could arguably be changed:

  1. server msg_handler changed to 'notice'
  2. table row_estimate_method changed to 'showplan_all'

#2 is IMO is kind of a big deal, it's not obvious from the description that the server side query is executed twice, and there are many scenarios where this would not be the expectation IMO. If you did leave the default, it would be nice to have the underlying action be explained, perhaps in the debug NOTICE output and the docs. My 0.02$.

These are not complaints; your library really saved my bacon; the previous way I was doing cross database querying (pl/sh to sqsh) is not stable, and so for this library is working great. ty!

merlinm avatar Feb 06 '20 21:02 merlinm

Hi @merlinm,

Thanks for the suggestions!

  1. server msg_handler changed to 'notice'

The notice behavior used to be the default behavior. As far as I could tell, the number of generated messages was excessive when everything was running properly. A user flagged this as a problem in issue #22, so I disabled it by default. I figured that people could set it to notice if they needed to do some debugging. Maybe the default behavior should be changed back to notice. I would be interested to know if anyone else has any thoughts on this. @juliogonzalez, @jcarnu , and @TheRevenantStar , do you guys have any thoughts?

  1. table row_estimate_method changed to 'showplan_all'

This is doable, but it is slightly complicated more complicated than you might think.

The showplan_all value for row_estimate_method is only valid for Microsoft SQL Server, but tds_fdw also supports Sybase servers.

tds_fdw does not currently have an efficient row_estimate_method for Sybase servers. That feature request is currently being tracked by issue #47.

It is technically possible for tds_fdw to only use showplan_all by default for Microsoft SQL Server, but how would tds_fdw know whether it is speaking to Microsoft SQL Server or Sybase? tds_fdw's implementation of IMPORT FOREIGN SCHEMA has some code that automatically determines the server type:

https://github.com/tds-fdw/tds_fdw/blob/26f1a0cc3538f1df54d47e82115be2c4d83a9629/src/tds_fdw.c#L3817

That code could be reused for this too. This automatic check adds an additional lag to the query's execution time in the form of at least 1 RTT between the remote server and the local PostgreSQL Server. This is probably fine for IMPORT FOREIGN SCHEMA, which is only executed occasionally. For SELECT statements that are executed much more frequently, this behavior would not be ideal , but it would probably still be better than executing the query twice by default on Microsoft SQL Server.

It may be possible to only perform the check occasionally and cache the server type somewhere, but that would complicate the implementation.

I will probably not get a chance to work on this, but if anyone else wants to take a stab at it and submit a PR, please feel free. The above check would probably have to go in tdsGetRowCount():

https://github.com/tds-fdw/tds_fdw/blob/26f1a0cc3538f1df54d47e82115be2c4d83a9629/src/tds_fdw.c#L1029

GeoffMontee avatar Feb 11 '20 08:02 GeoffMontee

The notice behavior used to be the default behavior.

I see. I can understand the tradeoff here. I can't help but wonder if there's some way to reliably pair the SQL error message with the TDS raised error. This is essentially what is wanted. If that could be done, there would be no reason to change this setting for normal reasons IMO and it could then be positioned as a debug setting.

is is doable, but it is slightly complicated more complicated than you might think.

Maybe so, but the default behavior of running the query twice unreasonable; it can lead to poor superficial evaluations of tds_fdw performance (I almost overlooked the library until I started digging) and I wonder if there are scenarios where volatile type evaluations might be surprisingly employed.

Anything that does this ought be both non-default and documented IMO. As to what the default behavior should be, I can see that there currently no obvious options. Maybe constructing a new option, 'simple', which estimates 1000 rows could be the default, with other options advertised describing the various tradeoffs (extra round trips, query executions, etc) that you pay in exchange for better estimates.

In lieu of any technical change, better docs describing the behavior might be a good idea.

merlinm avatar Feb 11 '20 14:02 merlinm