
how do i get this to work with pg@7?

Open jonathanong opened this issue 8 years ago • 12 comments

seems like pg@7 always returns a promise now, and the promise doesn't seem to resolve for me

jonathanong avatar Jul 24 '17 01:07 jonathanong

Hey @jonathanong - could you include a sample of code that reproduces the issue? I'd love to take a look at it for you.

brianc avatar Jul 24 '17 17:07 brianc

In a new file, after npm i --save pg pg-query-stream:

const QueryStream = require('pg-query-stream')
const { Pool } = require('pg')

const client = new Pool()

const result = client.query(new QueryStream('SELECT * FROM users'))

console.log(typeof result.then === 'function') // returns `true`

Promise.resolve(result).then((stream) => {
  stream.on('data', x => console.log(x))
    .on('error', e => console.error(e.stack))
    .on('end', () => {
      console.log('finished')
      process.exit(1)
    })
}).catch(e => {
  console.error(e.stack)
  process.exit(1)
})

it never exits or logs anything except true.

jonathanong avatar Jul 26 '17 04:07 jonathanong

Would be nice to see this resolved soon! 😄

qas avatar Aug 03 '17 19:08 qas

@jonathanong it shouldn't return a Promise in the first place; it should return a stream.

Suor avatar Aug 04 '17 12:08 Suor

Looks like this is Pool.query()'s fault; it doesn't handle submittables at all.
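
Roughly, the difference is this (a paraphrase to show the return types, not the actual source; assumes connection details come from the usual PG* environment variables):

const { Client, Pool } = require('pg')
const QueryStream = require('pg-query-stream')

const client = new Client()
const pool = new Pool()

// a plain client hands the submittable straight back to you:
const s1 = client.query(new QueryStream('SELECT 1')) // s1 is the QueryStream itself

// the pool wraps everything in a promise instead:
const s2 = pool.query(new QueryStream('SELECT 1'))   // s2 is a Promise (the bug reported above)

console.log(s1 instanceof QueryStream)     // true
console.log(typeof s2.then === 'function') // true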

Suor avatar Aug 04 '17 12:08 Suor

Yah sounds like an oversight on my part w/ pg-pool. Sorry about that!

brianc avatar Aug 04 '17 14:08 brianc

For now, use pool.connect to acquire a client, then pass the stream to the client.query method.

const client = await pool.connect()
const stream = client.query(new QueryStream('select *'))
stream.on('end', client.release)
stream.pipe(res) // ..etc

pool.query has slightly different semantics in that it has to first establish a connection to the backend, which could throw an error before your submittable is even dispatched. I need to do some reworking of the internals of pg-query-stream and pg-cursor to support pool.query, and I don't have the time right away.
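
In the meantime, with the workaround above you'll want to handle both failure points yourself. A rough sketch (assuming this runs inside an async request handler with a res to pipe into, like the other snippets here):

const client = await pool.connect() // rejects if a connection can't be established

const stream = client.query(new QueryStream('SELECT * FROM users'))

stream.on('error', (err) => {
  // errors raised while rows are streaming
  console.error(err.stack)
  client.release(err) // pass the error so the pool discards this client
})

stream.on('end', () => client.release())

stream.pipe(res)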

If you or your company benefit from node-postgres and have the means, please consider supporting my work on Patreon.

brianc avatar Aug 07 '17 02:08 brianc

whoops didn't mean to close this

brianc avatar Aug 07 '17 02:08 brianc

In case anybody wants a full snippet:

// requires for this snippet; `sql` (the query being benchmarked) and `now()`
// (a high-resolution timer) are defined elsewhere in my setup
var pg = require('pg');
var express = require('express');
var QueryStream = require('pg-query-stream');
var JSONStream = require('JSONStream');

(async function() {
  var pool = new pg.Pool({
    user: 'brandon',
    host: '192.168.0.2',
    database: 'db',
    password: '',
    port: 5432,
  });

  var client = await pool.connect();

  var app = express();

  app.get('/', function(req, res) {
    console.log(new Date(), 'Starting...');

    var start = now();

    var stream = client.query(new QueryStream(sql));

    stream.on('end', function() {
      var end = now();

      console.log(new Date(), 'Finished...', end - start);

      client.release();
    });

    stream.pipe(JSONStream.stringify()).pipe(res);
  });

  app.listen(5000, function() {
    console.log('Listening...');
  });
})();

I'm also a little confused about the proposed performance benefits of this package.

I ran the above code against this code:

// same requires and setup as the snippet above
(async function() {
  var pool = new pg.Pool({
    user: 'brandon',
    host: '192.168.0.2',
    database: 'db',
    password: '',
    port: 5432,
  });

  var client = await pool.connect();

  var app = express();

  app.get('/', function(req, res) {
    console.log(new Date(), 'Starting...');

    var start = now();
    
    client.query(sql, function(err, sqlRes) {
      res.send(JSON.stringify(sqlRes.rows));

      var end = now();

      console.log(new Date(), 'Finished...', end - start);
    });
  });

  app.listen(5000, function() {
    console.log('Listening...');
  });
})();

and I did not see any performance benefits.

Correct me if I am mistaken, but I thought the idea behind this package (and streams in general) would be that the client could receive the first byte of the response in a much quicker fashion.

In the non-stream version, the code has to get all resulting rows for the query, stringify them, then send it.

In the stream-version, the code should be streaming the rows, while stringifying on the fly, to the client.
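
A quick way to check that would be to time the first byte instead of the whole transfer. A rough sketch with plain Node http (the URL matches the server above):

const http = require('http')

const start = Date.now()
http.get('http://localhost:5000/', (res) => {
  let first = true
  res.on('data', () => {
    if (first) {
      first = false
      console.log('first byte after', Date.now() - start, 'ms')
    }
  })
  res.on('end', () => console.log('last byte after', Date.now() - start, 'ms'))
})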

brandon@foo:~$ time curl http://localhost:5000/ -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2397k    0 2397k    0     0  1609k      0 --:--:--  0:00:01 --:--:-- 1610k

real	0m1.501s
user	0m0.016s
sys	0m0.008s
brandon@foo:~$ time curl http://localhost:5000/ -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2390k  100 2390k    0     0  1682k      0  0:00:01  0:00:01 --:--:-- 1683k

real	0m1.432s
user	0m0.008s
sys	0m0.004s

The differences I see are negligible, and if anything, the streaming version is slightly slower (which would make sense, given the output above).

I thought the semi-massive 2.5 MB query response I was getting would be perfect for this streaming package... but maybe 3.7k rows isn't enough?

brandonros avatar Jan 29 '18 00:01 brandonros

@brandonros processing a stream can give you the performance benefits you talk about. For example, when analyzing the Twitter firehose, you'd want to process tweets as they come in. But I think that the primary purpose of streaming is to give your server a constant memory footprint regardless of the amount of data being processed.

If you load all of the results before returning, your memory consumption will grow linearly. For some large result set you will exceed the memory available to your application and you'll start seeing problems; app crash, disk thrashing, etc.

If instead, you stream the results through your server in constant batches, your memory consumption will remain constant for any sized result set.

Make sense?
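
For example, something along these lines keeps memory roughly flat no matter how many rows come back. It's a sketch only: it assumes a pool like the ones above, and the table name and handleRow helper are made up:

var QueryStream = require('pg-query-stream')

pool.connect().then(function (client) {
  var stream = client.query(new QueryStream('SELECT * FROM big_table'))

  stream.on('data', function (row) {
    // only a bounded window of rows sits in memory at any point,
    // regardless of how large the full result set is
    handleRow(row) // hypothetical per-row processing
  })

  stream.on('end', function () {
    client.release()
  })

  stream.on('error', function (err) {
    console.error(err.stack)
    client.release(err) // let the pool discard the broken client
  })
})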

ChrisKozak avatar Mar 22 '18 17:03 ChrisKozak

@brandonros processing a stream can give you the performance benefits you talk about.

Can processing a stream also give you negative performance benefits?

brandonros avatar Mar 26 '18 03:03 brandonros

In terms of performance benefits, there is always a trade-off. In this case, if you had 1 million rows, you would have to load all 1 million rows into memory. With a stream, you could load only 10,000 rows at a time. The trade-off with this approach, however, is that you have to keep the stream (and its connection) open for the duration. @brandonros
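
In code that looks roughly like this (the batchSize option name is from memory, so double-check the pg-query-stream README for your version):

var QueryStream = require('pg-query-stream')

// ask the server for at most 10,000 rows per round trip, so memory stays
// around one batch instead of the whole million-row result set
var stream = client.query(new QueryStream('SELECT * FROM big_table', [], { batchSize: 10000 }))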

nguyenhmp avatar Dec 20 '18 23:12 nguyenhmp