node-pg-query-stream
How do I get this to work with pg@7?
It seems like pg@7 always returns a promise now, and the promise never resolves for me.
Hey @jonathanong - could you include a sample of code that reproduces the issue? I'd love to take a look at it for you.
In a new file, after npm i --save pg pg-query-stream:
const QueryStream = require('pg-query-stream')
const { Pool } = require('pg')

const client = new Pool()
const result = client.query(new QueryStream('SELECT * FROM users'))
console.log(typeof result.then === 'function') // logs `true`

Promise.resolve(result).then((stream) => {
  stream.on('data', x => console.log(x))
    .on('error', e => console.error(e.stack))
    .on('end', () => {
      console.log('finished')
      process.exit(1)
    })
}).catch(e => {
  console.error(e.stack)
  process.exit(1)
})
It never exits or logs anything except true.
Would be nice to see this resolved soon! 😄
@jonathanong it shouldn't return a Promise in the first place; it should return the stream.
Looks like this is Pool.query()'s fault: it doesn't handle submittables.
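For context: pg treats any value with a .submit method as a "submittable" and returns it from client.query as-is instead of wrapping it in a Promise. A quick way to see that QueryStream qualifies (a sketch, not pg internals):

const QueryStream = require('pg-query-stream')

const qs = new QueryStream('SELECT 1')
// pg hands submittables the connection and returns them unchanged,
// so client.query should give you back this same stream object
console.log(typeof qs.submit) // 'function'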
Yah sounds like an oversight on my part w/ pg-pool. Sorry about that!
For now, use pool.connect to acquire a client, then pass the stream to the client.query method.
const client = await pool.connect()
const stream = client.query(new QueryStream('select *'))
stream.on('end', client.release) // return the client to the pool once the rows are exhausted
stream.pipe(res) // ...etc
pool.query has slightly different semantics: it first has to establish a connection to the backend, which could throw an error before your submittable is even dispatched. I need to do some reworking of the internals of pg-query-stream and pg-cursor to support pool.query, and I don't have the time right away.
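In the meantime, a small helper can approximate stream support on top of the pool (a sketch, not part of pg-pool; it just acquires a client and releases it when the stream ends or errors):

function streamQuery (pool, submittable) {
  return pool.connect().then(function (client) {
    var stream = client.query(submittable)
    var done = function () { client.release() } // give the client back either way
    stream.on('end', done)
    stream.on('error', done)
    return stream
  })
}

// usage:
// streamQuery(pool, new QueryStream('SELECT * FROM users'))
//   .then(function (stream) { stream.on('data', console.log) })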
If you or your company benefit from node-postgres and have the means, please consider supporting my work on Patreon.
whoops didn't mean to close this
In case anybody wants a full snippet:
// The original snippet omitted its requires; these are assumed:
var express = require('express');
var pg = require('pg');
var QueryStream = require('pg-query-stream');
var JSONStream = require('JSONStream');
var now = require('performance-now'); // assumption – `now` was not defined in the original
var sql = 'SELECT * FROM some_table'; // placeholder – the actual query was not shown

(async function() {
  var pool = new pg.Pool({
    user: 'brandon',
    host: '192.168.0.2',
    database: 'db',
    password: '',
    port: 5432,
  });
  var client = await pool.connect();
  var app = express();
  app.get('/', function(req, res) {
    console.log(new Date(), 'Starting...');
    var start = now();
    var stream = client.query(new QueryStream(sql));
    stream.on('end', function() {
      var end = now();
      console.log(new Date(), 'Finished...', end - start);
      client.release();
    });
    stream.pipe(JSONStream.stringify()).pipe(res);
  });
  app.listen(5000, function() {
    console.log('Listening...');
  });
})();
I'm also a little confused about the proposed performance benefits of this package.
I ran the above code against this code:
// (same requires, `sql`, and `now` as the snippet above)
(async function() {
  var pool = new pg.Pool({
    user: 'brandon',
    host: '192.168.0.2',
    database: 'db',
    password: '',
    port: 5432,
  });
  var client = await pool.connect();
  var app = express();
  app.get('/', function(req, res) {
    console.log(new Date(), 'Starting...');
    var start = now();
    client.query(sql, function(err, sqlRes) {
      res.send(JSON.stringify(sqlRes.rows));
      var end = now();
      console.log(new Date(), 'Finished...', end - start);
    });
  });
  app.listen(5000, function() {
    console.log('Listening...');
  });
})();
and I did not see any performance benefits.
Correct me if I'm mistaken, but I thought the idea behind this package (and streams in general) is that the client can receive the first byte of the response much sooner.
In the non-stream version, the code has to fetch every resulting row for the query, stringify them all, then send the result.
In the stream version, the code should stream the rows to the client, stringifying them on the fly.
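(One way to measure time-to-first-byte directly, rather than total transfer time as below, would be curl's -w format variables:)

curl -s -o /dev/null -w 'first byte: %{time_starttransfer}s  total: %{time_total}s\n' http://localhost:5000/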
brandon@foo:~$ time curl http://localhost:5000/ -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2397k    0 2397k    0     0  1609k      0 --:--:--  0:00:01 --:--:-- 1610k

real    0m1.501s
user    0m0.016s
sys     0m0.008s

brandon@foo:~$ time curl http://localhost:5000/ -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2390k  100 2390k    0     0  1682k      0  0:00:01  0:00:01 --:--:-- 1683k

real    0m1.432s
user    0m0.008s
sys     0m0.004s
The differences I see are negligible, and if anything the streaming version is slightly slower (which would make sense, given the output above).
I thought the semi-massive 2.5 MB query response I was getting would be perfect for this streaming package... but maybe 3.7k rows isn't enough?
@brandonros Processing a stream can give you the performance benefits you talk about: for example, analyzing the Twitter firehose, where you'd want to process tweets as they come in. But I think the primary purpose of streaming is to give your server a constant memory footprint regardless of the amount of data being processed.
If you load all of the results before returning, your memory consumption will grow linearly. For some large result set you will exceed the memory available to your application and you'll start seeing problems; app crash, disk thrashing, etc.
If instead, you stream the results through your server in constant batches, your memory consumption will remain constant for any sized result set.
Make sense?
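To make that concrete, here's a sketch of the two approaches (the table name is hypothetical; pipe() applies backpressure, so only a bounded window of rows is buffered at any moment):

const { Pool } = require('pg')
const QueryStream = require('pg-query-stream')
const JSONStream = require('JSONStream')
const fs = require('fs')

const pool = new Pool()

// Buffered: every row is held in memory before anything is written out,
// so memory grows linearly with the result set.
async function buffered () {
  const { rows } = await pool.query('SELECT * FROM big_table')
  fs.writeFileSync('out.json', JSON.stringify(rows))
}

// Streamed: rows flow through in batches and memory stays roughly
// constant no matter how large the result set is.
async function streamed () {
  const client = await pool.connect()
  const stream = client.query(new QueryStream('SELECT * FROM big_table'))
  stream.on('end', () => client.release())
  stream.pipe(JSONStream.stringify()).pipe(fs.createWriteStream('out.json'))
}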
Can processing a stream also give you negative performance benefits?
In terms of performance benefits, there is always a trade-off. In this case, if you had 1 million rows, the non-streaming approach would have to load all 1 million rows into memory. With a stream, you could load only 10,000 rows at a time. The trade-off with that approach is that you have to keep the stream (and its connection) open until it's fully consumed. @brandonros
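Something like this, using pg-query-stream's batchSize option (the table name is a placeholder):

const QueryStream = require('pg-query-stream')
const { Pool } = require('pg')

const pool = new Pool()

pool.connect().then((client) => {
  // fetch 10,000 rows per round trip; only about one batch is held in memory at a time
  const stream = client.query(new QueryStream('SELECT * FROM big_table', [], { batchSize: 10000 }))
  stream.on('data', (row) => { /* handle each row as it arrives */ })
  stream.on('end', () => client.release())
  stream.on('error', (err) => { client.release(); console.error(err) })
})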