(suggested enhancement) better handling of large result sets

Open slifin opened this issue 10 years ago • 15 comments

This may be a naive post, but I was reading up on PHP 5.5 generators and it seems like they allow for memory-efficient iteration over result sets. At the moment you cannot run a large query with find() or exec() without breaching memory limits, but with old legacy code you could say: while ($row = mysql_fetch_assoc($rs))

That would only load one row at a time into memory. I believe the same thing can be achieved with "a class that implements the Iterator interface" (quoted from http://www.php.net//manual/en/language.generators.overview.php),

so that F3 can handle larger result sets.
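Something like this, as a minimal sketch of the idea (plain PDO outside of F3; the connection details and the 'bigtable' table are placeholders):

// Sketch only (PHP 5.5+): wrap a PDOStatement in a generator so each
// iteration pulls a single row instead of materializing the whole set.
// Note: with MySQL's default buffered queries the driver still buffers
// the full result client-side; unbuffered mode is needed for true streaming.
function rows(PDOStatement $stmt) {
    while ($row = $stmt->fetch(PDO::FETCH_ASSOC))
        yield $row;
}

$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
$stmt = $pdo->query('SELECT * FROM bigtable');
foreach (rows($stmt) as $row) {
    // process one row; memory stays flat regardless of result size
}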

slifin avatar Jun 25 '14 23:06 slifin

:+1: but remember the cursor is related to many kinds of DBMS, and some of them may not be "iterable". It could be enough to provide an additional param in DB->exec to distinguish between $query->fetchall and the requested $query->fetch on the PDO query, and to modify the next/previous methods to check whether the result set is iterable and call $query->fetch again.
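As a rough sketch of that param idea (the function names and the flag are invented here, not the actual \DB\SQL API):

// Hypothetical sketch, not the real DB->exec signature: an extra flag
// switches between the current fetchall behaviour and lazy fetching.
function exec2(PDO $pdo, $cmd, array $args = [], $lazy = false) {
    $query = $pdo->prepare($cmd);
    $query->execute($args);
    return $lazy ? lazyRows($query) : $query->fetchAll(PDO::FETCH_ASSOC);
}

function lazyRows(PDOStatement $query) {
    while ($row = $query->fetch(PDO::FETCH_ASSOC))
        yield $row; // next/previous-style navigation would need extra care here
}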

KOTRET avatar Jun 26 '14 05:06 KOTRET

So, I think Cursor needs to be re-written significantly to utilize the data store more efficiently, in order to achieve @slifin's suggestion. In the meantime, I think it would be worthwhile if Cursor implemented Iterator for future reference.

sgpinkus avatar Oct 22 '14 09:10 sgpinkus

Please provide stats to support the claim that rewriting the code will be more efficient that way. Benchmarks would be useful.

bcosca avatar Oct 22 '14 15:10 bcosca

@bcosca: guess it's subjective, because this is about convenience. But well, a before/after comparison of the source would be welcome.

KOTRET avatar Oct 23 '14 05:10 KOTRET

I pushed another commit to dev making Cursor Traversable and tied it to this issue after searching for tickets on "iterator" support. But on review, merely being iterable and this ticket are not related, so please disregard that ref I made.

Regarding efficiency, and what I assume is the gist of the original ticket: looking over the \DB code, it seems like the \DB\SQL PDO wrapper, which \DB\Cursor calls upon to fetch results, just does a fetchAll(), returning an array of data to \DB\Cursor, which stores it internally (I could be wrong, as it gets a bit complicated and I haven't confirmed that). Provided the SQL PDO driver can incrementally fetch results from the database (which it should), it is potentially more memory- and bandwidth-efficient to not use fetchAll() and instead iterate over the results. You would probably want to iterate over the results from \DB\Cursor. Changing the existing code to do so, as I said, is probably a bit complicated.

As for empirical evidence regarding fetch() versus fetchAll(): a quick Google search yields this link, which shows the memory-efficiency difference: https://stackoverflow.com/questions/2770630/pdofetchall-vs-pdofetch-in-a-loop.
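For anyone wanting to reproduce this locally, a rough sketch (the table name and credentials are placeholders; run each variant in a separate process, since peak memory is measured per process):

// Rough comparison sketch; 'bigtable' and the credentials are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
$stmt = $pdo->query('SELECT * FROM bigtable');

// Variant A: the whole result set becomes one PHP array.
//$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

// Variant B: one row at a time (for MySQL, disable buffered queries
// before query() to see the full effect).
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    // process $row
}

echo memory_get_peak_usage(true)." bytes peak\n";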

sgpinkus avatar Oct 23 '14 08:10 sgpinkus

Memory: yes. Bandwidth: no; there is packet overhead on each request. The benchmark is not really complete either, as the requests go to localhost and therefore the network adapter does a loopback (so in fact no packets are sent over the network).

KOTRET avatar Oct 23 '14 10:10 KOTRET

Did you read the comments made on that StackOverflow topic? The accepted answer is faulty/incomplete, and the other results are all too different to call either way the winner. As long as you store the data from each iteration somewhere, maybe for displaying it in a template like you usually do, the memory consumption doesn't show any difference... you just add packet overhead, as @KOTRET stated. The memory reduction only takes place if you use a mapper for something like a mass update, where you would more likely use a raw SQL UPDATE query or a DB tool for that purpose anyway. Furthermore, I'm not sure whether adding more inheritance from parent classes to the Cursor could become a bottleneck. You can already traverse using the load/next/dry methods or find. Isn't that convenient already?

ikkez avatar Oct 23 '14 12:10 ikkez

Yeah, I just grabbed the first link off Google without looking that hard. It's really not that controversial, though: the difference between fetch() and fetchAll() is straightforward. Currently the DB layer only supports fetchAll(). I'm not saying it should not support fetchAll(); I'm saying there is an argument to be made that it should also support fetch()/cursor-based iteration. The original motive was "breaching memory limits", which seems to hold. Note that I'm not really pushing this ticket either; the memory limit is not a problem for me personally.

"You can already traverse using load/next/dry methods or find. Isn't that convenient already?"

Well, this is off topic (see #709), but I'd prefer it to be Traversable, so I can iterate over other things and \DB\Cursor the same way.

sgpinkus avatar Oct 23 '14 12:10 sgpinkus

I made some tests and modified the code to use fetch for the load method. It turns out there are several side effects that come along with it. With fetch you can always just fetch the next record, moving one record forward; there's no moving backwards or skipping records, which the Cursor->skip method should provide. The PDOStatement object needs to be passed to the mapper class, which grabs the next record when you move forward, so the statement needs to stay open. This leads to the issue of where to close the cursor: the closeCursor method must only be called when you're done with the iteration, or when you break an iteration and start a new query. This could be problematic when you pass the hydrated mapper into the hive for being processed in the template. You're not able to load another mapper as long as you're not finished with its predecessor, which also means you can't load another mapper within an iteration. You also can't iterate multiple times over the cursor; it needs to be closed and executed again at the end. I'll stop exploring the mapper-fetch topic here. But I could imagine it's not that hard, and more fatfree, to provide a simple DB\SQL->fetch method where you can assign a callback function that is executed on each iteration.
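Something along these lines, purely as a sketch of the idea (the function name and signature are invented, not existing F3 API):

// Hypothetical sketch of a callback-based fetch: rows are streamed and
// handed to the callback one by one; nothing is accumulated.
function fetchEach(PDO $pdo, $cmd, array $args, callable $callback) {
    $query = $pdo->prepare($cmd);
    $query->execute($args);
    while ($row = $query->fetch(PDO::FETCH_ASSOC))
        $callback($row);
    $query->closeCursor(); // safe: iteration is fully encapsulated here
}

This would also sidestep the closeCursor problem above, since the statement never leaves the method.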

ikkez avatar Oct 24 '14 06:10 ikkez

+1 for fetch with callback and a lot of caution

bcosca avatar Oct 24 '14 06:10 bcosca

+1

MINORITYmaN avatar Oct 24 '14 07:10 MINORITYmaN

@ikkez Regarding one of the issues you hit: to be able to navigate a MySQL cursor backwards and to absolute positions, as would be required by skip(), you need to set PDO::ATTR_CURSOR => PDO::CURSOR_SCROLL on prepared queries. But this doesn't even work for me with the MySQL driver (I could be doing something wrong though; I'm not that experienced with PDO). I.e., the driver just ignores me. Here is some test code that would theoretically allow skip() to work on MySQL, if it did what it was supposed to:

$dbh = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION); // fail loudly on the setup queries below
$dbh->exec('create table if not exists foo (x integer primary key)');
$dbh->exec('delete from foo');
$dbh->exec('insert into foo values (2),(3),(4),(5),(6),(7),(8)');
// Required??:
//$dbh->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, true);
// Request a scrollable cursor so fetch() can address arbitrary rows
$stmt = $dbh->prepare('select * from foo', [PDO::ATTR_CURSOR => PDO::CURSOR_SCROLL]);
$success = $stmt->execute();
var_dump($success);
// Jump to an absolute row, then step forward, then jump back relative
$res = $stmt->fetch(PDO::FETCH_ASSOC, PDO::FETCH_ORI_ABS, 3);
var_dump($res);
$res = $stmt->fetch(PDO::FETCH_ASSOC, PDO::FETCH_ORI_NEXT, 0);
var_dump($res);
$res = $stmt->fetch(PDO::FETCH_ASSOC, PDO::FETCH_ORI_REL, -3);
var_dump($res);

sgpinkus avatar Oct 24 '14 07:10 sgpinkus

Yeah, I already saw the PDO::CURSOR_SCROLL option. But this is a driver-specific thing. Seems like MySQL and SQLite don't support it, but pgsql does:

This is because cursors are only supported in instance where the underlying database provides such functionality. In the case of MySQL and SQLite there is no such support. PDO does not provide emulation for cursors. (https://bugs.php.net/bug.php?id=39310#1165199450)

ikkez avatar Oct 24 '14 08:10 ikkez

Isn't pagination of the results into manageable chunks a more efficient way of doing this?
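For instance, a rough sketch of chunked processing with plain LIMIT/OFFSET ('bigtable' is a placeholder; the integers are interpolated deliberately, since binding LIMIT parameters is quirky on some drivers):

// Rough sketch: process a big table in fixed-size chunks so only one
// chunk is in memory at a time.
$db = new \DB\SQL('mysql:host=localhost;dbname=test', 'user', 'pass');
$size = 1000;
for ($pos = 0;; $pos += $size) {
    $rows = $db->exec('SELECT * FROM bigtable LIMIT '.$size.' OFFSET '.$pos);
    if (!$rows)
        break;
    foreach ($rows as $row) {
        // process $row
    }
}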

bcosca avatar Oct 25 '14 18:10 bcosca

I would always use pagination or limited chunks and a cron job for processing big data. Currently I have no need for that fetch thing; I was just playing around with it as a learning task for myself.

ikkez avatar Oct 25 '14 19:10 ikkez