orator
Model chunk uses `select * from table`
I have a table with > 10M rows, so this fills the RAM and then fails.
Also, I could not find any cursor options in the docs, unless I'm blind?
chunk is run on the collection that is returned from the result, not on the query itself
I'll need to see the code you are using, but you might be after something like:
Model.limit(100).get().chunk(10)
This is what I feared. How would you go about iterating over a 10-million-row table?
I've created a workaround myself, but I'm looking for best practices:
first = Character.order_by('id', 'asc').first()
last_id = first.id
while True:
    characters = Character.order_by('id', 'asc').where('id', '>', last_id).take(100).get()
    if not characters:
        break  # no rows left
    for character in characters:
        pass  # process each character here
    last_id = characters[-1].id
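(For the sake of discussion, here is the same workaround wrapped in a generator; `Character` and the page size are the same assumptions as in the code above:)

def iter_chunks(size=100):
    last_id = 0  # assumes ids are positive integers
    while True:
        characters = Character.order_by('id', 'asc').where('id', '>', last_id).take(size).get()
        if not characters:
            return  # end of the table
        yield characters
        last_id = characters[-1].id

for characters in iter_chunks():
    for character in characters:
        pass  # only `size` rows are held in memory at a time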
I'm pretty sure Laravel's Eloquent pulls the rows dynamically rather than fetching them all and then iterating.
are you trying to create a generator here?
are you saying you don't want to store 10M rows in memory?
or you don't want to fetch all 10M rows from the database?
Chunking the result would imply that you're pulling n rows at a time from the database; that's my personal take.
Model.chunk(100) would pull 100 rows, but you can loop through that and offset by the number of rows already seen, e.g.:
select * from table
{query here if it exists, i.e. where x = y}
order by {primary key} asc
limit 100 offset ({iterator}-1)*100
When you reach the end (0 results) you just break out, and chunk becomes None.
I don't know if I'm explaining it well, but with this method you wouldn't lock the application up while you iterate over every entry in a large table.
My way would start processing rows immediately. The way it works at the moment (select * from table;) has to wait ~10 minutes to pull the entire table down into memory (if you have enough).
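In Orator terms, that loop might look roughly like this (a sketch only; it assumes the builder's skip()/take() methods alongside the `Character` model used earlier in the thread):

def chunked(size=100):
    page = 0
    while True:
        rows = Character.order_by('id', 'asc').skip(page * size).take(size).get()
        if not rows:
            break  # reached the end: 0 results
        yield rows
        page += 1

for characters in chunked():
    for character in characters:
        pass  # each query fetches only `size` rows

One caveat: offset pagination gets slower as the offset grows, which is why the id-based workaround above scales better on very large tables.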
Is there any way I can get a copy of a 10M-row database? I'm wondering if a generator would work here.
I'd prefer not to send a 4 GB SQL file; not only would it take forever to import, it's just way easier to generate a new table and seed it with data.
Doesn't Orator support seeding?
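It does; here is a minimal seeder sketch along the lines of the Orator docs, assuming a `characters` table with a `name` column (both names are assumptions):

from orator.seeds import Seeder


class CharactersTableSeeder(Seeder):

    def run(self):
        # Insert in batches so the seeder itself stays cheap on memory.
        batch_size = 10000
        for batch in range(1000):  # 1000 batches of 10k rows = 10M rows
            self.db.table('characters').insert([
                {'name': 'character_%d' % (batch * batch_size + i)}
                for i in range(batch_size)
            ])

It should then be runnable via the CLI's db:seed command.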
have you tried using a generator yet?
def func():
    yield Model.all().chunk(2)
this won't load all 10M rows into memory since it's a generator
Sorry for the delay; this still runs a select * from table:
ID USER HOST DB COMMAND TIME STATE Info TIME_MS
787187 evequi localhost evequi Query 5 Sending data SELECT * FROM `characters` 5005.429
def testing():
    yield Character.all().chunk(100)

if __name__ == "__main__":
    for characters in testing():
        print(characters)
hmm idk. Try:

for characters in Character.chunk(100):
    print(characters)

unless you said you tried that already? If the above code snippet doesn't work then I'm not sure Orator supports what you are trying to do.
all() will execute a SELECT * FROM table and return a Collection object.
What you want is actually:
for characters in Character.order_by('id', 'asc').chunk(100):
    # Do something with the characters
    pass
See the documentation: https://orator-orm.com/docs/0.9/orm.html#chunking-results and https://orator-orm.com/docs/0.9/query_builder.html#chunking-results-from-a-table
We've gone full circle
for characters in Character.order_by('id', 'asc').chunk(100):
    print(characters)
this runs a SELECT * FROM characters, once again loading all 10M rows into memory before chunking.
But you are incorrect, because that returns a generator. Generators do not load everything into memory; they pull results as they are needed, while you iterate over them. That is why they are called generators: they generate the results on demand.
this code section:

for characters in Character.order_by('id', 'asc').chunk(100):
    print(characters)

does not load 10M rows into memory. It pulls 10M rows from the database but stores them in a generator, which is not kept in memory.
https://wiki.python.org/moin/Generators
You can close this if you want, but as it stands the chunk helper is absolutely useless. You say it doesn't get stored in memory; it's insane that you even think that.
https://i.imgur.com/g1XFWx8.png
You need to re-implement chunking, or add a cursor, so there is a way to iterate over a table WITHOUT pulling the entire table into memory. It took 5 minutes to get that screenshot; if we did this in Laravel it would take milliseconds.
Again, I'm done trying to help from here, and good luck, but closing an issue because you can't be bothered to fix it is stupid.
And I quote:
Chunking Results
If you need to process a lot of records, you can use the chunk method to avoid consuming a lot of RAM:
A generator doesn't necessarily store anything (much) in memory: storing something like the result of 52! takes up around 500 bytes. The problem here might be that getting those values into the generator in the first place is what's taking up a lot of memory. Any problems can likely be solved in here https://github.com/sdispater/orator/blob/68266dd62b795a34c85843dd902883ff7fc50d93/orator/orm/builder.py#L225
@sdispater and I will gladly look at any pull requests fixing this issue.
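To illustrate where the memory goes (a pure DB-API sketch, not Orator's actual code; `conn` and the `characters` table are assumptions):

def bad_chunks(conn, size=100):
    cursor = conn.cursor()
    cursor.execute('SELECT * FROM characters')
    rows = cursor.fetchall()  # every row lands in memory first
    for i in range(0, len(rows), size):
        yield rows[i:i + size]  # a generator, but the RAM is already spent

def good_chunks(conn, size=100):
    cursor = conn.cursor()
    cursor.execute('SELECT * FROM characters')
    while True:
        batch = cursor.fetchmany(size)  # pulls only `size` rows at a time
        if not batch:
            break
        yield batch

Note that some drivers (e.g. MySQL's default client-side cursor) buffer the full result set during execute() regardless, in which case fetchmany() only slices an already-loaded buffer.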
@Flupster Orator is already using cursors and the fetchmany() method when using chunk().
https://github.com/sdispater/orator/blob/0.9/orator/query/builder.py#L1163
https://github.com/sdispater/orator/blob/0.9/orator/connections/connection.py#L221
So, I am not sure what's going on here.
And unless I missed it, we are missing a critical piece of information here: what RDBMS are you using?
I wish there were a method that fetches the data bit by bit instead of getting it all at once, and whose results I can iterate over in a loop, just like:
select * from table limit 10 offset 0
select * from table limit 10 offset 10
select * from table limit 10 offset 20
...
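For what it's worth, that is what chunk() is documented to do; per the docs linked earlier in the thread, usage looks like this (the `users` table is the docs' own example, and `db` is an assumed DatabaseManager instance):

for users in db.table('users').chunk(100):
    for user in users:
        # process one row at a time
        ...

Whether this stays memory-bounded in practice depends on the driver's cursor behavior, as discussed above.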