mongoengine
mongoengine copied to clipboard
debug very slow save() command...
hi,
i found that a specific obj.save()
command of my application takes up to MINUTES to be completed.
i replaced that mongoengine command with a pymongo insert_one
expression and it takes no more than a few seconds.
so i enabled debug logging for mongoengine and found out that the time consuming operation is performed AFTER the insert
command has completed.
to be more clear, here is the relevant part of the code:
log.debug('saving new_norm_obj...')
new_norm_obj.save() # may take MINUTES
# part_coll.insert_one(json.loads(new_norm_obj.to_json())) # takes no more than a few seconds
log.debug('saving new_raw_obj...')
and the log:
2022-03-31 12:25:27.254 | DEBUG | data_queue.process:process_source_queues:239 - saving new_norm_obj...
2022-03-31 12:25:28.220 | DEBUG | config.mongo:succeeded:31 - Command createIndexes with request id1424268980 succeeded in 597microseconds
2022-03-31 12:25:28.322 | DEBUG | config.mongo:succeeded:31 - Command insert with request id1911759956 succeeded in 101607microseconds
2022-03-31 12:25:54.191 | DEBUG | data_queue.process:process_source_queues:243 - saving new_raw_obj...
as you can see, in this sample there was a 26 seconds lag after the request id1911759956 succedeed. what was mongonengine doing in this period?
this is the same log portion for the same object when using pymongo instead of mongoengine:
2022-03-31 12:31:53.439 | DEBUG | data_queue.process:process_source_queues:239 - saving new_norm_obj...
2022-03-31 12:31:53.457 | DEBUG | data_queue.process:process_source_queues:244 - saving new_raw_obj...
can you please help me understanding this?
There is an obvious overhead coming from MongoEngine as it deals with validation, pre/post hooks, conversion, etc compared to pymongo. The amount of that overhead usually depends on how complex your Document class is, the level of nesting, the number of EmbeddedDocument etc because the amount of processing for MongoEngine is linear with these things. CachedReferenceFields also have very poor performance so should be used consciously.
Could you provide your Document class? Please also provide some profiling output, in fact we see that the underlying pymongo insert operation is already taking 10 seconds (2022-03-31 12:25:28.322 | DEBUG | config.mongo:succeeded:31 - Command insert with request id1911759956 succeeded in 101607microseconds
) so that must be a big beast that you are trying to save and I'm not necessarily surprised to see a 20 seconds overhead.
What's strange is that it doesn't seem to be reproducible but my guess is that some of your documents are way heavier than others and so it just depends which one you try to save.
as you guessed written documents differ a lot in size. i'm pretty sure the biggest issues arise with documents containing a long (a few k) list of very small embedded documents.
one (of the many) thing that's not clear to me, is what mongonegine should do after the insert... i mean, the heaviest parts of the write process (mostly validation and conversion) should be done before the insert, not after...
Indeed I m also surprised that some time is spent after this Insert but please run some profiling to have additional info
wow... pretty big numbers here!
2022-03-31 23:22:38.484 | DEBUG | data_queue.process:process_source_queues:240 - saving new_norm_obj
510793416 function calls (499629295 primitive calls) in 267.419 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 267.418 267.418 <string>:1(<module>)
12230 0.006 0.000 0.010 0.000 __init__.py:35(unidecode_expect_ascii)
1 0.000 0.000 0.000 0.000 __init__.py:420(_make)
1 0.000 0.000 0.000 0.000 __init__.py:430(_replace)
2 0.000 0.000 0.000 0.000 __init__.py:999(_decode_all_selective)
14612 0.004 0.000 0.009 0.000 abc.py:117(__instancecheck__)
1 0.004 0.004 267.418 267.418 abstract.py:95(save)
6115 0.045 0.000 1.384 0.000 bid.py:119(validate)
6115 0.026 0.000 0.174 0.000 bid.py:139(_get_source_dupes)
54 0.000 0.000 0.000 0.000 bid.py:171(<listcomp>)
8483 0.031 0.000 0.150 0.000 bid.py:35(validate)
2 0.000 0.000 0.000 0.000 binary.py:318(subtype)
1 0.000 0.000 0.000 0.000 client_options.py:262(retry_writes)
2 0.000 0.000 0.000 0.000 client_session.py:1011(get_server_session)
2 0.000 0.000 0.000 0.000 client_session.py:1027(return_server_session)
2 0.000 0.000 0.000 0.000 client_session.py:1034(return_server_session_no_lock)
4 0.000 0.000 0.000 0.000 client_session.py:1041(_clear_stale)
2 0.000 0.000 0.000 0.000 client_session.py:177(__init__)
4 0.000 0.000 0.000 0.000 client_session.py:212(snapshot)
2 0.000 0.000 0.000 0.000 client_session.py:303(_validate_session_write_concern)
2 0.000 0.000 0.000 0.000 client_session.py:352(__init__)
11 0.000 0.000 0.000 0.000 client_session.py:362(active)
3 0.000 0.000 0.000 0.000 client_session.py:381(unpin)
1 0.000 0.000 0.000 0.000 client_session.py:387(reset)
2 0.000 0.000 0.000 0.000 client_session.py:394(__del__)
2 0.000 0.000 0.000 0.000 client_session.py:444(__init__)
2 0.000 0.000 0.000 0.000 client_session.py:456(end_session)
2 0.000 0.000 0.000 0.000 client_session.py:463(_end_session)
3 0.000 0.000 0.000 0.000 client_session.py:475(_check_ended)
2 0.000 0.000 0.000 0.000 client_session.py:492(options)
2 0.000 0.000 0.000 0.000 client_session.py:503(cluster_time)
2 0.000 0.000 0.000 0.000 client_session.py:801(_advance_cluster_time)
2 0.000 0.000 0.000 0.000 client_session.py:824(_advance_operation_time)
2 0.000 0.000 0.000 0.000 client_session.py:845(_process_response)
10 0.000 0.000 0.000 0.000 client_session.py:865(in_transaction)
1 0.000 0.000 0.000 0.000 client_session.py:879(_pinned_address)
2 0.000 0.000 0.000 0.000 client_session.py:895(_unpin)
2 0.000 0.000 0.000 0.000 client_session.py:905(_apply_to)
1 0.000 0.000 0.000 0.000 client_session.py:938(_start_retryable_write)
6 0.000 0.000 0.000 0.000 client_session.py:977(timed_out)
1 0.000 0.000 0.000 0.000 client_session.py:988(inc_transaction_id)
2 0.000 0.000 0.000 0.000 collation.py:216(validate_collation_or_none)
1 0.000 0.000 0.001 0.001 collection.py:1545(__create_indexes)
2 0.000 0.000 0.000 0.000 collection.py:1560(gen_indexes)
1 0.000 0.000 0.001 0.001 collection.py:1585(create_index)
1 0.000 0.000 0.000 0.000 collection.py:187(_socket_for_writes)
1 0.000 0.000 0.001 0.001 collection.py:190(_command)
2 0.000 0.000 0.000 0.000 collection.py:312(name)
1 0.000 0.000 0.000 0.000 collection.py:324(with_options)
1 0.000 0.000 0.112 0.112 collection.py:445(_insert_one)
1 0.000 0.000 0.112 0.112 collection.py:458(_insert_command)
1 0.000 0.000 0.112 0.112 collection.py:480(insert_one)
1 0.000 0.000 0.000 0.000 collection.py:72(__init__)
1 0.000 0.000 0.000 0.000 common.py:493(validate_is_document_type)
75389628 26.726 0.000 34.691 0.000 common.py:5(_import_class)
1 0.000 0.000 0.000 0.000 common.py:797(__init__)
2 0.000 0.000 0.000 0.000 common.py:821(codec_options)
3 0.000 0.000 0.000 0.000 common.py:828(write_concern)
2 0.000 0.000 0.000 0.000 common.py:838(_write_concern_for)
1 0.000 0.000 0.000 0.000 common.py:846(read_preference)
1 0.000 0.000 0.000 0.000 common.py:863(read_concern)
2 0.000 0.000 0.000 0.000 context_managers.py:270(set_write_concern)
7 0.000 0.000 0.000 0.000 contextlib.py:102(__init__)
7/5 0.000 0.000 0.000 0.000 contextlib.py:130(__enter__)
7/5 0.000 0.000 0.000 0.000 contextlib.py:139(__exit__)
7 0.000 0.000 0.000 0.000 contextlib.py:279(helper)
5 0.000 0.000 0.000 0.000 database.py:109(client)
3 0.000 0.000 0.000 0.000 database.py:114(name)
17832 0.043 0.000 0.073 0.000 datastructures.py:112(__init__)
15707225 9.098 0.000 9.098 0.000 datastructures.py:146(__iter__)
17832 0.016 0.000 0.089 0.000 datastructures.py:189(__init__)
752 0.001 0.000 0.007 0.000 datastructures.py:56(get)
752 0.003 0.000 0.006 0.000 datastructures.py:63(__getitem__)
11854 0.042 0.000 0.066 0.000 dereference.py:105(_find_references)
11854 0.010 0.000 0.012 0.000 dereference.py:164(_fetch_objects)
11854 0.053 0.000 0.105 0.000 dereference.py:210(_attach_objects)
11854 0.056 0.000 0.262 0.000 dereference.py:23(__call__)
14598/6115 0.030 0.000 0.777 0.000 document.py:111(to_mongo)
4612173 7.079 0.000 7.122 0.000 document.py:161(__setattr__)
2 0.000 0.000 0.000 0.000 document.py:209(_get_collection)
14600 0.007 0.000 0.009 0.000 document.py:247(__iter__)
1 0.000 0.000 0.000 0.000 document.py:259(__setitem__)
17832 0.007 0.000 0.010 0.000 document.py:273(__len__)
2 0.000 0.000 0.812 0.406 document.py:275(to_mongo)
14599 0.002 0.000 0.002 0.000 document.py:308(clean)
14600/2 0.180 0.000 0.812 0.406 document.py:331(to_mongo)
1 0.000 0.000 267.415 267.415 document.py:333(save)
14600 0.013 0.000 0.013 0.000 document.py:342(<setcomp>)
14599/1 0.071 0.000 1.420 1.420 document.py:387(validate)
14599 0.058 0.000 0.107 0.000 document.py:403(<listcomp>)
1 0.000 0.000 0.112 0.112 document.py:463(_save_create)
1 0.000 0.000 0.000 0.000 document.py:490(__expand_dynamic_values)
1 0.000 0.000 0.000 0.000 document.py:518(_mark_as_changed)
4599189/1 8.292 0.000 265.069 265.069 document.py:548(_clear_changed_fields)
1504/1128 5.066 0.003 264.667 0.235 document.py:586(_nestable_types_clear_changed_fields)
9208439/9196585 27.392 0.000 114.388 0.000 document.py:609(_nestable_types_changed_fields)
6393106 1.277 0.000 1.277 0.000 document.py:633(<listcomp>)
10992295/4599189 118.465 0.000 236.908 0.000 document.py:639(_get_changed_fields)
1 0.000 0.000 0.001 0.001 document.py:873(ensure_indexes)
2899 0.003 0.000 0.003 0.000 document.py:94(__eq__)
1 0.003 0.003 0.004 0.004 fields.py:1040(key_not_string)
1 0.003 0.003 0.005 0.005 fields.py:1049(key_starts_with_dollar)
1 0.000 0.000 1.420 1.420 fields.py:1072(validate)
32945 0.011 0.000 0.014 0.000 fields.py:118(to_python)
81181 0.039 0.000 0.048 0.000 fields.py:123(__get__)
32945 0.018 0.000 0.021 0.000 fields.py:127(validate)
12231 0.028 0.000 0.043 0.000 fields.py:132(__set__)
32945 0.014 0.000 0.028 0.000 fields.py:177(to_mongo)
14598/6115 0.027 0.000 0.806 0.000 fields.py:181(_to_mongo_safe_call)
59774/3 0.054 0.000 1.420 0.473 fields.py:222(_validate)
11854 0.020 0.000 0.289 0.000 fields.py:274(_lazy_load_refs)
25784 0.134 0.000 0.559 0.000 fields.py:285(__get__)
12231/1 0.069 0.000 0.812 0.812 fields.py:394(to_mongo)
12230 0.009 0.000 0.024 0.000 fields.py:417(<dictcomp>)
12231/1 0.016 0.000 0.812 0.812 fields.py:422(<dictcomp>)
12230 0.006 0.000 0.006 0.000 fields.py:458(<listcomp>)
12231/1 0.029 0.000 1.412 1.412 fields.py:463(validate)
1 0.000 0.000 0.000 0.000 fields.py:503(to_python)
58392 0.024 0.000 0.033 0.000 fields.py:751(document_type)
14598/6115 0.022 0.000 0.795 0.000 fields.py:777(to_mongo)
14598/6115 0.024 0.000 1.401 0.000 fields.py:782(validate)
25032 0.059 0.000 0.644 0.000 fields.py:946(__get__)
12230 0.017 0.000 0.240 0.000 fields.py:960(validate)
2 0.000 0.000 0.000 0.000 helpers.py:110(_check_command_response)
1 0.000 0.000 0.000 0.000 helpers.py:191(_get_wce_doc)
1 0.000 0.000 0.000 0.000 helpers.py:203(_check_write_command_response)
1 0.000 0.000 0.000 0.000 helpers.py:60(_gen_index_name)
1 0.000 0.000 0.000 0.000 helpers.py:62(<listcomp>)
1 0.000 0.000 0.000 0.000 helpers.py:65(_index_list)
1 0.000 0.000 0.000 0.000 helpers.py:83(_index_document)
2 0.000 0.000 0.000 0.000 message.py:1273(__init__)
2 0.000 0.000 0.000 0.000 message.py:1287(unpack_response)
2 0.000 0.000 0.000 0.000 message.py:1310(more_to_come)
2 0.000 0.000 0.000 0.000 message.py:1315(unpack)
2 0.000 0.000 0.107 0.054 message.py:575(_op_msg)
1 0.000 0.000 0.000 0.000 mongo_client.py:1021(options)
2 0.000 0.000 0.000 0.000 mongo_client.py:1080(_get_topology)
4 0.000 0.000 0.000 0.000 mongo_client.py:1091(_get_socket)
2 0.000 0.000 0.000 0.000 mongo_client.py:1114(_select_server)
1 0.000 0.000 0.000 0.000 mongo_client.py:1146(_socket_for_writes)
1 0.000 0.000 0.112 0.112 mongo_client.py:1219(_retry_with_session)
1 0.000 0.000 0.112 0.112 mongo_client.py:1231(_retry_internal)
1 0.000 0.000 0.000 0.000 mongo_client.py:1237(is_retrying)
1 0.000 0.000 0.112 0.112 mongo_client.py:1337(_retryable_write)
2 0.000 0.000 0.000 0.000 mongo_client.py:1551(__start_session)
2 0.000 0.000 0.000 0.000 mongo_client.py:1587(_get_server_session)
2 0.000 0.000 0.000 0.000 mongo_client.py:1591(_return_server_session)
2 0.000 0.000 0.000 0.000 mongo_client.py:1595(_ensure_session)
4 0.000 0.000 0.000 0.000 mongo_client.py:1608(_tmp_session)
2 0.000 0.000 0.000 0.000 mongo_client.py:1634(_send_cluster_time)
2 0.000 0.000 0.000 0.000 mongo_client.py:1647(_process_response)
2 0.000 0.000 0.000 0.000 mongo_client.py:1922(__init__)
2 0.000 0.000 0.000 0.000 mongo_client.py:1936(contribute_socket)
2 0.000 0.000 0.000 0.000 mongo_client.py:1943(handle)
2 0.000 0.000 0.000 0.000 mongo_client.py:1963(__enter__)
2 0.000 0.000 0.000 0.000 mongo_client.py:1966(__exit__)
2 0.000 0.000 0.000 0.000 monitor.py:76(open)
2 0.000 0.000 0.000 0.000 monitoring.py:1294(enabled_for_commands)
2 0.000 0.000 0.005 0.002 network.py:188(receive_message)
4 0.000 0.000 0.000 0.000 network.py:229(wait_for_read)
4 0.000 0.000 0.005 0.001 network.py:255(_receive_data_on_socket)
2 0.000 0.000 0.113 0.056 network.py:41(command)
12230 0.020 0.000 0.173 0.000 normalize.py:18(norm_str)
12230 0.008 0.000 0.082 0.000 normalize.py:45(trim_str)
12230 0.005 0.000 0.016 0.000 normalize.py:59(remove_accents)
1 0.000 0.000 0.000 0.000 objectid.py:160(_random)
1 0.000 0.000 0.000 0.000 objectid.py:170(__generate)
1 0.000 0.000 0.000 0.000 objectid.py:212(binary)
1 0.000 0.000 0.000 0.000 objectid.py:264(__ne__)
1 0.000 0.000 0.000 0.000 objectid.py:62(__init__)
1 0.000 0.000 0.000 0.000 operations.py:374(__init__)
1 0.000 0.000 0.000 0.000 operations.py:444(document)
4 0.000 0.000 0.000 0.000 periodic_executor.py:55(open)
4 0.000 0.000 0.000 0.000 pool.py:1068(get)
2 0.000 0.000 0.000 0.000 pool.py:1074(get_overall)
4 0.000 0.000 0.000 0.000 pool.py:1087(stale)
4 0.000 0.000 0.000 0.000 pool.py:1174(closed)
4 0.000 0.000 0.000 0.000 pool.py:1245(stale_generation)
4 0.000 0.000 0.000 0.000 pool.py:1344(get_socket)
4 0.000 0.000 0.000 0.000 pool.py:1402(_raise_if_not_ready)
2 0.000 0.000 0.000 0.000 pool.py:1410(_get_socket)
2 0.000 0.000 0.000 0.000 pool.py:1507(return_socket)
2 0.000 0.000 0.000 0.000 pool.py:1557(_perished)
2 0.000 0.000 0.000 0.000 pool.py:380(max_idle_time_seconds)
2 0.000 0.000 0.000 0.000 pool.py:400(wait_queue_timeout)
4 0.000 0.000 0.000 0.000 pool.py:419(_event_listeners)
2 0.000 0.000 0.000 0.000 pool.py:447(server_api)
2 0.000 0.000 0.000 0.000 pool.py:453(load_balanced)
2 0.000 0.000 0.113 0.056 pool.py:656(command)
2 0.000 0.000 0.000 0.000 pool.py:764(_raise_if_not_writable)
2 0.000 0.000 0.000 0.000 pool.py:802(check_auth)
2 0.000 0.000 0.000 0.000 pool.py:837(validate_session)
1 0.000 0.000 0.000 0.000 pool.py:871(socket_closed)
2 0.000 0.000 0.000 0.000 pool.py:875(send_cluster_time)
2 0.000 0.000 0.000 0.000 pool.py:880(add_server_api)
2 0.000 0.000 0.000 0.000 pool.py:885(update_last_checkin_time)
2 0.000 0.000 0.000 0.000 pool.py:888(update_is_writable)
2 0.000 0.000 0.000 0.000 pool.py:891(idle_time_seconds)
2 0.000 0.000 0.000 0.000 pool.py:928(__hash__)
24460 0.020 0.000 0.126 0.000 re.py:202(sub)
24460 0.048 0.000 0.059 0.000 re.py:288(_compile)
2 0.000 0.000 0.000 0.000 read_preferences.py:122(document)
1 0.000 0.000 0.000 0.000 results.py:25(__init__)
1 0.000 0.000 0.000 0.000 results.py:62(__init__)
1 0.000 0.000 0.000 0.000 results.py:66(inserted_id)
2 0.000 0.000 0.000 0.000 server.py:203(get_socket)
3 0.000 0.000 0.000 0.000 server.py:206(description)
4 0.000 0.000 0.000 0.000 server.py:215(pool)
2 0.000 0.000 0.000 0.000 server.py:44(open)
2 0.000 0.000 0.000 0.000 server_description.py:200(is_server_type_known)
1 0.000 0.000 0.000 0.000 server_description.py:204(retryable_writes_supported)
4 0.000 0.000 0.000 0.000 server_description.py:85(address)
2 0.000 0.000 0.000 0.000 server_description.py:90(server_type)
2 0.000 0.000 0.000 0.000 settings.py:105(server_selection_timeout)
2 0.000 0.000 0.000 0.000 settings.py:109(server_selector)
2 0.000 0.000 0.000 0.000 settings.py:129(load_balanced)
3 0.000 0.000 0.000 0.000 signals.py:40(<lambda>)
1 0.000 0.000 0.000 0.000 socket_checker.py:43(select)
1 0.000 0.000 0.000 0.000 socket_checker.py:82(socket_closed)
14600 0.015 0.000 0.034 0.000 son.py:102(pop)
29207 0.019 0.000 0.032 0.000 son.py:123(update)
53661 0.038 0.000 0.038 0.000 son.py:139(get)
14603 0.022 0.000 0.054 0.000 son.py:39(__init__)
14603 0.129 0.000 0.134 0.000 son.py:45(__new__)
74390 0.067 0.000 0.077 0.000 son.py:56(__setitem__)
29199 0.023 0.000 0.028 0.000 son.py:61(__delitem__)
59798 0.011 0.000 0.011 0.000 son.py:73(__iter__)
4 0.000 0.000 0.000 0.000 threading.py:1095(_wait_for_tstate_lock)
4 0.000 0.000 0.000 0.000 threading.py:1149(is_alive)
6 0.000 0.000 0.000 0.000 threading.py:264(__enter__)
6 0.000 0.000 0.000 0.000 threading.py:267(__exit__)
4 0.000 0.000 0.000 0.000 threading.py:279(_is_owned)
4 0.000 0.000 0.000 0.000 threading.py:359(notify)
4 0.000 0.000 0.000 0.000 threading.py:546(is_set)
2 0.000 0.000 0.000 0.000 topology.py:146(open)
2 0.000 0.000 0.000 0.000 topology.py:175(select_servers)
2 0.000 0.000 0.000 0.000 topology.py:203(<listcomp>)
2 0.000 0.000 0.000 0.000 topology.py:206(_select_servers_loop)
2 0.000 0.000 0.000 0.000 topology.py:237(select_server)
2 0.000 0.000 0.000 0.000 topology.py:368(get_server_by_address)
2 0.000 0.000 0.000 0.000 topology.py:410(max_cluster_time)
2 0.000 0.000 0.000 0.000 topology.py:414(_receive_cluster_time_no_lock)
2 0.000 0.000 0.000 0.000 topology.py:428(receive_cluster_time)
2 0.000 0.000 0.000 0.000 topology.py:502(_check_session_support)
2 0.000 0.000 0.000 0.000 topology.py:525(get_server_session)
2 0.000 0.000 0.000 0.000 topology.py:536(return_server_session)
2 0.000 0.000 0.000 0.000 topology.py:553(_ensure_opened)
2 0.000 0.000 0.000 0.000 topology_description.py:123(check_compatible)
2 0.000 0.000 0.000 0.000 topology_description.py:164(topology_type)
4 0.000 0.000 0.000 0.000 topology_description.py:192(logical_session_timeout_minutes)
2 0.000 0.000 0.000 0.000 topology_description.py:197(known_servers)
2 0.000 0.000 0.000 0.000 topology_description.py:200(<listcomp>)
2 0.000 0.000 0.000 0.000 topology_description.py:241(apply_selector)
8 0.000 0.000 0.000 0.000 write_concern.py:103(acknowledged)
1 0.000 0.000 0.000 0.000 write_concern.py:50(__init__)
3 0.000 0.000 0.000 0.000 write_concern.py:88(is_server_default)
1 0.000 0.000 0.000 0.000 write_concern.py:93(document)
14604 0.004 0.000 0.004 0.000 {built-in method __new__ of type object at 0x7f873b3d0be0}
14612 0.004 0.000 0.004 0.000 {built-in method _abc._abc_instancecheck}
2 0.000 0.000 0.000 0.000 {built-in method _struct.pack}
17832 0.006 0.000 0.006 0.000 {built-in method _weakref.proxy}
2 0.000 0.000 0.000 0.000 {built-in method bson._cbson.decode_all}
1 0.000 0.000 267.419 267.419 {built-in method builtins.exec}
10995312 2.760 0.000 2.780 0.000 {built-in method builtins.getattr}
29496178 7.408 0.000 7.408 0.000 {built-in method builtins.hasattr}
2 0.000 0.000 0.000 0.000 {built-in method builtins.hash}
144918965 27.042 0.000 27.042 0.000 {built-in method builtins.isinstance}
25032 0.004 0.000 0.004 0.000 {built-in method builtins.issubclass}
14604 0.003 0.000 0.003 0.000 {built-in method builtins.iter}
40921 0.007 0.000 0.007 0.000 {built-in method builtins.len}
18/14 0.000 0.000 0.000 0.000 {built-in method builtins.next}
1 0.000 0.000 0.000 0.000 {built-in method builtins.setattr}
12230 0.010 0.000 0.010 0.000 {built-in method builtins.sorted}
7 0.000 0.000 0.000 0.000 {built-in method posix.getpid}
2 0.088 0.044 0.107 0.054 {built-in method pymongo._cmessage._op_msg}
14 0.000 0.000 0.000 0.000 {built-in method time.monotonic}
1 0.000 0.000 0.000 0.000 {built-in method time.time}
752 0.002 0.000 0.002 0.000 {function BaseDict.__getitem__ at 0x7f8738ccaf80}
6 0.000 0.000 0.000 0.000 {method '__enter__' of '_thread.lock' objects}
29 0.000 0.000 0.000 0.000 {method '__exit__' of '_thread.lock' objects}
8 0.000 0.000 0.000 0.000 {method 'acquire' of '_thread.lock' objects}
82519 0.012 0.000 0.012 0.000 {method 'append' of 'list' objects}
4 0.000 0.000 0.000 0.000 {method 'appendleft' of 'collections.deque' objects}
1 0.000 0.000 0.000 0.000 {method 'copy' of 'dict' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
2 0.000 0.000 0.000 0.000 {method 'discard' of 'set' objects}
12230 0.005 0.000 0.005 0.000 {method 'encode' of 'str' objects}
196980649 24.825 0.000 24.825 0.000 {method 'get' of 'dict' objects}
2 0.000 0.000 0.000 0.000 {method 'gettimeout' of '_socket.socket' objects}
59805 0.015 0.000 0.015 0.000 {method 'items' of 'dict' objects}
1 0.000 0.000 0.000 0.000 {method 'join' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'poll' of 'select.poll' objects}
2 0.000 0.000 0.000 0.000 {method 'pop' of 'dict' objects}
4 0.000 0.000 0.000 0.000 {method 'popleft' of 'collections.deque' objects}
4 0.004 0.001 0.004 0.001 {method 'recv_into' of '_socket.socket' objects}
1 0.000 0.000 0.000 0.000 {method 'register' of 'select.poll' objects}
29199 0.006 0.000 0.006 0.000 {method 'remove' of 'list' objects}
2 0.001 0.000 0.001 0.000 {method 'sendall' of '_socket.socket' objects}
9988 0.005 0.000 0.005 0.000 {method 'split' of 'str' objects}
6120 0.001 0.000 0.001 0.000 {method 'startswith' of 'str' objects}
24460 0.047 0.000 0.047 0.000 {method 'sub' of 're.Pattern' objects}
2 0.000 0.000 0.000 0.000 {method 'unpack' of '_struct.Struct' objects}
2 0.000 0.000 0.000 0.000 {method 'unpack_from' of '_struct.Struct' objects}
1 0.000 0.000 0.000 0.000 {method 'unregister' of 'select.poll' objects}
1 0.000 0.000 0.000 0.000 {method 'update' of 'dict' objects}
12230 0.003 0.000 0.003 0.000 {method 'upper' of 'str' objects}
4 0.000 0.000 0.000 0.000 {method 'values' of 'dict' objects}
2022-03-31 23:27:05.915 | DEBUG | data_queue.process:process_source_queues:246 - saving new_raw_obj...
i made a few more tests over the same document (unfortunately a different one from the example i used in the previous message) using different save()
options:
new_norm_obj.save() #-> avg ~2m
new_norm_obj.save(validate=False, clean=False) #-> avg ~2m17s
new_norm_obj.save(validate=False, clean=False, force_insert=True) #-> avg ~1m55s
part_coll.insert_one(json.loads(new_norm_obj.to_json())) #-> avg ~2.5s
The culprits are here
4599189/1 8.292 0.000 265.069 265.069 document.py:548(_clear_changed_fields) 1504/1128 5.066 0.003 264.667 0.235 document.py:586(_nestable_types_clear_changed_fields) 9208439/9196585 27.392 0.000 114.388 0.000 document.py:609(_nestable_types_changed_fields) ... 10992295/4599189 118.465 0.000 236.908 0.000 document.py:639(_get_changed_fields)
Basically all Document/EmbeddedDocuments instances are tracking changes independently (so that we you .save(), it only saves fields that were modified), whenever you .save() the document, it will flush the changes and reset all changed_fields that were tracked. The numbers are pretty big here, like 10M calls, but it's hard to provide further assistance without having your Document class + an idea of how many documents /embedded_documents exists in that single record
Thank you for your help, I finally decided to go for pymongo raw queries.