mongoengine icon indicating copy to clipboard operation
mongoengine copied to clipboard

debug very slow save() command...

Open masavini opened this issue 2 years ago • 6 comments

hi, i found that a specific obj.save() command of my application takes up to MINUTES to be completed. i replaced that mongoengine command with a pymongo insert_one expression and it takes no more than a few seconds.

so i enabled debug logging for mongoengine and found out that the time consuming operation is performed AFTER the insert command has completed.

to be more clear, here is the relevant part of the code:

log.debug('saving new_norm_obj...')
new_norm_obj.save() # may take MINUTES
# part_coll.insert_one(json.loads(new_norm_obj.to_json())) # takes no more than a few seconds
log.debug('saving new_raw_obj...')

and the log:

2022-03-31 12:25:27.254 | DEBUG    | data_queue.process:process_source_queues:239 - saving new_norm_obj...
2022-03-31 12:25:28.220 | DEBUG    | config.mongo:succeeded:31 - Command createIndexes with request id1424268980 succeeded in 597microseconds
2022-03-31 12:25:28.322 | DEBUG    | config.mongo:succeeded:31 - Command insert with request id1911759956 succeeded in 101607microseconds
2022-03-31 12:25:54.191 | DEBUG    | data_queue.process:process_source_queues:243 - saving new_raw_obj...

as you can see, in this sample there was a 26 seconds lag after the request id1911759956 succedeed. what was mongonengine doing in this period?

this is the same log portion for the same object when using pymongo instead of mongoengine:

2022-03-31 12:31:53.439 | DEBUG    | data_queue.process:process_source_queues:239 - saving new_norm_obj...
2022-03-31 12:31:53.457 | DEBUG    | data_queue.process:process_source_queues:244 - saving new_raw_obj...

can you please help me understanding this?

masavini avatar Mar 31 '22 10:03 masavini

There is an obvious overhead coming from MongoEngine as it deals with validation, pre/post hooks, conversion, etc compared to pymongo. The amount of that overhead usually depends on how complex your Document class is, the level of nesting, the number of EmbeddedDocument etc because the amount of processing for MongoEngine is linear with these things. CachedReferenceFields also have very poor performance so should be used consciously.

Could you provide your Document class? Please also provide some profiling output, in fact we see that the underlying pymongo insert operation is already taking 10 seconds (2022-03-31 12:25:28.322 | DEBUG | config.mongo:succeeded:31 - Command insert with request id1911759956 succeeded in 101607microseconds) so that must be a big beast that you are trying to save and I'm not necessarily surprised to see a 20 seconds overhead.

What's strange is that it doesn't seem to be reproducible but my guess is that some of your documents are way heavier than others and so it just depends which one you try to save.

bagerard avatar Mar 31 '22 11:03 bagerard

as you guessed written documents differ a lot in size. i'm pretty sure the biggest issues arise with documents containing a long (a few k) list of very small embedded documents.

one (of the many) thing that's not clear to me, is what mongonegine should do after the insert... i mean, the heaviest parts of the write process (mostly validation and conversion) should be done before the insert, not after...

masavini avatar Mar 31 '22 13:03 masavini

Indeed I m also surprised that some time is spent after this Insert but please run some profiling to have additional info

bagerard avatar Mar 31 '22 19:03 bagerard

wow... pretty big numbers here!

2022-03-31 23:22:38.484 | DEBUG    | data_queue.process:process_source_queues:240 - saving new_norm_obj
         510793416 function calls (499629295 primitive calls) in 267.419 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000  267.418  267.418 <string>:1(<module>)
    12230    0.006    0.000    0.010    0.000 __init__.py:35(unidecode_expect_ascii)
        1    0.000    0.000    0.000    0.000 __init__.py:420(_make)
        1    0.000    0.000    0.000    0.000 __init__.py:430(_replace)
        2    0.000    0.000    0.000    0.000 __init__.py:999(_decode_all_selective)
    14612    0.004    0.000    0.009    0.000 abc.py:117(__instancecheck__)
        1    0.004    0.004  267.418  267.418 abstract.py:95(save)
     6115    0.045    0.000    1.384    0.000 bid.py:119(validate)
     6115    0.026    0.000    0.174    0.000 bid.py:139(_get_source_dupes)
       54    0.000    0.000    0.000    0.000 bid.py:171(<listcomp>)
     8483    0.031    0.000    0.150    0.000 bid.py:35(validate)
        2    0.000    0.000    0.000    0.000 binary.py:318(subtype)
        1    0.000    0.000    0.000    0.000 client_options.py:262(retry_writes)
        2    0.000    0.000    0.000    0.000 client_session.py:1011(get_server_session)
        2    0.000    0.000    0.000    0.000 client_session.py:1027(return_server_session)
        2    0.000    0.000    0.000    0.000 client_session.py:1034(return_server_session_no_lock)
        4    0.000    0.000    0.000    0.000 client_session.py:1041(_clear_stale)
        2    0.000    0.000    0.000    0.000 client_session.py:177(__init__)
        4    0.000    0.000    0.000    0.000 client_session.py:212(snapshot)
        2    0.000    0.000    0.000    0.000 client_session.py:303(_validate_session_write_concern)
        2    0.000    0.000    0.000    0.000 client_session.py:352(__init__)
       11    0.000    0.000    0.000    0.000 client_session.py:362(active)
        3    0.000    0.000    0.000    0.000 client_session.py:381(unpin)
        1    0.000    0.000    0.000    0.000 client_session.py:387(reset)
        2    0.000    0.000    0.000    0.000 client_session.py:394(__del__)
        2    0.000    0.000    0.000    0.000 client_session.py:444(__init__)
        2    0.000    0.000    0.000    0.000 client_session.py:456(end_session)
        2    0.000    0.000    0.000    0.000 client_session.py:463(_end_session)
        3    0.000    0.000    0.000    0.000 client_session.py:475(_check_ended)
        2    0.000    0.000    0.000    0.000 client_session.py:492(options)
        2    0.000    0.000    0.000    0.000 client_session.py:503(cluster_time)
        2    0.000    0.000    0.000    0.000 client_session.py:801(_advance_cluster_time)
        2    0.000    0.000    0.000    0.000 client_session.py:824(_advance_operation_time)
        2    0.000    0.000    0.000    0.000 client_session.py:845(_process_response)
       10    0.000    0.000    0.000    0.000 client_session.py:865(in_transaction)
        1    0.000    0.000    0.000    0.000 client_session.py:879(_pinned_address)
        2    0.000    0.000    0.000    0.000 client_session.py:895(_unpin)
        2    0.000    0.000    0.000    0.000 client_session.py:905(_apply_to)
        1    0.000    0.000    0.000    0.000 client_session.py:938(_start_retryable_write)
        6    0.000    0.000    0.000    0.000 client_session.py:977(timed_out)
        1    0.000    0.000    0.000    0.000 client_session.py:988(inc_transaction_id)
        2    0.000    0.000    0.000    0.000 collation.py:216(validate_collation_or_none)
        1    0.000    0.000    0.001    0.001 collection.py:1545(__create_indexes)
        2    0.000    0.000    0.000    0.000 collection.py:1560(gen_indexes)
        1    0.000    0.000    0.001    0.001 collection.py:1585(create_index)
        1    0.000    0.000    0.000    0.000 collection.py:187(_socket_for_writes)
        1    0.000    0.000    0.001    0.001 collection.py:190(_command)
        2    0.000    0.000    0.000    0.000 collection.py:312(name)
        1    0.000    0.000    0.000    0.000 collection.py:324(with_options)
        1    0.000    0.000    0.112    0.112 collection.py:445(_insert_one)
        1    0.000    0.000    0.112    0.112 collection.py:458(_insert_command)
        1    0.000    0.000    0.112    0.112 collection.py:480(insert_one)
        1    0.000    0.000    0.000    0.000 collection.py:72(__init__)
        1    0.000    0.000    0.000    0.000 common.py:493(validate_is_document_type)
 75389628   26.726    0.000   34.691    0.000 common.py:5(_import_class)
        1    0.000    0.000    0.000    0.000 common.py:797(__init__)
        2    0.000    0.000    0.000    0.000 common.py:821(codec_options)
        3    0.000    0.000    0.000    0.000 common.py:828(write_concern)
        2    0.000    0.000    0.000    0.000 common.py:838(_write_concern_for)
        1    0.000    0.000    0.000    0.000 common.py:846(read_preference)
        1    0.000    0.000    0.000    0.000 common.py:863(read_concern)
        2    0.000    0.000    0.000    0.000 context_managers.py:270(set_write_concern)
        7    0.000    0.000    0.000    0.000 contextlib.py:102(__init__)
      7/5    0.000    0.000    0.000    0.000 contextlib.py:130(__enter__)
      7/5    0.000    0.000    0.000    0.000 contextlib.py:139(__exit__)
        7    0.000    0.000    0.000    0.000 contextlib.py:279(helper)
        5    0.000    0.000    0.000    0.000 database.py:109(client)
        3    0.000    0.000    0.000    0.000 database.py:114(name)
    17832    0.043    0.000    0.073    0.000 datastructures.py:112(__init__)
 15707225    9.098    0.000    9.098    0.000 datastructures.py:146(__iter__)
    17832    0.016    0.000    0.089    0.000 datastructures.py:189(__init__)
      752    0.001    0.000    0.007    0.000 datastructures.py:56(get)
      752    0.003    0.000    0.006    0.000 datastructures.py:63(__getitem__)
    11854    0.042    0.000    0.066    0.000 dereference.py:105(_find_references)
    11854    0.010    0.000    0.012    0.000 dereference.py:164(_fetch_objects)
    11854    0.053    0.000    0.105    0.000 dereference.py:210(_attach_objects)
    11854    0.056    0.000    0.262    0.000 dereference.py:23(__call__)
14598/6115    0.030    0.000    0.777    0.000 document.py:111(to_mongo)
  4612173    7.079    0.000    7.122    0.000 document.py:161(__setattr__)
        2    0.000    0.000    0.000    0.000 document.py:209(_get_collection)
    14600    0.007    0.000    0.009    0.000 document.py:247(__iter__)
        1    0.000    0.000    0.000    0.000 document.py:259(__setitem__)
    17832    0.007    0.000    0.010    0.000 document.py:273(__len__)
        2    0.000    0.000    0.812    0.406 document.py:275(to_mongo)
    14599    0.002    0.000    0.002    0.000 document.py:308(clean)
  14600/2    0.180    0.000    0.812    0.406 document.py:331(to_mongo)
        1    0.000    0.000  267.415  267.415 document.py:333(save)
    14600    0.013    0.000    0.013    0.000 document.py:342(<setcomp>)
  14599/1    0.071    0.000    1.420    1.420 document.py:387(validate)
    14599    0.058    0.000    0.107    0.000 document.py:403(<listcomp>)
        1    0.000    0.000    0.112    0.112 document.py:463(_save_create)
        1    0.000    0.000    0.000    0.000 document.py:490(__expand_dynamic_values)
        1    0.000    0.000    0.000    0.000 document.py:518(_mark_as_changed)
4599189/1    8.292    0.000  265.069  265.069 document.py:548(_clear_changed_fields)
1504/1128    5.066    0.003  264.667    0.235 document.py:586(_nestable_types_clear_changed_fields)
9208439/9196585   27.392    0.000  114.388    0.000 document.py:609(_nestable_types_changed_fields)
  6393106    1.277    0.000    1.277    0.000 document.py:633(<listcomp>)
10992295/4599189  118.465    0.000  236.908    0.000 document.py:639(_get_changed_fields)
        1    0.000    0.000    0.001    0.001 document.py:873(ensure_indexes)
     2899    0.003    0.000    0.003    0.000 document.py:94(__eq__)
        1    0.003    0.003    0.004    0.004 fields.py:1040(key_not_string)
        1    0.003    0.003    0.005    0.005 fields.py:1049(key_starts_with_dollar)
        1    0.000    0.000    1.420    1.420 fields.py:1072(validate)
    32945    0.011    0.000    0.014    0.000 fields.py:118(to_python)
    81181    0.039    0.000    0.048    0.000 fields.py:123(__get__)
    32945    0.018    0.000    0.021    0.000 fields.py:127(validate)
    12231    0.028    0.000    0.043    0.000 fields.py:132(__set__)
    32945    0.014    0.000    0.028    0.000 fields.py:177(to_mongo)
14598/6115    0.027    0.000    0.806    0.000 fields.py:181(_to_mongo_safe_call)
  59774/3    0.054    0.000    1.420    0.473 fields.py:222(_validate)
    11854    0.020    0.000    0.289    0.000 fields.py:274(_lazy_load_refs)
    25784    0.134    0.000    0.559    0.000 fields.py:285(__get__)
  12231/1    0.069    0.000    0.812    0.812 fields.py:394(to_mongo)
    12230    0.009    0.000    0.024    0.000 fields.py:417(<dictcomp>)
  12231/1    0.016    0.000    0.812    0.812 fields.py:422(<dictcomp>)
    12230    0.006    0.000    0.006    0.000 fields.py:458(<listcomp>)
  12231/1    0.029    0.000    1.412    1.412 fields.py:463(validate)
        1    0.000    0.000    0.000    0.000 fields.py:503(to_python)
    58392    0.024    0.000    0.033    0.000 fields.py:751(document_type)
14598/6115    0.022    0.000    0.795    0.000 fields.py:777(to_mongo)
14598/6115    0.024    0.000    1.401    0.000 fields.py:782(validate)
    25032    0.059    0.000    0.644    0.000 fields.py:946(__get__)
    12230    0.017    0.000    0.240    0.000 fields.py:960(validate)
        2    0.000    0.000    0.000    0.000 helpers.py:110(_check_command_response)
        1    0.000    0.000    0.000    0.000 helpers.py:191(_get_wce_doc)
        1    0.000    0.000    0.000    0.000 helpers.py:203(_check_write_command_response)
        1    0.000    0.000    0.000    0.000 helpers.py:60(_gen_index_name)
        1    0.000    0.000    0.000    0.000 helpers.py:62(<listcomp>)
        1    0.000    0.000    0.000    0.000 helpers.py:65(_index_list)
        1    0.000    0.000    0.000    0.000 helpers.py:83(_index_document)
        2    0.000    0.000    0.000    0.000 message.py:1273(__init__)
        2    0.000    0.000    0.000    0.000 message.py:1287(unpack_response)
        2    0.000    0.000    0.000    0.000 message.py:1310(more_to_come)
        2    0.000    0.000    0.000    0.000 message.py:1315(unpack)
        2    0.000    0.000    0.107    0.054 message.py:575(_op_msg)
        1    0.000    0.000    0.000    0.000 mongo_client.py:1021(options)
        2    0.000    0.000    0.000    0.000 mongo_client.py:1080(_get_topology)
        4    0.000    0.000    0.000    0.000 mongo_client.py:1091(_get_socket)
        2    0.000    0.000    0.000    0.000 mongo_client.py:1114(_select_server)
        1    0.000    0.000    0.000    0.000 mongo_client.py:1146(_socket_for_writes)
        1    0.000    0.000    0.112    0.112 mongo_client.py:1219(_retry_with_session)
        1    0.000    0.000    0.112    0.112 mongo_client.py:1231(_retry_internal)
        1    0.000    0.000    0.000    0.000 mongo_client.py:1237(is_retrying)
        1    0.000    0.000    0.112    0.112 mongo_client.py:1337(_retryable_write)
        2    0.000    0.000    0.000    0.000 mongo_client.py:1551(__start_session)
        2    0.000    0.000    0.000    0.000 mongo_client.py:1587(_get_server_session)
        2    0.000    0.000    0.000    0.000 mongo_client.py:1591(_return_server_session)
        2    0.000    0.000    0.000    0.000 mongo_client.py:1595(_ensure_session)
        4    0.000    0.000    0.000    0.000 mongo_client.py:1608(_tmp_session)
        2    0.000    0.000    0.000    0.000 mongo_client.py:1634(_send_cluster_time)
        2    0.000    0.000    0.000    0.000 mongo_client.py:1647(_process_response)
        2    0.000    0.000    0.000    0.000 mongo_client.py:1922(__init__)
        2    0.000    0.000    0.000    0.000 mongo_client.py:1936(contribute_socket)
        2    0.000    0.000    0.000    0.000 mongo_client.py:1943(handle)
        2    0.000    0.000    0.000    0.000 mongo_client.py:1963(__enter__)
        2    0.000    0.000    0.000    0.000 mongo_client.py:1966(__exit__)
        2    0.000    0.000    0.000    0.000 monitor.py:76(open)
        2    0.000    0.000    0.000    0.000 monitoring.py:1294(enabled_for_commands)
        2    0.000    0.000    0.005    0.002 network.py:188(receive_message)
        4    0.000    0.000    0.000    0.000 network.py:229(wait_for_read)
        4    0.000    0.000    0.005    0.001 network.py:255(_receive_data_on_socket)
        2    0.000    0.000    0.113    0.056 network.py:41(command)
    12230    0.020    0.000    0.173    0.000 normalize.py:18(norm_str)
    12230    0.008    0.000    0.082    0.000 normalize.py:45(trim_str)
    12230    0.005    0.000    0.016    0.000 normalize.py:59(remove_accents)
        1    0.000    0.000    0.000    0.000 objectid.py:160(_random)
        1    0.000    0.000    0.000    0.000 objectid.py:170(__generate)
        1    0.000    0.000    0.000    0.000 objectid.py:212(binary)
        1    0.000    0.000    0.000    0.000 objectid.py:264(__ne__)
        1    0.000    0.000    0.000    0.000 objectid.py:62(__init__)
        1    0.000    0.000    0.000    0.000 operations.py:374(__init__)
        1    0.000    0.000    0.000    0.000 operations.py:444(document)
        4    0.000    0.000    0.000    0.000 periodic_executor.py:55(open)
        4    0.000    0.000    0.000    0.000 pool.py:1068(get)
        2    0.000    0.000    0.000    0.000 pool.py:1074(get_overall)
        4    0.000    0.000    0.000    0.000 pool.py:1087(stale)
        4    0.000    0.000    0.000    0.000 pool.py:1174(closed)
        4    0.000    0.000    0.000    0.000 pool.py:1245(stale_generation)
        4    0.000    0.000    0.000    0.000 pool.py:1344(get_socket)
        4    0.000    0.000    0.000    0.000 pool.py:1402(_raise_if_not_ready)
        2    0.000    0.000    0.000    0.000 pool.py:1410(_get_socket)
        2    0.000    0.000    0.000    0.000 pool.py:1507(return_socket)
        2    0.000    0.000    0.000    0.000 pool.py:1557(_perished)
        2    0.000    0.000    0.000    0.000 pool.py:380(max_idle_time_seconds)
        2    0.000    0.000    0.000    0.000 pool.py:400(wait_queue_timeout)
        4    0.000    0.000    0.000    0.000 pool.py:419(_event_listeners)
        2    0.000    0.000    0.000    0.000 pool.py:447(server_api)
        2    0.000    0.000    0.000    0.000 pool.py:453(load_balanced)
        2    0.000    0.000    0.113    0.056 pool.py:656(command)
        2    0.000    0.000    0.000    0.000 pool.py:764(_raise_if_not_writable)
        2    0.000    0.000    0.000    0.000 pool.py:802(check_auth)
        2    0.000    0.000    0.000    0.000 pool.py:837(validate_session)
        1    0.000    0.000    0.000    0.000 pool.py:871(socket_closed)
        2    0.000    0.000    0.000    0.000 pool.py:875(send_cluster_time)
        2    0.000    0.000    0.000    0.000 pool.py:880(add_server_api)
        2    0.000    0.000    0.000    0.000 pool.py:885(update_last_checkin_time)
        2    0.000    0.000    0.000    0.000 pool.py:888(update_is_writable)
        2    0.000    0.000    0.000    0.000 pool.py:891(idle_time_seconds)
        2    0.000    0.000    0.000    0.000 pool.py:928(__hash__)
    24460    0.020    0.000    0.126    0.000 re.py:202(sub)
    24460    0.048    0.000    0.059    0.000 re.py:288(_compile)
        2    0.000    0.000    0.000    0.000 read_preferences.py:122(document)
        1    0.000    0.000    0.000    0.000 results.py:25(__init__)
        1    0.000    0.000    0.000    0.000 results.py:62(__init__)
        1    0.000    0.000    0.000    0.000 results.py:66(inserted_id)
        2    0.000    0.000    0.000    0.000 server.py:203(get_socket)
        3    0.000    0.000    0.000    0.000 server.py:206(description)
        4    0.000    0.000    0.000    0.000 server.py:215(pool)
        2    0.000    0.000    0.000    0.000 server.py:44(open)
        2    0.000    0.000    0.000    0.000 server_description.py:200(is_server_type_known)
        1    0.000    0.000    0.000    0.000 server_description.py:204(retryable_writes_supported)
        4    0.000    0.000    0.000    0.000 server_description.py:85(address)
        2    0.000    0.000    0.000    0.000 server_description.py:90(server_type)
        2    0.000    0.000    0.000    0.000 settings.py:105(server_selection_timeout)
        2    0.000    0.000    0.000    0.000 settings.py:109(server_selector)
        2    0.000    0.000    0.000    0.000 settings.py:129(load_balanced)
        3    0.000    0.000    0.000    0.000 signals.py:40(<lambda>)
        1    0.000    0.000    0.000    0.000 socket_checker.py:43(select)
        1    0.000    0.000    0.000    0.000 socket_checker.py:82(socket_closed)
    14600    0.015    0.000    0.034    0.000 son.py:102(pop)
    29207    0.019    0.000    0.032    0.000 son.py:123(update)
    53661    0.038    0.000    0.038    0.000 son.py:139(get)
    14603    0.022    0.000    0.054    0.000 son.py:39(__init__)
    14603    0.129    0.000    0.134    0.000 son.py:45(__new__)
    74390    0.067    0.000    0.077    0.000 son.py:56(__setitem__)
    29199    0.023    0.000    0.028    0.000 son.py:61(__delitem__)
    59798    0.011    0.000    0.011    0.000 son.py:73(__iter__)
        4    0.000    0.000    0.000    0.000 threading.py:1095(_wait_for_tstate_lock)
        4    0.000    0.000    0.000    0.000 threading.py:1149(is_alive)
        6    0.000    0.000    0.000    0.000 threading.py:264(__enter__)
        6    0.000    0.000    0.000    0.000 threading.py:267(__exit__)
        4    0.000    0.000    0.000    0.000 threading.py:279(_is_owned)
        4    0.000    0.000    0.000    0.000 threading.py:359(notify)
        4    0.000    0.000    0.000    0.000 threading.py:546(is_set)
        2    0.000    0.000    0.000    0.000 topology.py:146(open)
        2    0.000    0.000    0.000    0.000 topology.py:175(select_servers)
        2    0.000    0.000    0.000    0.000 topology.py:203(<listcomp>)
        2    0.000    0.000    0.000    0.000 topology.py:206(_select_servers_loop)
        2    0.000    0.000    0.000    0.000 topology.py:237(select_server)
        2    0.000    0.000    0.000    0.000 topology.py:368(get_server_by_address)
        2    0.000    0.000    0.000    0.000 topology.py:410(max_cluster_time)
        2    0.000    0.000    0.000    0.000 topology.py:414(_receive_cluster_time_no_lock)
        2    0.000    0.000    0.000    0.000 topology.py:428(receive_cluster_time)
        2    0.000    0.000    0.000    0.000 topology.py:502(_check_session_support)
        2    0.000    0.000    0.000    0.000 topology.py:525(get_server_session)
        2    0.000    0.000    0.000    0.000 topology.py:536(return_server_session)
        2    0.000    0.000    0.000    0.000 topology.py:553(_ensure_opened)
        2    0.000    0.000    0.000    0.000 topology_description.py:123(check_compatible)
        2    0.000    0.000    0.000    0.000 topology_description.py:164(topology_type)
        4    0.000    0.000    0.000    0.000 topology_description.py:192(logical_session_timeout_minutes)
        2    0.000    0.000    0.000    0.000 topology_description.py:197(known_servers)
        2    0.000    0.000    0.000    0.000 topology_description.py:200(<listcomp>)
        2    0.000    0.000    0.000    0.000 topology_description.py:241(apply_selector)
        8    0.000    0.000    0.000    0.000 write_concern.py:103(acknowledged)
        1    0.000    0.000    0.000    0.000 write_concern.py:50(__init__)
        3    0.000    0.000    0.000    0.000 write_concern.py:88(is_server_default)
        1    0.000    0.000    0.000    0.000 write_concern.py:93(document)
    14604    0.004    0.000    0.004    0.000 {built-in method __new__ of type object at 0x7f873b3d0be0}
    14612    0.004    0.000    0.004    0.000 {built-in method _abc._abc_instancecheck}
        2    0.000    0.000    0.000    0.000 {built-in method _struct.pack}
    17832    0.006    0.000    0.006    0.000 {built-in method _weakref.proxy}
        2    0.000    0.000    0.000    0.000 {built-in method bson._cbson.decode_all}
        1    0.000    0.000  267.419  267.419 {built-in method builtins.exec}
 10995312    2.760    0.000    2.780    0.000 {built-in method builtins.getattr}
 29496178    7.408    0.000    7.408    0.000 {built-in method builtins.hasattr}
        2    0.000    0.000    0.000    0.000 {built-in method builtins.hash}
144918965   27.042    0.000   27.042    0.000 {built-in method builtins.isinstance}
    25032    0.004    0.000    0.004    0.000 {built-in method builtins.issubclass}
    14604    0.003    0.000    0.003    0.000 {built-in method builtins.iter}
    40921    0.007    0.000    0.007    0.000 {built-in method builtins.len}
    18/14    0.000    0.000    0.000    0.000 {built-in method builtins.next}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.setattr}
    12230    0.010    0.000    0.010    0.000 {built-in method builtins.sorted}
        7    0.000    0.000    0.000    0.000 {built-in method posix.getpid}
        2    0.088    0.044    0.107    0.054 {built-in method pymongo._cmessage._op_msg}
       14    0.000    0.000    0.000    0.000 {built-in method time.monotonic}
        1    0.000    0.000    0.000    0.000 {built-in method time.time}
      752    0.002    0.000    0.002    0.000 {function BaseDict.__getitem__ at 0x7f8738ccaf80}
        6    0.000    0.000    0.000    0.000 {method '__enter__' of '_thread.lock' objects}
       29    0.000    0.000    0.000    0.000 {method '__exit__' of '_thread.lock' objects}
        8    0.000    0.000    0.000    0.000 {method 'acquire' of '_thread.lock' objects}
    82519    0.012    0.000    0.012    0.000 {method 'append' of 'list' objects}
        4    0.000    0.000    0.000    0.000 {method 'appendleft' of 'collections.deque' objects}
        1    0.000    0.000    0.000    0.000 {method 'copy' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        2    0.000    0.000    0.000    0.000 {method 'discard' of 'set' objects}
    12230    0.005    0.000    0.005    0.000 {method 'encode' of 'str' objects}
196980649   24.825    0.000   24.825    0.000 {method 'get' of 'dict' objects}
        2    0.000    0.000    0.000    0.000 {method 'gettimeout' of '_socket.socket' objects}
    59805    0.015    0.000    0.015    0.000 {method 'items' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'join' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {method 'poll' of 'select.poll' objects}
        2    0.000    0.000    0.000    0.000 {method 'pop' of 'dict' objects}
        4    0.000    0.000    0.000    0.000 {method 'popleft' of 'collections.deque' objects}
        4    0.004    0.001    0.004    0.001 {method 'recv_into' of '_socket.socket' objects}
        1    0.000    0.000    0.000    0.000 {method 'register' of 'select.poll' objects}
    29199    0.006    0.000    0.006    0.000 {method 'remove' of 'list' objects}
        2    0.001    0.000    0.001    0.000 {method 'sendall' of '_socket.socket' objects}
     9988    0.005    0.000    0.005    0.000 {method 'split' of 'str' objects}
     6120    0.001    0.000    0.001    0.000 {method 'startswith' of 'str' objects}
    24460    0.047    0.000    0.047    0.000 {method 'sub' of 're.Pattern' objects}
        2    0.000    0.000    0.000    0.000 {method 'unpack' of '_struct.Struct' objects}
        2    0.000    0.000    0.000    0.000 {method 'unpack_from' of '_struct.Struct' objects}
        1    0.000    0.000    0.000    0.000 {method 'unregister' of 'select.poll' objects}
        1    0.000    0.000    0.000    0.000 {method 'update' of 'dict' objects}
    12230    0.003    0.000    0.003    0.000 {method 'upper' of 'str' objects}
        4    0.000    0.000    0.000    0.000 {method 'values' of 'dict' objects}


2022-03-31 23:27:05.915 | DEBUG    | data_queue.process:process_source_queues:246 - saving new_raw_obj...

masavini avatar Mar 31 '22 21:03 masavini

i made a few more tests over the same document (unfortunately a different one from the example i used in the previous message) using different save() options:

new_norm_obj.save()  #-> avg ~2m
new_norm_obj.save(validate=False, clean=False)  #-> avg ~2m17s
new_norm_obj.save(validate=False, clean=False, force_insert=True)  #-> avg ~1m55s
part_coll.insert_one(json.loads(new_norm_obj.to_json()))  #-> avg ~2.5s

masavini avatar Mar 31 '22 23:03 masavini

The culprits are here 4599189/1 8.292 0.000 265.069 265.069 document.py:548(_clear_changed_fields) 1504/1128 5.066 0.003 264.667 0.235 document.py:586(_nestable_types_clear_changed_fields) 9208439/9196585 27.392 0.000 114.388 0.000 document.py:609(_nestable_types_changed_fields) ... 10992295/4599189 118.465 0.000 236.908 0.000 document.py:639(_get_changed_fields)

Basically all Document/EmbeddedDocuments instances are tracking changes independently (so that we you .save(), it only saves fields that were modified), whenever you .save() the document, it will flush the changes and reset all changed_fields that were tracked. The numbers are pretty big here, like 10M calls, but it's hard to provide further assistance without having your Document class + an idea of how many documents /embedded_documents exists in that single record

bagerard avatar Apr 14 '22 08:04 bagerard

Thank you for your help, I finally decided to go for pymongo raw queries.

masavini avatar Dec 31 '22 02:12 masavini