mongoengine icon indicating copy to clipboard operation
mongoengine copied to clipboard

Switch the default uuidRepresentation from "pythonLegacy" to "unspecified" (not "standard")

Open ShaneHarvey opened this issue 1 year ago • 6 comments

First off, I want to apologize for this whole uuidRepresentation situation. The issue stems from a mistake made in certain MongoDB drivers a decade ago and it could have been addressed a long time ago. I'm sorry that this problem has bubbled over into mongoengine.

I noticed this uuidRepresentation warning which states that mongoengine plans to switch the default uuidRepresentation from "pythonLegacy" to "standard":

    if "uuidrepresentation" not in keys:
        warnings.warn(
            "No uuidRepresentation is specified! Falling back to "
            "'pythonLegacy' which is the default for pymongo 3.x. "
            "For compatibility with other MongoDB drivers this should be "
            "specified as 'standard' or '{java,csharp}Legacy' to work with "
            "older drivers in those languages. This will be changed to "
            "'standard' in a future release.",
            DeprecationWarning,
        )
        kwargs["uuidRepresentation"] = "pythonLegacy"

I'm concerned about this plan and the pain it will cause users. In particular, switching the default from "pythonLegacy" to "standard" will mean that application UUID data will be silently rewritten from subtype 3 to 4 (AKA a mild form of data corruption) and queries may break.

I believe the only safe decision is for mongoengine to either keep using "pythonLegacy" as the default or switch to "unspecified". Switching to "unspecified" avoids data corruption by forcing the user to make a choice if they want to encode UUIDs. They'll need to explicitly add "pythonLegacy" if they want to work with UUIDs already stored with the default behavior, or switch to "standard" if they are storing UUIDs for the first time. Applications also have the ability to migrate from "pythonLegacy" to "standard" if they rewrite all UUID data (or adjust their queries to match either subtype 3 or 4). This is why we switched the default in PyMongo from "pythonLegacy" to "unspecified".

For more background see: https://pymongo.readthedocs.io/en/stable/examples/uuid.html

Please let me know if you have any questions.

ShaneHarvey avatar Jan 20 '23 22:01 ShaneHarvey

Argh :grimacing: indeed this is a tricky situation and for sure switching to 'standard' is not a good idea. On top of uuidRepresentation, we also have this 'json_options' that was introduced and that we probably need to change as well so that its default value is aligned with uuidRepresentation (I guess). I must say that I overlooked this whole thing when it got merged...

Switching to 'unspecified' sounds like the best long-term fix (or eventually not set any value and let Pymongo impose its default). We would have to make sure to write decent instruction in the changelog also.

Would you have time to open a PR @ShaneHarvey ?

bagerard avatar Mar 01 '23 21:03 bagerard

Sure thing, I opened https://github.com/MongoEngine/mongoengine/pull/2739 to address the deprecation warnings. Good call on the json options, I updated those as well.

I suppose this ticket can stay open until you're ready to actually change the default values (some future breaking release).

ShaneHarvey avatar Mar 01 '23 22:03 ShaneHarvey

I'm releasing 0.27.0 right now, feel free to do the change on master when you have the time

bagerard avatar Mar 03 '23 10:03 bagerard

Oh that's sooner than I expected. I'll discuss with my team and get back to you.

ShaneHarvey avatar Mar 03 '23 18:03 ShaneHarvey

No worries, could be in a few weeks/month as well 😄 but on my end I m not too concerned about it. End users are expected to read the changelog when upgrading and the way we plan to change it should be the least error prone and break existing code early

bagerard avatar Mar 03 '23 19:03 bagerard

ping on this topic @ShaneHarvey

bagerard avatar Dec 15 '23 11:12 bagerard