
Thank you (again) for GPT 5.2!

Open guidedways opened this issue 1 month ago • 7 comments

It was only a few weeks ago that I was grateful for Codex 5.1 Extra High, but I'm even more grateful for the new 5.2 Extra High! After taking it for a spin, it is immediately and noticeably far superior to Codex 5.1 (or Gemini 3 Pro, Opus 4.5, and so on).

So far, nothing seems to be as good as GPT 5.2 Extra High. Gemini 3 Pro makes bold promises on paper but falls short (even before 5.2 came out, GPT 5.1's reviews and analysis were far more precise than Gemini 3 Pro's). GPT 5.2 ExHigh takes its sweet time (something I had hoped Codex 5.1 ExHigh would do more of), is extremely thorough, extremely precise, and so far spot-on!

I've tested this on some complex "See this bug report, here are some logs, here are two million lines of code" type of scenarios and 5.2 outperforms every other model.

Perhaps the most surprising change is the new Aug 31 2025 knowledge cut-off date! That is brilliant, because it covers nearly all of the latest APIs / SDKs released this year.

Thank you once again for the fantastic work you all are doing! Absolutely love Codex CLI.

guidedways avatar Dec 12 '25 21:12 guidedways

Yeah, OpenAI needed this kind of boost / breakthrough. It's good they made it!

gertsio avatar Dec 12 '25 22:12 gertsio

I have to support this! gpt-5.2 xhigh is much better than Codex 5.1. It thinks things through again and takes its time, but that's way better than 5.1!

styler-ai avatar Dec 13 '25 10:12 styler-ai

The long-context improvements are remarkable. This honestly does not feel like a minor .2 increment to v5! OpenAI has elsewhere admitted to being bad at naming; that can now be extended to being bad at versioning!

With 5.1, any time I'd see 70% context left, the steep drop-off in overall accuracy was extremely noticeable. I'd have to /new around 50%, because beyond that the model would either hallucinate or become extremely rigid in its approach.

With 5.2, it goes all the way down to 30% and STILL produces 110% sensible code and analysis! I'm having an absolute blast with this beast of a model.

Honestly, unbelievable what this model is suddenly capable of.

guidedways avatar Dec 13 '25 20:12 guidedways

I have been using GPT 5.2 since the day it was released. I have to say, it produces the most robust enterprise-grade code I have ever seen. It beats Codex 5.1 or Codex 5.1 Max xHigh (any 5.1 model) by a FAR, FAR margin.

I'm working on actual enterprise code. It took its own sweet time (often 1-2 hours, sometimes 4-5+ hours); I was shocked to see how it prioritized correctness over speed. It is thorough through and through!

Beast of a model.

Can't wait for OpenAI to release the Codex version of 5.2, specifically optimized for writing production-grade code.

ReachDelta4 avatar Dec 15 '25 09:12 ReachDelta4

@ReachDelta4: Great info! How did you measure 'better' here — fewer bugs, less rework, better tests, faster merges — and did you try the same evaluation with Codex 5.1 Max xHigh (gpt-5.1-codex-max, reasoning xhigh, summaries auto)?

DKWien avatar Dec 15 '25 10:12 DKWien

@DKWien GPT 5.2 has one-shotted many errors that 5.1 codex max xhigh couldn't solve for days. And 5.2's ability to handle large context is almost uncanny.

I’ve used 5.1 Max xHigh extensively, even until the day 5.2 dropped.

My metric here is Architectural Correctness.

For example, I recently finished a massive multi-tenant refactor (moving from user-owned to workspace-owned data).

In such cases, 5.1 Max typically tries to patch this at the API layer (faster, but prone to leaks).

5.2, on the other hand, went straight to the database layer: it enforced RLS and fail-closed migrations before even touching the application code.

It prioritized the security model over the feature code.
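To make the "security model first" ordering concrete, here is a minimal sketch of what an RLS-first, fail-closed migration plan for Postgres might look like. This is a hypothetical illustration, not the actual migration from that refactor; the `documents` table, the `workspace_id` column, and the `app.current_workspace` session setting are all assumed names:

```python
# Hypothetical sketch: generate a fail-closed, RLS-first migration for
# moving a table from user-owned to workspace-owned data (Postgres).
# All table/column/setting names are illustrative assumptions.

def rls_first_migration(table: str, tenant_col: str = "workspace_id") -> list[str]:
    """Return migration statements ordered so the security model lands
    before any application code can rely on the new column."""
    return [
        # 1. Add the tenant column WITHOUT a default: rows that cannot
        #    yet be attributed to a workspace must be backfilled explicitly.
        f"ALTER TABLE {table} ADD COLUMN {tenant_col} uuid;",
        # 2. Enable row-level security: once enabled, a table with no
        #    matching policy returns no rows, i.e. it fails closed.
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;",
        # 3. FORCE makes even the table owner subject to the policies.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY;",
        # 4. Tenant-isolation policy keyed on a per-session setting that
        #    the application sets after authenticating the workspace.
        f"CREATE POLICY {table}_tenant_isolation ON {table} "
        f"USING ({tenant_col} = current_setting('app.current_workspace')::uuid);",
        # (A NOT NULL constraint would be added only after backfill,
        #  as a separate, later step.)
    ]

statements = rls_first_migration("documents")
for s in statements:
    print(s)
```

The point of the ordering is the fail-closed property: steps 2-4 mean that any application query that forgets to scope by workspace simply sees no rows, instead of leaking another tenant's data.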

'Better' simply means I can rely on it to produce correct code rather than something that merely works. That saves me a ton of rework and leaves fewer bugs. It also tests extensively, although that is instructed in my Agents.md.

With 5.1 (or other models), I usually get code that works but misses these deep architectural invariants. 5.2 acts like a senior engineer: it prioritizes system stability and security over just closing the ticket and making it work.

ReachDelta4 avatar Dec 15 '25 13:12 ReachDelta4

@ReachDelta4: Thank you very much for sharing this — that's a really helpful distinction.

“Architectural correctness” is a strong metric, and your multi-tenant example makes a lot of sense: starting at the DB layer with RLS + fail-closed migrations is exactly the kind of invariant-first approach that prevents subtle tenant-leak bugs, whereas patching at the API layer can feel “done” while still leaving sharp edges.

Also useful context that you’re already driving tests via Agents.md — so what you’re really seeing is the model consistently choosing the right layer and sequencing (security model → migrations → app changes), not just producing more tests.

Thanks a lot :-)

DKWien avatar Dec 15 '25 14:12 DKWien