
Non-nullable self-referencing relations

Open • kettanaito opened this issue 2 months ago • 5 comments

It is currently impossible to pass a self-referencing object to Zod (and, I believe, to any schema library), because it will attempt to crawl it deeply and get stuck in an infinite loop. Currently, this is circumvented by:

  1. Circular relations being marked as .optional() so their values in the schema are not required.
  2. All relations being detected during sanitization and replaced with .getDefaultValue() from the appropriate relation class.
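The second workaround roughly amounts to copying the record and swapping every relational property for its default value before validation, so the validator never walks into a cycle. This is a minimal sketch: `sanitizeForValidation` and `RelationLike` are hypothetical names, and only `getDefaultValue()` comes from the description above.

```typescript
// Hypothetical shape of a relation: only getDefaultValue() is taken from the issue.
interface RelationLike {
  getDefaultValue(): unknown
}

type RecordLike = Record<string, unknown>

// Returns a shallow copy of `record` with every relational property replaced
// by its relation's default value, so a validator never reaches related records.
function sanitizeForValidation(
  record: RecordLike,
  relations: Record<string, RelationLike>,
): RecordLike {
  const sanitized: RecordLike = { ...record }
  for (const [key, relation] of Object.entries(relations)) {
    if (key in sanitized) {
      sanitized[key] = relation.getDefaultValue()
    }
  }
  return sanitized
}

// A user whose post points back at the user: JSON.stringify(user) would throw.
const posts: RecordLike[] = []
const user: RecordLike = { id: 1, posts }
posts.push({ title: 'First', author: user })

const sanitized = sanitizeForValidation(user, {
  posts: { getDefaultValue: () => [] },
})
console.log(JSON.stringify(sanitized)) // {"id":1,"posts":[]}
```

The validator only ever sees the placeholder, so a required relation would be checked against the default value rather than the real related record.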

We need to decide whether this is acceptable or whether a better approach can be recommended.

  • It sucks to interfere with the user-defined schema. Some relations may be required, but there's no way to describe that.

kettanaito, Sep 28 '25 19:09

It's worth raising this with Zod to see if we can come up with a flag that prevents deep object traversal. There's also been a suggestion to try z.lazy() so that Zod skips that traversal.

kettanaito, Sep 28 '25 19:09

Lazy property definitions

Using z.lazy() does help solve the infinite object crawl issue, but it doesn't fully satisfy the requirements. You can still end up in impossible states if two collections reference each other through required properties:

// Imagine a two-way one-to-one relation.
user.post - one -> posts
posts.author - one -> users

While Zod can validate such objects, you cannot create them: post is required when creating a user, and author is required when creating a post. Whichever you create first, one of the .create() calls errors on a missing required property. Granted, this isn't Zod's issue anymore.

Also, with one-way relations, the referenced record may be unable to set itself as a foreign key on the owner model if its collection doesn't define any relations back to the owner:

const users = new Collection({ schema: userSchema })
const posts = new Collection({ schema: postSchema })

users.defineRelations(({ many }) => ({
  posts: many(posts),
}))

const firstUser = await users.create({
  id: 1,
  posts: [
    await posts.create({
      title: 'First',
      // "author" is missing!
    }),
  ],
})

But if you swap the creation order to break the cycle, creating a post will not add it to user.posts, because the posts collection does not define any relations to users:

const firstUser = await users.create({
  id: 1,
  // Leverage the fact that empty arrays pass
  // the z.array() check (unless .nonempty()).
  posts: [],
})
await posts.create({
  title: 'First',
  // Setting "author" on this post will not automatically
  // set this post into the "user.posts" many relation.
  author: firstUser,
})

I'm not sure whether we should handle this explicitly. Right now, a relation only observes the foreign collection for create and delete events, and in those handlers it always looks up the foreign relation that points back to the owner. If there isn't one, it skips the event.
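To make that behavior concrete, here is a minimal sketch of a relation that only reacts to the foreign collection's create events and bails out when no inverse relation exists. `MiniCollection` and `defineMany` are hypothetical stand-ins rather than the library's actual classes, and only create events are modelled.

```typescript
type Rec = { id: number; [key: string]: unknown }
type Listener = (record: Rec) => void

// Hypothetical stand-in for a collection; only "create" events are modelled.
class MiniCollection {
  readonly records: Rec[] = []
  // Relations this collection defines: property name -> foreign collection.
  readonly relations = new Map<string, MiniCollection>()
  private readonly createListeners: Listener[] = []

  onCreate(listener: Listener): void {
    this.createListeners.push(listener)
  }

  create(record: Rec): Rec {
    this.records.push(record)
    for (const listener of this.createListeners) listener(record)
    return record
  }
}

// A "many" relation on `owner` that observes `foreign` for new records.
function defineMany(owner: MiniCollection, key: string, foreign: MiniCollection): void {
  owner.relations.set(key, foreign)
  foreign.onCreate((created) => {
    // Look up the foreign relation that points back to the owner.
    const inverse = Array.from(foreign.relations).find(([, target]) => target === owner)
    if (!inverse) return // No inverse relation: skip the event.
    const ownerRecord = created[inverse[0]] as Rec | undefined
    if (!ownerRecord) return
    if (!Array.isArray(ownerRecord[key])) ownerRecord[key] = []
    const list = ownerRecord[key] as Rec[]
    if (!list.includes(created)) list.push(created)
  })
}

const users = new MiniCollection()
const posts = new MiniCollection()
defineMany(users, 'posts', posts)

const user = users.create({ id: 1, posts: [] })
posts.create({ id: 1, author: user })
console.log((user.posts as Rec[]).length) // 0: "posts" defines no relation back to "users"

// Once "posts" declares the inverse relation, the same kind of create is picked up.
posts.relations.set('author', users)
posts.create({ id: 2, author: user })
console.log((user.posts as Rec[]).length) // 1
```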

kettanaito, Sep 29 '25 15:09

Update

The consensus was to propose a feature to Zod that prevents it from validating infinitely nested objects, especially when a nested object is referentially equal to the root object (think user.posts[n].author === user). Colin suggested I look at how ArkType handles this, since it already supports it.
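One common way to validate cyclic data without looping forever is to remember which (object, schema) pairs have already been visited and treat a repeat visit as passing. The sketch below uses a tiny hypothetical schema format rather than Zod's API, and it is not a description of how Zod or ArkType actually implement this.

```typescript
// Hypothetical mini schema format; lazy functions allow recursive definitions.
type Schema =
  | { kind: 'number' }
  | { kind: 'string' }
  | { kind: 'array'; element: () => Schema }
  | { kind: 'object'; shape: () => Record<string, Schema> }

function validate(
  schema: Schema,
  value: unknown,
  // Tracks which schemas each object has already been checked against.
  seen: WeakMap<object, Set<Schema>> = new WeakMap<object, Set<Schema>>(),
): boolean {
  if (schema.kind === 'number') return typeof value === 'number'
  if (schema.kind === 'string') return typeof value === 'string'
  if (typeof value !== 'object' || value === null) return false

  // Repeat visit of the same (object, schema) pair: assume it holds. All checks
  // are conjunctions, so a failure is still reported by the first visit.
  let schemas = seen.get(value)
  if (schemas?.has(schema)) return true
  if (!schemas) {
    schemas = new Set<Schema>()
    seen.set(value, schemas)
  }
  schemas.add(schema)

  if (schema.kind === 'array') {
    const element = schema.element()
    return Array.isArray(value) && value.every((item) => validate(element, item, seen))
  }
  const record = value as Record<string, unknown>
  return Object.entries(schema.shape()).every(([key, propSchema]) =>
    validate(propSchema, record[key], seen),
  )
}

// Schemas must be stable objects so the identity checks in `seen` work.
const numberSchema: Schema = { kind: 'number' }
const stringSchema: Schema = { kind: 'string' }
const userSchema: Schema = {
  kind: 'object',
  shape: () => ({ id: numberSchema, posts: postListSchema }),
}
const postListSchema: Schema = { kind: 'array', element: () => postSchema }
const postSchema: Schema = {
  kind: 'object',
  shape: () => ({ title: stringSchema, author: userSchema }),
}

// user.posts[0].author === user
const userPosts: unknown[] = []
const user = { id: 1, posts: userPosts }
userPosts.push({ title: 'First', author: user })

console.log(validate(userSchema, user)) // true
userPosts.push({ title: 42, author: user })
console.log(validate(userSchema, user)) // false
```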

I wish this could be solved at the Standard Schema level, but that's outside its scope.

kettanaito, Oct 05 '25 12:10

@kettanaito I have an idea for how to solve this problem at the library level.

Let's call it a transactional API with a two-phase protocol:

  1. Phase 1: Create record skeletons (tx.create) with deferred references.
  2. Phase 2: Declare references (tx.link).
  3. Commit: Check all relations and run Standard Schema validation. If anything fails, roll back the created records and relational changes and throw an error.

The API may differ, but the idea should stay the same. Here is how this might look in code (pseudo-code):

Self‑required one‑to‑one example:

const userSchema = z.object({
  id: z.number(),
  get manager() {
    return z.lazy(() => userSchema)
  },
})

const users = new Collection({ schema: userSchema })
users.defineRelations(({ one }) => ({ manager: one(users) }))

await users.transaction(async (tx) => {
  const a = await tx.create(users, { id: 1 })
  const b = await tx.create(users, { id: 2 })
  await tx.link(a, 'manager', b)
})

Mutually required example:

const userSchema = z.object({
  id: z.number(),
  get profile() { return profileSchema },
})
const profileSchema = z.object({ 
  id: z.number(),
  get user() { return userSchema },
})

const users = new Collection({ schema: userSchema })
const profiles = new Collection({ schema: profileSchema })

users.defineRelations(({ one }) => ({ profile: one(profiles) }))
profiles.defineRelations(({ one }) => ({ user: one(users) }))

await users.transaction(async (tx) => {
  const u = await tx.create(users, { id: 1 })
  const p = await tx.create(profiles, { id: 1 })
  await tx.link(u, 'profile', p)
  await tx.link(p, 'user', u)
})
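The three phases of this proposal can be sketched as a small in-memory transaction. This is a synchronous sketch under stated assumptions: `TxCollection` (a collection that lists its required relations) and `transaction()` are hypothetical helpers, and a simple required-relation check stands in for the Standard Schema validation a real commit would run.

```typescript
type Row = { id: number; [key: string]: unknown }

// Hypothetical collection: stored rows plus relations that must be set on commit.
class TxCollection {
  readonly rows: Row[] = []
  readonly requiredRelations: string[]

  constructor(requiredRelations: string[]) {
    this.requiredRelations = requiredRelations
  }
}

interface Tx {
  create(collection: TxCollection, data: Row): Row
  link(record: Row, key: string, target: Row): void
}

function transaction(fn: (tx: Tx) => void): void {
  const created: Array<{ collection: TxCollection; row: Row }> = []
  const links: Array<{ record: Row; key: string; hadKey: boolean; previous: unknown }> = []

  const tx: Tx = {
    // Phase 1: store a skeleton row; required relations are not checked yet.
    create(collection, data) {
      const row: Row = { ...data }
      collection.rows.push(row)
      created.push({ collection, row })
      return row
    },
    // Phase 2: set a reference, remembering the previous value for rollback.
    link(record, key, target) {
      links.push({ record, key, hadKey: key in record, previous: record[key] })
      record[key] = target
    },
  }

  try {
    fn(tx)
    // Commit: every row created here must now have its required relations.
    for (const { collection, row } of created) {
      for (const key of collection.requiredRelations) {
        if (row[key] === undefined) {
          throw new Error(`Missing required relation "${key}" on row ${row.id}`)
        }
      }
    }
  } catch (error) {
    // Roll back: undo links newest-first, then remove the created rows.
    for (const { record, key, hadKey, previous } of [...links].reverse()) {
      if (hadKey) record[key] = previous
      else delete record[key]
    }
    for (const { collection, row } of created) {
      const index = collection.rows.indexOf(row)
      if (index !== -1) collection.rows.splice(index, 1)
    }
    throw error
  }
}

// Mutually required one-to-one relation, mirroring the users/profiles example.
const users = new TxCollection(['profile'])
const profiles = new TxCollection(['user'])

transaction((tx) => {
  const u = tx.create(users, { id: 1 })
  const p = tx.create(profiles, { id: 1 })
  tx.link(u, 'profile', p)
  tx.link(p, 'user', u)
})
console.log(users.rows.length, profiles.rows.length) // 1 1

try {
  transaction((tx) => {
    tx.create(users, { id: 2 })
    tx.create(profiles, { id: 2 }) // Never linked, so the commit fails.
  })
} catch (error) {
  console.log((error as Error).message) // Missing required relation "profile" on row 2
}
console.log(users.rows.length, profiles.rows.length) // 1 1
```

Only rows created inside the transaction are checked at commit; links to pre-existing rows are restored on rollback but not revalidated.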

yslpn, Oct 21 '25 07:10

@yslpn, that's an interesting proposal, thank you. Technically, I suppose it could work, but I dislike that it splits the logical linking of related records into two steps:

  • .defineRelations()
  • .link()

I tend to think that if the logic is inseparable (i.e. defining a relation without linking records doesn't make sense), then it shouldn't be separated in the API either.

The culprit here is the inability of schema validation libraries to handle objects that reference themselves via getters. I've already opened a PR to Zod to fix that, and with that PR the issue is indeed gone! Perhaps addressing this in those libraries makes more sense for everyone.

kettanaito, Oct 25 '25 10:10