os-lib icon indicating copy to clipboard operation
os-lib copied to clipboard

Instrumented path based operations using hooks defined in `Checker`

Open ajaychandran opened this issue 1 year ago • 14 comments

Instrumented path based operations using hooks defined in Checker.

trait Checker {
  def onRead(path: ReadablePath): Unit
  def onWrite(path: Path): Unit
}

Exceptions

The following operations were not instrumented:

  • followLink, readLink
  • list, walk
  • exists, isLink, isFile, isDir
  • read operations for permissions/stats
  • watch

Future work

  • A more comprehensive design would add hooks for each core operation. This would eliminate the special check handling in operations like move and symlink.
  • As such, the methods of ReadablePath represent escape hatches. These cannot be "plugged" without breaking binary compatibility.

This resolves part 1 of mill #3746.

ajaychandran avatar Oct 23 '24 03:10 ajaychandran

@lihaoyi Please review.

ajaychandran avatar Oct 24 '24 04:10 ajaychandran

Thanks @ajaychandran, I think this looks good. Could you add Scaladoc to the checker API as well? Then I can merge this and cut an unstable -M1 so you can start using it in Mill

lihaoyi avatar Oct 24 '24 07:10 lihaoyi

The following operations were not instrumented:

Let's instrument these as well. For the purposes of Mill's (and probably others), we also want to block code from making decisions based on filesystem metadata outside their designated sandboxes, not just file contents

lihaoyi avatar Oct 24 '24 07:10 lihaoyi

A more comprehensive design would add hooks for each core operation. This would eliminate the special check handling in operations like move and symlink.

@lihaoyi Any thoughts on this? I'd rather not add special handling in the operation implementations. The number of core operations is not that large. We could even pass in additional parameters like createFolders, followLinks to have the full context available.

ajaychandran avatar Oct 24 '24 08:10 ajaychandran

Could you add Scaladoc to the checker API as well?

I had removed Scaladoc from the PR description. Do we need more than this?

/**
 * Defines hooks for path based operations.
 *
 * This, in conjunction with [[checker]], can be used to implement custom checks like
 *  - restricting operations to some path(s)
 *  - logging operations
 */
trait Checker {

  /** A hook for a read operation on `path`. */
  def onRead(path: ReadablePath): Unit

  /** A hook for a write operation on `path`. */
  def onWrite(path: Path): Unit
}```

ajaychandran avatar Oct 24 '24 08:10 ajaychandran

@ajaychandran I'm not sure what you mean by special check handling. Could you elaborate?

  • For move, I would expect it to call onWrite to both source and destination paths, since it is modifying both.
  • For symlink, I would only expect it to call onWrite for the path of the created link, and do nothing for the path the link is pointing to (since we are neither reading anything from it nor modifying it)

I think for the purposes of this ticket, let's stick with a simple onRead and onWrite API. That's what we'll need in Mill. We can flesh it out more in future releases as necessary

lihaoyi avatar Oct 24 '24 09:10 lihaoyi

@lihaoyi The current implementation assumes an equivalence between write and create/remove operations. I am not sure if this would fit all use cases.

  • The modified implementation for move checks for write access on the source parent directory. I gather this is not per your expectation.
  • I assumed a symlinked path could be used as argument to read. If so, a user could gain access to any file.
  • Should makeDir test write access on an existing parent folder when createFolders is true?

Adding a hook for each core operation lets os-lib remain unopinionated. Maybe rename Checker to Hook?

ajaychandran avatar Oct 24 '24 10:10 ajaychandran

Good point on the symlinks and enclosing folders. In the end this is not going to be an airtight sandbox, so it's a judgement call exactly where to draw the line.

  1. For move, let's call onWrite it on the full source and destination paths

  2. I think for creating enclosing folders, let's just punt on that and say it's up the the implementer of def onWrite

  3. For symlinks, I think what you say is reasonable, so lets call onWrite on the destination path, since we can compute it in-memory by resolving any relative path without needing to perform any filesystem operations

I'm a bit reluctant to try and match the entire OS-Lib API in the Checker interface. Let's go with the more minimal ad-hoc interface for now.

For the purposes of this pull request, could you add an @experimental annotation, apply it to the stuff you added, and add it to the def mimaExcludeAnnotations in the build file? Just to make it clear that this API isn't stable, and we're only publishing it because we need to experiment with it in Mill

lihaoyi avatar Oct 24 '24 11:10 lihaoyi

@ajaychandran not everything needs to be experimental, only the public APIs you added: os.checker and os.Checker I think

lihaoyi avatar Oct 25 '24 11:10 lihaoyi

@lihaoyi This is ready for another review.

ajaychandran avatar Oct 25 '24 11:10 ajaychandran

Updated hardlink as well.

ajaychandran avatar Oct 25 '24 12:10 ajaychandran

@ajaychandran code changes look good. Last request here is to move all the checker tests into one separate test suite, rather than having them mixed in with all the non-checker tests. That will make it easier to skim them and audit them as necessary.

After that i will merge this and cut a milestone release for experimentation in Mill

lihaoyi avatar Oct 26 '24 05:10 lihaoyi

@lihaoyi Checker tests have been moved.

Some compound operations like zip incur the cost of double-checking. One solution would be to define a private[os] def unchecked variant, for each operation, for internal use. This can be deferred until the feature is stable.

ajaychandran avatar Oct 26 '24 07:10 ajaychandran

@lihaoyi Checker tests have been moved.

Some compound operations like zip incur the cost of double-checking. One solution would be to define a private[os] def unchecked variant, for each operation, for internal use. This can be deferred until the feature is stable.

Another option is to delegate some sort of caching solution to the implementor of Checker.

dabd avatar Oct 26 '24 08:10 dabd

@ajaychandran pushed a tag for 0.11.4-M1 https://github.com/com-lihaoyi/os-lib/actions/runs/11537940922 hopefully it goes out soon and you can try it out in Mill. If you're going to work on the downstream tasks, we can hold off on payment for now to save on transfer fees. If not I can transfer the first portion of the bounty using the details we used before

lihaoyi avatar Oct 27 '24 05:10 lihaoyi

@lihaoyi Thanks. Lets hold off payment.

ajaychandran avatar Oct 27 '24 05:10 ajaychandran