deno_std icon indicating copy to clipboard operation
deno_std copied to clipboard

Deprecating "exists" because of some use cases ignores legitimate use cases

Open dionjwa opened this issue 3 years ago • 4 comments

Latest docs: https://deno.land/[email protected]/fs/mod.ts?s=exists

"Checking the state of a file before using it causes a race condition. Perform the actual operation directly instead."

So it's saying instead of:

  1. Check if file/dir exists
  2. If it exists, do the thing

instead do

  1. Just do the thing on the file/dir (maybe handle errors)

Rather than that flow, I have this flow:

  1. Check if a file/dir exists
  2. If it doesn't exist, do expensive thing

This doesn't fit into "Perform the actual operation directly instead." scenario. I don't want to perform the operation if the directory or file exists. But I need to check that state beforehand. I have this pattern in many places.

What would the alternative be?

I can understand the documented scenario is an anti-pattern, but this feels like deprecating crucial functionality without a clear pattern for legitimate use cases.

dionjwa avatar Sep 01 '22 00:09 dionjwa

FYI: https://github.com/denoland/deno_std/discussions/2102

NotFounds avatar Sep 01 '22 10:09 NotFounds

  1. Check if a file/dir exists
  2. If it doesn't exist, do expensive thing

If the check is for a file (not dir), and expensive thing is creating that file at the checked path, then that case can be covered by Deno.open() with createNew: true option, which creates a file only when the file doesn't exist.

kt3k avatar Sep 01 '22 11:09 kt3k

@kt3k That assumes:

  • that you are only creating a single file (are you really going to do this check on every single file if there are lots?)
    • that creates many edge cases in case that state flow is interrupted. Often safer to restart and redo.
  • that the creation of the contents of the file should done after opening the file and possibly failing
  • that this whole song and dance with Deno.open etc etc etc is somehow a better pattern than just checking if the file/dir exists.

These feel like very complex, cumbersome, and non-intuitive proposals to a potential problem that isn't quantified nor experienced in the wild in many contexts, and especially not in my use cases. I will never experience the stated threat in the use cases for the tooling I build with deno (CLI tools, build tools, automation).

dionjwa avatar Sep 02 '22 17:09 dionjwa

Related: #2125. Perhaps, this is a matter that requires more thorough documentation.

iuioiua avatar Sep 06 '22 07:09 iuioiua

@kt3k, I think it's worth:

  1. Improving documentation, strongly and clearly recommending the "do the action directly" method.
  2. Ensure any instance of exists within std is removed in favour of "doing the action directly", following our own recommendation.
  3. Reverting the deprecation.

I'm happy to work on this. What do you think?

iuioiua avatar Sep 30 '22 08:09 iuioiua

@iuioiua I think this is a good solution.

timreichen avatar Sep 30 '22 15:09 timreichen

I feel like the use cases I put forward are not understood. All the suggestions are assuming that you want to write to the file, or create it if it doesn't exist, rather than take an unrelated action if the file exists. None of the suggestions work in this case, since I don't want to create the file or directory, just to check if it exists. I suppose we'll have to create a third-party module to cover this basic use-case.

dionjwa avatar Oct 01 '22 20:10 dionjwa

It'd be worth pointing out that std/log has a valid use case: https://github.com/denoland/deno_std/blob/f096e33013fad0dfbb5a63f9d7960622a2ffe120/log/handlers.ts#L202-L209

If exists is to be deprecated, a reasonable alternative for this use case has to be vetted.

iuioiua avatar Oct 07 '22 23:10 iuioiua

There's types of usages like assertEquals(existsSync(srcFile), true); in copy and move testing. Usages in testing might be also another example of legit usage.

kt3k avatar Oct 09 '22 20:10 kt3k

Do these use cases provide sufficient reasoning to undo the deprecation of exists for now? I think they do.

iuioiua avatar Oct 09 '22 21:10 iuioiua

It'd be worth pointing out that std/log has a valid use case:

https://github.com/denoland/deno_std/blob/f096e33013fad0dfbb5a63f9d7960622a2ffe120/log/handlers.ts#L202-L209

If exists is to be deprecated, a reasonable alternative for this use case has to be vetted.

@iuioiua My recent research shows that this code would throw an unhandled exception on Windows, when the backup file exists, but is not readable. If you'd use isReadableFile as substitution, it would not detect files with missing read permissions. Attempting to write on a file that has missing read-permissions could cause data loss if write-permissions are present.

Your code sample shows me that isReadable would not be substitute in all circumstances. Maybe the undo of the deprecation of exists with additional implementations (dirExists and fileExists?) would be preferable, but it should return true when Deno.errors.PermissionDenied is thrown, which can only happen on Windows.

What are your thoughts?

martin-braun avatar Oct 19 '22 03:10 martin-braun

I agree with @martin-braun's analysis here. I would suggest that we deprecate (and eventually remove) exists, and replace it with dirExists and fileExists. I think it's pretty important to give people a route to migrate to newer versions of std. If we just remove exists, it is really going against this philosophy.

I, personally, have run into a very similar problem as @dionjwa where I needed to do something (not a file write), if something did not exist and would really like a non-deprecated solution.

lino-levan avatar Nov 02 '22 15:11 lino-levan

@lino-levan See also https://github.com/denoland/deno_std/pull/2785 . I believe it's the latest version of the proposal.

kt3k avatar Nov 02 '22 16:11 kt3k

@kt3k Which is probably stale. I could maybe move forward when https://github.com/denoland/deno_std/issues/2791 gets fixed, but I seriously question if the permission part should be included, or not.

Unfortunately, my last comment in this issue got no response. Maybe you could look into it, please?

My concern is, that there are use-cases in which fileExists is better than readableFile, for instance, when trying to ensure that a given file does not exist, thus safely being able to write a new file without data loss. (see https://github.com/denoland/deno_std/issues/2594#issuecomment-1272151584).

On the other hand, not having a fileReadable would still require to use try-catch to handle permission issues, unless an unhandled error is appreciated.

Third option would be to simply implement fileExists and dirExists as well, probably in a different PR, but I don't really want to pollute the std, thus I'm hereby seeking advice.

martin-braun avatar Nov 03 '22 04:11 martin-braun

@martin-braun I probably don't understand the question well. It sounds just fine to return false when PermissionDenied thrown (on windows). What could be the issue in that case?

BTW I feel it's a little verbose to have both fileExists and dirExists. How about passing options like isDir: true, isFile: true instead?

kt3k avatar Nov 03 '22 07:11 kt3k

@kt3k The primary goal was to undo the deprecation while improving the situation to reduce the potential for TOCTOU scenarios. Including permission checks was an idea and I was motivated to get consistent behavior between the platforms, thus making readable* instead of *exists functions. On Windows PermissionDenied is raised on failed stat access, on POSIX, it's not. Not good. Another issue is readable* would be problematic when checking if a file does not exist. It would return false, even when the file exists, but no read permissions are available to the user.

So instead, I'm playing with the thought to just re-implement *exists, again, but this is problematic as well:

  • The user still has to catch PermissionDenied for Windows
  • For POSIX systems the user cannot be sure the file can be read, until he/she tries to open the file or analyses its modes

The first point is crucial for me. People want exists back for the ease, but if it fails to address permission and type issues users fall in pits or introduce more TOCTOES.

The second point is important for better consistency between the platforms.

In regards of introducing isDir: true or isFile: true to have just one exists: Feels wrong to me, since the user can cause illogical states (both true or both false).

The bottom line is: Both paths fail to be consistent between platforms, both cannot catch all basic use-cases, I'm neither happy with my draft (which is blocked by unstable implementations), nor with the old exists which had many pitfalls.

martin-braun avatar Nov 03 '22 08:11 martin-braun

In regards of introducing isDir: true or isFile: true to have just one exists: Feels wrong to me, since the user can cause illogical states (both true or both false).

We can throw TypeError for wrong usages.

I still don't see what's wrong about returning false in case of PermissionDenied

kt3k avatar Nov 03 '22 09:11 kt3k

I still don't see what's wrong about returning false in case of PermissionDenied

In case of exists I would say the problem is that only Windows will throw PermissionDenied, thus we have inconsistent behavior between the platforms.

exists will produce different results between Windows and POSIX if the file is available, but cannot be read by the user.

We can throw TypeError for wrong usages.

Ok sure, we can do that.

So you think the best is to ignore the issue I just mentioned and simply re-implement exists with isFile and isDir settings and the user has to be prepared for any permission issues on POSIX systems down the line?

martin-braun avatar Nov 03 '22 10:11 martin-braun

I feel like there's some amount of miscommunication going on here.

I'm kind of on a fence here, but I think we should push for isReadable({file?:boolean, folder?:boolean}) (the types here aren't set up properly). I think there isn't much controversy on that front, and afaik the naming there is pretty clear. I think a PR with just isReadable makes sense.

I do get @martin-braun's point here which is, sometimes I really do want to check if a file exists and not just that it's readable by me (I think the examples they've provided show enough usecases for that). If we do decide to remove the current implementation and create another one:

  1. It should have the same arguments as isReadable
  2. It should be focused on determining if a file exists.
    • On POSIX, this is fine and there is no controversy.
    • On Windows, regardless of its readability (we will have a separate method if you want to check that), we want to be able to say that a file exists.
    • If it throws a permission error, it means it exists.
  3. The JSDoc for exists should mention, and recommend, using isReadable unless you really can't in your use case.

Regardless of which route we take, I think that the JSDocs for (both of) them should still recommend "just do the operation directly and catch errors" for the vast majority of things.

lino-levan avatar Nov 03 '22 15:11 lino-levan

@lino-levan Now I'm actually interested in changing my PR to implement exists and isReadable together and redirect users in the docs from exists to isReadable for the 99% of the time that this is preferred. As soon as uid() and gid() are stabilized, there is really nothing that prevents me to do that.

One last thing though: I think that the 3 arguments rule holds usually, but having to define two arguments when a folder is checked (isReadable(false, true)) feels wrong. I think it would be much better to define a settings type, so you can do:

isReadable({
	folder: true
})

Thanks.

martin-braun avatar Nov 04 '22 16:11 martin-braun

@martin-braun https://github.com/denoland/deno/pull/16424 just landed in deno 1.28.0, is there any chance you could update your PR to use uid() and guid()?

lino-levan avatar Nov 14 '22 01:11 lino-levan

@lino-levan No worries, I'm aware of this and I'm going to tackle this soon, since the path is graded, now.

martin-braun avatar Nov 14 '22 12:11 martin-braun

Idea for further improvement:

const isReadable = await fs.exists("path/to/file.txt", {
	isFile: true,
	isReadable: true // also require file to be readable
});

The idea is to not just combine the path type check, but also the permission check, into one function. It's not very pretensive against bad programming styles that might lead to race conditions (similarly to the former exists), but it certainly allows to avoid all the pitfalls that could only be solved by using try/catch/lstat previously.

  • On POSIX systems options.isReadable would run the permission check on stat.mode
  • On Windows systems, a PermissionDenied will make fs.exists return !options.isReadable

I'm curious if you agree or not.

martin-braun avatar Nov 16 '22 02:11 martin-braun

Double thumbs on this one, I feel like the syntax is a lot clearer. I do want to really make sure the documentation is on point if we are essentially un-deprecating exists.

Also bumping this:

The JSDoc for exists should mention, and recommend, using isReadable unless you really can't in your use case.

lino-levan avatar Nov 16 '22 02:11 lino-levan

The JSDoc for exists should mention, and recommend, using isReadable unless you really can't in your use case.

There is also an option to opt-in for another function name and have isReadable: true (or maybe ignorePermissions: false) by default to address this passively. Although, I'd have hard times in naming such function. Any suggestions are welcome.

martin-braun avatar Nov 16 '22 02:11 martin-braun

I think a sensible solution would be to add a second optional ExistOptions argument that can extend its functionality, including isFile, isReadable, isDirectory properties, etc.

Outside of that, I think exists should remain as is. The name exists may not be 100% technically correct. However, it best reflects its most common use case.

The documentation recommends not using it and provides examples of how to do so. Also, there appears to be enough evidence of valid use cases. For this reason, I think the deprecation should be undone, and the codebase should steer away from using it internally where reasonable.

Frankly, but respectfully, I think the amount of time and effort surrounding this function has been overplayed. It'd be nice for a sensible agreement to be met and move on.

iuioiua avatar Nov 16 '22 02:11 iuioiua

@iuioiua I agree. My PR is almost ready, but I need https://github.com/denoland/deno/issues/15576 to be addressed first. Let me explain:

Deno.errors.PermissionDenied (thrown by Deno.stat) is either raised from missing to provide the --allow-read flag or by trying to access a file that is protected by the file system. Only for the latter I have to return !options.isReadable, while I have to throw the error for the former. I will use Deno.permissions.query to distinguish between both cases like so:

// ...
  } catch (error) {
    if (error instanceof Deno.errors.NotFound) {
      return false;
    }
    if (error instanceof Deno.errors.PermissionDenied) {
      if (
        (await Deno.permissions.query({ name: "read", path })).state ===
        "granted"
      ) { // --allow-read not missing
        return !options?.isReadable; // PermissionDenied was raised by file system
      }
    }
    throw error;
  }
}

For existsSync I will need querySync, respectively.

I came to awareness after re-implementing the Scenes for missing --allow-read permission checks.

cc @lino-levan

martin-braun avatar Nov 17 '22 01:11 martin-braun

I think a sensible solution would be to add a second optional ExistOptions argument that can extend its functionality, including isFile, isReadable, isDirectory properties, etc.

Outside of that, I think exists should remain as is. The name exists may not be 100% technically correct. However, it best reflects its most common use case.

The documentation recommends not using it and provides examples of how to do so. Also, there appears to be enough evidence of valid use cases. For this reason, I think the deprecation should be undone, and the codebase should steer away from using it internally where reasonable.

Frankly, but respectfully, I think the amount of time and effort surrounding this function has been overplayed. It'd be nice for a sensible agreement to be met and move on.

@kt3k and @cjihrig, what do you think?

iuioiua avatar Nov 17 '22 20:11 iuioiua

@iuioiua I agree. My PR is almost ready, but I need denoland/deno#15576 to be addressed first. Let me explain:

Deno.errors.PermissionDenied (thrown by Deno.stat) is either raised from missing to provide the --allow-read flag or by trying to access a file that is protected by the file system. Only for the latter I have to return !options.isReadable, while I have to throw the error for the former. I will use Deno.permissions.query to distinguish between both cases like so:

// ...
  } catch (error) {
    if (error instanceof Deno.errors.NotFound) {
      return false;
    }
    if (error instanceof Deno.errors.PermissionDenied) {
      if (
        (await Deno.permissions.query({ name: "read", path })).state ===
        "granted"
      ) { // --allow-read not missing
        return !options?.isReadable; // PermissionDenied was raised by file system
      }
    }
    throw error;
  }
}

For existsSync I will need querySync, respectively.

I came to awareness after re-implementing the Scenes for missing --allow-read permission checks.

cc @lino-levan

The Deno.permissions API isn't needed because Deno.lstat requires --allow-read anyway.

iuioiua avatar Nov 17 '22 20:11 iuioiua

@iuioiua If I'm understanding @martin-braun's logic, I think you're missing the point. We need to query the --allow-read permission to the tell the difference between whether the user just didn't specify --allow-read or if the file is just not readable by us on windows.

Now that I say that out loud, is there even value in differentiating between these two @martin-braun?

lino-levan avatar Nov 17 '22 20:11 lino-levan