rfcs icon indicating copy to clipboard operation
rfcs copied to clipboard

RFC: proposal for filtering `WalkerHandler` directories

Open aturley opened this issue 5 years ago • 7 comments

Rendered

aturley avatar Aug 14 '19 23:08 aturley

Have you considered changing the dir_entries from an array to a list (in WalkerHandler, Directory.entries and other places) as an alternative?

dipinhora avatar Aug 14 '19 23:08 dipinhora

@dipinhora using a list would get around the awkwardness around removing items from the array. Two things come to mind with that approach:

  1. What would the performance impact be? I think using an array is probably going to be faster than a list in terms of creation and traversal, though not necessarily in terms of removal.
  2. The idea of modifying a structure that is passed into the function rather than returning a value feels a little strange. As far as I can recall, this is the only place in the stdlib where we do something like that. So whether we're passing in an array or a list, I don't think the handler code should be modifying it.

aturley avatar Aug 15 '19 14:08 aturley

@aturley

  1. re: performance, the list create code in Directory.entries isn't written with performance in mind currently as it is not pre-allocating an array of whatever number of entries it will need (https://github.com/ponylang/ponyc/blob/master/packages/files/directory.pony#L92). Given that, I don't think that the list creation using an array will necessarily be faster. Agreed that traversal using an array would be faster. Either way, I'm not sure that how performant this code needs to be but it would likely require thinking more holistically about it.
  2. I find it amusing that you mentioned the performance concern in item 1 and then didn't want to modify the list in place for this item which would likely be more performant when using a list as compared to creating a new list or array. Regardless, I'm ambivalent to whether the handler code modifies it in place or has to make a copy but I would prefer to keep the option of being able to modify in place.

Probably a better option altogether would be to have a function that provides a true/false or similar for filtering each entry when passed to it instead of passing the full list to the handler so there's no modification of the list nor the need to create a new list either. When the function returns a true the walk happens and when the function returns a false the walk is skipped.

dipinhora avatar Aug 16 '19 00:08 dipinhora

@dipin

There are tradeoffs here between performance and various abstractions. If we truly wanted the best performance we would write it in assembly. But, since we've chose to use Pony I'm guessing we're all at least implicitly willing to make certain performance tradeoffs. One of the questions here is "which tradeoffs between performance and ease of use and consistency are we willing to make?"

Can you provide an example of what you're thinking of with the true/false filtering? I have an idea of how that might work but I'd like to see if we're thinking of the same thing.

aturley avatar Aug 16 '19 03:08 aturley

@aturley Yes, of course. Sorry if my earlier comment came across as antagonistic. That wasn't my intent.

The following is something along the lines of what I meant by the true/false filtering:

class MyWalkerHandler is WalkerHandler
  fun ref apply(dir_path: FilePath, dir_entry: String): Bool =>
    let skip_directories_with_string = "IGNORE_ME"
    if dir_entry.contains(skip_directories_with_string) then
      false
    else
      true
    end

with the following versions of WalkerHandler interface:

interface WalkerHandler
  """
  A handler for `FilePath.walk`.
  """
  fun ref apply(dir_path: FilePath, dir_entry: String): Bool

and Filepath.walk:

fun val walk(handler: WalkHandler ref, follow_links: Bool = false) =>
    """
    Walks a directory structure starting at this.

    `handler(dir_path, dir_entry)` will be called for each directory
    starting with this one. The handler returns true/false for which subdirectories
    to walk or skip.
    """
    try
      with dir = Directory(this)? do
        var entries: Array[String] ref = dir.entries()?
        for e in entries.values() do
          if WalkHandler(this, e) then
            let p = this.join(e)?
            let info = FileInfo(p)?
            if info.directory and (follow_links or not info.symlink) then
              p.walk(handler, follow_links)
            end
          end
        end
      end
    else
      return
    end

dipinhora avatar Aug 16 '19 05:08 dipinhora

I've never used the API in question. I know that @aturley has used it a lot. I trust his judgment on the changes that are needed. I'm sure we have a lot of APIs in the standard library that could use some love. Particularly in the file related sections of the standard library.

SeanTAllen avatar Jan 15 '20 02:01 SeanTAllen

@aturley are you interested in continuing with this RFC?

SeanTAllen avatar Feb 09 '22 01:02 SeanTAllen