Lake icon indicating copy to clipboard operation
Lake copied to clipboard

not robust to shell meta-characters (e.g. space, ", $) in paths

Open gvvaughan opened this issue 11 years ago • 4 comments

I'm trying to build luabuild on my Mac, which (due to multiboot) places my home directory at '/Volumes/Macintosh HD/Users/gary' (note the space)... causing it to fail when calling ar. I managed to get it working by naively adding single quotes to the rules in the lake source it comes with, but wanted to make a pull-request for something more robust...

But, internally, file lists are stored as space separated lists, and so only some global variable substitutions should check for metacharacters and self-quote on expansion. Along similar lines quote_if_necessary only checks for spaces, and not other active characters that can confuse the shell.

Also, I don't have (or want) access to Windows, so I have no idea on what the right approach would be so that a patch doesn't break things there.

How do you think I should proceed?

gvvaughan avatar Jul 01 '13 06:07 gvvaughan

Files usually come in as space/comma separated lists, but are converted to lists as soon as possible. (You can always pass a list directly). utils.split_list does understand the Unix \ quoting rule.

quote_if_necessary is clearly not adequate here - that's general logic which is the Windows way.

local function quote_if_necessary (file)
   if file:find '%s' then
     if Windows then 
       file = '"' .. file .. '"'
     else
       file = file:gsub('([^\\])%s','%1\\ ')
    end
 end
 return file
end

I'm trying to get Lake to respect files with spaces and basically I've hit a representation problem; do we keep files internally without escapes, and translate to/from on input/output? That seems more consistent.

I've managed to get the following little lakefile to work:

c.program{'main',src='main one\\ 1'}

although portable notation would be to use double quotes. But then lake clean doesn't work properly.

So there two issues:

  • getting paths-with-spaces on input (list,quote or escape)
  • make sure they are quoted appropriately when shelling out (quote_if_necessary)

It's a tricky little problem, but it manifests itself well on Linux so I'll keep going ;)

Thanks for the input, since I'd really like to get a new release of this beast out soon - and OS X is not my strong point, since I don't have access to a modern machine (my wife's PPC Tiger iMac only goes so far!)

steve d.

stevedonovan avatar Jul 01 '13 08:07 stevedonovan

OS X is a red herring - I'm using zsh (in posix mode) from an ansi terminal emulator, so apart from some slightly unfamiliar standard paths, everything works the same as if I were in zsh on GNU/Linux.

Fully general handling of active metachars in file names before handing off to the shell for execution is a superset of the problem I'm experiencing: in my case, because I have a space in the path to my home directory, $(TARGET) (among others) is expanded into a string with an unquoted space, which is treated as two non-existing paths by the time the shell has finished word-splitting.

Making the (bad?) assumption that $(TARGET) is always a single path, I patched that as follows:

diff --git a/lake b/lake
index 53d86ee..62c621c 100644
--- a/lake
+++ b/lake
@@ -528,7 +551,8 @@ function utils.subst(str,exclude,T)
             else
                 local s = T[f]
                 if not s then return ''
-                else return s end
+                elseif f == "TARGET" then return quote_if_necessary(s)
+               else return s end
             end
         end)
     until count == 0 or exclude

But, spaces (or other shell metachars) in expansion of other variables is not so straight forward, since spaces may or may not be in a path, e.g: $(LIBS) => -L/path/with spaces -llib -lm :(

Here's an improvement to quote_if_necessary adapted from std.string.escape_shell(), though I've ignored backticks and dollar because they each open another can of worms:

  • dollar is also used for unexpanded Lake variables
  • backtick expansion removes a level of quotes from its result before splicing and re-expanding, which makes my head hurt at this stage ;)
diff --git a/lake b/lake
index 86e8ec9..ab6d468 100755
--- a/lake
+++ b/lake
@@ -294,11 +294,16 @@ function path.abs(...)
     return table.concat(args,DIRSEP)
 end

+local shell_metachars_patt = "[%s%(%)%\\%[%]\"']"
+
 local function quote_if_necessary (file)
-    if file:find '%s' then
-        file = '"'..file..'"'
+    if Windows and file:find '%s' then
+        return '"'..file..'"'
+    elseif file:find (shell_metachars_patt) then
+        return file:gsub ("("..shell_metachars_patt..")", "\\%1")
+    else
+        return file
     end
-    return file
 end

 -- this is used for building up strings when the initial value might be nil

So actually there are three sub-problems:

  1. handling shell meta-characters in paths written (or constructed) in lakefile
  2. handling shell meta-characters after variable expansion
  3. robust quoting of the above when passing any commands to a shell for execution

I think the quote_if_necessary patch above is an 80% solution to (3); the hack further up is at least a demonstration of (2), but barely fixes a thing; I hadn't even considered (1) until you replied... though it wouldn't be the end of the world if you just punted and said that the lakefile author is responsible for quoting lakefile paths (and other strings that will be read by the shell) appropriately.

I'm happy to provide more patches or advice if that's helpful, but since you are already appear to have this problem in hand, I won't duplicate your work with an independent solution! :)

Food for thought: wrapping POSIX fork() and exec() instead of using os.execute() sidesteps the whole shell madness on Unixy machines, though you'd still have to deal with spaces robustly when splitting a string into arguments. And then you'd also need to do something quite different on Windows, which AFAIK doesn't have the concept of forks.

gvvaughan avatar Jul 02 '13 10:07 gvvaughan

For the record, make has exactly the same problem, which leads to having to write finicky rules like this:

libtool: $(ltmain_sh) $(config_status) $(dotversion)
        @$(rebuild); \
        if test -f '$@'; then \
          eval `'$(SED)' -n '/^package_revision=/p' '$@'`; \
          test "$$package_revision" = "$$revision" && rebuild=false; \
        fi; \
        for prereq in $?; do \
          case $$prereq in *.version);; *) rebuild=:;; esac; \
        done; \
        if $$rebuild; then \
          if test 0 = '$(AM_DEFAULT_VERBOSITY)' && test 1 != '$(V)'; \
            then echo "  GEN     " $@; \
          else echo '$(SHELL) $(top_builddir)/config.status "$@"'; fi; \
          cd '$(top_builddir)' && '$(SHELL)' ./config.status '$@'; \
        fi

Notice the proliferation of ugly single quotes around anything that contains an unknown path, and hoping nothing arrives here with any embedded single quotes after make macro expansion. Yuck!

So, on reflection, I'll retract my earlier comment about punting on managing meta-characters embedded in paths written to lakefile - that's asking for a similar mess all over.

gvvaughan avatar Jul 02 '13 10:07 gvvaughan

I'm not surprised that make has the problem - its makers also could not imagine paths-with-spaces! It's been one of the facts I haven't fully internalized in my professional career - that one has to handle spaces and sometimes people cannot avoid this.

The whole idea is to keep internal filenames (once processed) in clean form, without double-quotes (Windows) or backslash escapes (Unix). On output, the appropriate quoting is used.

As for our friend the shell, we'll just have to make sure that no weird shit gets passed to it - interesting idea, using luaposix for process launching (its Windows sister winapi also allows for raw process manipulation.) Now I'm using both of these modules to add special sauce to Lake, particularly implementing a parallel job queue and setting environment variables. But I'm loathe to change the basic simplicity of raw Lake - one file plus luafilesystem.

In a portable system like Lake, I don't lean much on the shell, making a few assumptions:

  • redirection, 2>&1 (entertainly enough Windows shell knows the latter!)

  • parameters surronded with double-quotes can contain spaces

Your quote_if_necessary looks good - the trick is to apply it in all the needed places! Very educational!

An interesting feature of the new Lake is that objects can be targets, if they have a 'time' field... so there's now checks to make sure that we don't apply targets-are-files operations willy-nilly.

Ah well, they do say that this is a non-trivial problem. Probably like Don Quixote tilting at windmills, but let's get this right.

stevedonovan avatar Jul 02 '13 12:07 stevedonovan