Zappa icon indicating copy to clipboard operation
Zappa copied to clipboard

Directories not excludable via zappa_settings.json

Open nueverest opened this issue 7 years ago • 27 comments

Context

Directories not ignored as expected. The zappa_settings.json looks like this. Note that I am using .gitignore notation for ignoring the directories.

https://github.com/nueverest/vue_flask/blob/master/zappa_settings.json

{
    "production": {
        "s3_bucket": "nueverest",
        "aws_region": "us-west-2",
        "app_function": "vue_flask.app",
        "delete_local_zip": false,
        "exclude": ["deployed/", "static/", "unit_tests/", "*.css", "*.js", "*.json", "*.csv", "*.ico"]
    }
}

Expected Behavior

I am expecting the folders named: deployed, static, and unit_tests to be excluded from the generated zip file.

Actual Behavior

When I open the generated zip file I see the folders that I thought were excluded. This increases the time it takes to deploy the app.

Possible Fix

Allow directory patterns to be excluded via zappa_settings, or document how it is currently implemented in the readme.

Your Environment

  • Zappa version used: 0.37.1
  • Operating System and Python version: Windows and Py 2.7.13
  • Link to your project (optional): https://github.com/nueverest/vue_flask
  • Temporary link to deployed site: https://kw4udfbos9.execute-api.us-west-2.amazonaws.com/production

nueverest avatar Feb 26 '17 16:02 nueverest

I agree we should change the implementation of this. What might work for you is use static without the trailing slash. Be aware that this will exclude all folders that are called static. No matter where they are in the folder tree. What you can't do for example is myapp/static, we should fix that.

michi88 avatar Feb 27 '17 14:02 michi88

@michi88 Doing it without the / fixed it for me. Excluding all folders with the same name does reduce granular control. I will structure my project accordingly.

nueverest avatar Feb 27 '17 18:02 nueverest

Thinking about how to implement this. We could change the behavior of those patterns in exclude so that they are removed from the temporary zip after we create it rather than the current way we do it now, which excludes them as we iterate over the venv/project contents. Could be more simple.

Miserlou avatar Feb 27 '17 19:02 Miserlou

We're using copytree(site_packages, temp_package_path, symlinks=False, ignore=shutil.ignore_patterns(*excludes)) now.

The problem with ignore=shutil.ignore_patterns is that it matches the current file or directory names on iteration and not the full path. This is why 'static' works and 'static/' or 'myapp/static' not. Also something like 'myapp/settings/settings_local.py' doesn't work, 'settings_local.py' does.

Note that if we change this and do for example what you describe @Miserlou + matching the full path by walking the zip ourselves we will break stuff for people as 'static' would not match anymore for example. We could also choose to keep the current behaviour and add that we also match on the full path. That might work. Thinking about it that is actually how gitignore behaves also.

michi88 avatar Feb 28 '17 11:02 michi88

One way to make 'static/' work the same way as 'static' with the current implementation is to simply remove the trailing slash before processing i.e. 'static/' --> 'static'.

This does not cover the 'myapp/static' case, but it is a step in the right direction.

nueverest avatar Mar 01 '17 14:03 nueverest

@michi88 i think it would be great to add an exclude_path option, just ran into some situations where I'd like to use the full path so i can include some things and not others that have the same filename but different path.

Honestly if wasn't a backwards compatibility issue I'd be inclined to suggest replacing the old exclude as exclude_path is much more useful/powerful and I can't see a reason to use the old exclude if exclude_path existed.

glassresistor avatar Oct 02 '17 22:10 glassresistor

Specifically I have a folder staticfiles I want to exclude which ends up keeping the staticfiles app out of my project completely.

glassresistor avatar Oct 02 '17 22:10 glassresistor

I think the way forward with this is to officially adopt the gitignore (wildmatch) syntax. It's well-known and widely used and—unlike the current glob-style syntax—powerful enough to meet everyone's needs.

Unfortunately it's not backwards-compatible with the current exclude, so I think we either need to accept the backwards-incompatibility or create a new setting and deprecate the old one.

In terms of implementation, it should be straightforward to do using the pathspec package. That allows you to turn the setting into a spec object and do, e.g., spec.match_tree('path/to/directory').

@Miserlou: I can provide a PR for this if we can reach consensus.

marfire avatar Nov 03 '17 11:11 marfire

The AWS X-ray SDK includes a subdirectory named 'botocore'. With the current exclude implementation my only choice to get instrumentation of boto calls to work, is to specify a custom exclude value that doesn't include 'botocore' -- and thus results in the botocore package being deployed.

There are also plenty of test/tests/testing subdirectory variations that are hard to target without being able to specify more parts of the path. Again, excluding all directories named test/tests/testing would be too broad.

kim-10e avatar Dec 28 '17 03:12 kim-10e

@Miserlou @marfire I'm currently blocked on this, so I'm forking and implementing a workaround that will get me going. If there's consensus on the way forward and @marfire hasn't gotten around to working on this, I'd be happy to submit a PR.

kim-10e avatar Jan 04 '18 02:01 kim-10e

This is blocking for me as well. I have two applications in a single folder and I'd like to be able to share common libraries for instances.

myapp1
myapp2
common
zappa_myapp1_settings.json
zappa_myapp2_settings.json

The issue with just excluding myapp1 is that it's recursive and you might omit directors that are in the venv which I learned the hard when my application was deployed and missing dependencies.

Has someone requested a pull request on this? I agree going with the gitignore convention would be most beneficial.

iteratelance avatar Jan 04 '18 07:01 iteratelance

@kim-10e: I haven't worked on it, so feel free to take it on. Were you thinking of implementing the approach I suggested above, or something else?

marfire avatar Jan 04 '18 13:01 marfire

I think the approach I landed on would actually be backwards compatible, yet allow more flexibility.

The current implementation traverses the input directories recursively and applies the exclude patterns to the entries of each directory. That means e.g. botocore ends up not only excluding the botocore package, but also any module directory within a package named botocore.

The patch I'm implementing changes the filter mechanism to consider the full path name, changes the pattern matching to respect directory boundaries, and introduces ** as a way to match across directory boundaries.

Examples:

botocore still excludes any file or directory with the name botocore any where in the input directory structure. No change from current behavior.

*.pyc matches any file entry ending with .pyc. No change from current behavior.

site-packages/botocore only matches botocore entries in a parent directory named site-packages. This dramatically reduces the risk of false positives without having to know the full absolute path (which varies based on the virtual environment location).

site-packages/dlib/**/*.so matches any .so files in the dlib package

I'm happy to discuss and adjust this approach before submitting a PR.

kim-10e avatar Jan 04 '18 18:01 kim-10e

The problem I see with your approach is that you're inventing yet another new pattern-matching language that is similar to, but subtly different from, the existing ones (glob, fnmatch, wildmatch). I worry that that will be a source of confusion and bugs for users. And the implementation itself will add to the maintenance burden of the project for something that is quite outside the core purpose of Zappa.

That's why I prefer the wildmatch solution I described above. That uses a well-known, standard syntax that many projects (including Zappa) are already using, and outsources the implementation to an external library.

It's also equally backwards compatible, albeit at the cost of adding a new setting. That's definitely a wart, but it's one that can be ultimately be fixed through a deprecation policy.

marfire avatar Jan 05 '18 12:01 marfire

Perhaps this is a silly question, but why do the exclude entries not simply use regex applied to the file paths, as advertised ("A list of regex patterns to exclude from the archive.")?

pagreene avatar Feb 20 '18 20:02 pagreene

@pagreene: I haven't looked at the commit history, but I'm assuming that's just a documentation bug. As far I know the project has never used regular expressions for exclude, and changing to that would have the same backwards compatibility concerns raised above.

And although regular expressions are certainly powerful enough for the job, I expect they would be more awkward to work with than languages designed specifically for specifying file paths (like wildmatch).

marfire avatar Feb 21 '18 04:02 marfire

@marfire: That makes sense. I contemplated making regex some kind of optional method for checking patterns, but decided that was too complicated/too much work.

Instead, I created a fix on branch in my fork that simply uses glob to check matches to either the file name alone or the complete path. This suits my purposes, and it seems a simple enough solution that guarantees backwards compatibility and a fairly intuitive behavior. It's not the prettiest solution, being combinatorically problematic for a large number of patterns and files, but that won't be an issue for me, as this is a fairly light-weight use-case.

pagreene avatar Feb 21 '18 14:02 pagreene

@pagreene: I'm glad you found a solution that's working for you. If I understand it properly, it sounds like the approach suggested by @michi88 above ("We could also choose to keep the current behaviour and add that we also match on the full path.")

That's not backwards compatible, though. Right now, if someone has static/ in their excludes, nothing will be excluded. When they download this new version of Zappa, suddenly that directory will be missing and the behavior of their program will change. Of course, you could argue that they shouldn't have had that in their excludes in the first place, since it doesn't do anything, but since the current behavior is completely mis-documented (as you noted), that's a bit harsh. I wouldn't be surprised if that particular mistake was reasonably common (as in the original post above).

That's why my preference is to accept the need for a new setting and implement something familiar like wildmatch rather than come up with an ad hoc solution. That said, if the powers that be decide that this degree of backwards incompatibility is close enough to just change the behavior of the existing setting, then I agree that that would be the simplest (if not the best) solution.

marfire avatar Feb 21 '18 22:02 marfire

I agree with that gitignore style is the ideal way to go. But, if it takes a long time to implement it or reach a consensus, How about adding an option like "exclude_match_fullpath": false? If the option is false, it just works as before and otherwise, match the glob pattern against fullpaths. it's easy to fix the problem quickly and, with making it false by default, we also can keep backward compatibility.

yeonghoey avatar Mar 04 '18 15:03 yeonghoey

This really needs to be fixed

Suor avatar Oct 19 '18 13:10 Suor

Something has to be done about it. The current behaviour of exclude is destructive, because it applies to all files regardless of depth. I want to exclude a directory from project root called model. If I add it to exclude list it will exclude ALL directories and files named model from all directories and all libraries so I have no idea what it may break.

Ironically authors of zappa fell victims of the same problem: https://github.com/Miserlou/Zappa/blob/master/zappa/core.py#L202

ZIP_EXCLUDES = [
    '*.exe', '*.DS_Store', '*.Python', '*.git', '.git/*', '*.zip', '*.tar.gz',
    '*.hg', 'pip', 'docutils*', 'setuputils*', '__pycache__/*'
]

Note the __pycache__/* - it does nothing because exclusion doesn't work on path names.

asyschikov avatar Feb 28 '19 22:02 asyschikov

I ended up implementing this hack around using the zip callback which runs a python function after the deployment package is created and cleans up unwanted files that match any regex you provide.

ian-whitestone avatar Jan 12 '20 18:01 ian-whitestone

Used this myself, one small issue is that you likely need to add the new excludes item to all stages, even if you used the extends feature, since this script just accesses the dict directly. Otherwise, works well!

chris-erickson avatar Oct 12 '20 15:10 chris-erickson

It is now 2022. "We should fix this" is still true. onelogin has a subdirectory a couple levels in called 'boto3', so I can ship a redundant whole copy of boto3, or never use xmlsec.

jjobermark avatar Apr 29 '22 15:04 jjobermark

Perhaps someone should point out how zip callbacks work, and that they should be used to exclude files until this functionality is fixed. An article: https://ianwhitestone.work/zappa-zip-callbacks/

jjobermark avatar Apr 29 '22 16:04 jjobermark

@jjobermark this repository is no longer maintained. I'd recommend another solution for deploying, such as AWS CDK.

hortonew avatar Apr 29 '22 16:04 hortonew

Thanks, I will look.

jjobermark avatar Apr 29 '22 16:04 jjobermark