Zappa
Zappa copied to clipboard
Directories not excludable via zappa_settings.json
Context
Directories not ignored as expected. The zappa_settings.json looks like this. Note that I am using .gitignore notation for ignoring the directories.
https://github.com/nueverest/vue_flask/blob/master/zappa_settings.json
{
"production": {
"s3_bucket": "nueverest",
"aws_region": "us-west-2",
"app_function": "vue_flask.app",
"delete_local_zip": false,
"exclude": ["deployed/", "static/", "unit_tests/", "*.css", "*.js", "*.json", "*.csv", "*.ico"]
}
}
Expected Behavior
I am expecting the folders named: deployed, static, and unit_tests to be excluded from the generated zip file.
Actual Behavior
When I open the generated zip file I see the folders that I thought were excluded. This increases the time it takes to deploy the app.
Possible Fix
Allow directory patterns to be excluded via zappa_settings, or document how it is currently implemented in the readme.
Your Environment
- Zappa version used: 0.37.1
- Operating System and Python version: Windows and Py 2.7.13
- Link to your project (optional): https://github.com/nueverest/vue_flask
- Temporary link to deployed site: https://kw4udfbos9.execute-api.us-west-2.amazonaws.com/production
I agree we should change the implementation of this. What might work for you is use static
without the trailing slash. Be aware that this will exclude all folders that are called static. No matter where they are in the folder tree. What you can't do for example is myapp/static
, we should fix that.
@michi88 Doing it without the /
fixed it for me. Excluding all folders with the same name does reduce granular control. I will structure my project accordingly.
Thinking about how to implement this. We could change the behavior of those patterns in exclude
so that they are removed from the temporary zip after we create it rather than the current way we do it now, which excludes them as we iterate over the venv/project contents. Could be more simple.
We're using copytree(site_packages, temp_package_path, symlinks=False, ignore=shutil.ignore_patterns(*excludes))
now.
The problem with ignore=shutil.ignore_patterns
is that it matches the current file or directory names on iteration and not the full path. This is why 'static' works and 'static/' or 'myapp/static' not. Also something like 'myapp/settings/settings_local.py' doesn't work, 'settings_local.py' does.
Note that if we change this and do for example what you describe @Miserlou + matching the full path by walking the zip ourselves we will break stuff for people as 'static' would not match anymore for example. We could also choose to keep the current behaviour and add that we also match on the full path. That might work. Thinking about it that is actually how gitignore behaves also.
One way to make 'static/' work the same way as 'static' with the current implementation is to simply remove the trailing slash before processing i.e. 'static/' --> 'static'.
This does not cover the 'myapp/static' case, but it is a step in the right direction.
@michi88 i think it would be great to add an exclude_path option, just ran into some situations where I'd like to use the full path so i can include some things and not others that have the same filename but different path.
Honestly if wasn't a backwards compatibility issue I'd be inclined to suggest replacing the old exclude as exclude_path is much more useful/powerful and I can't see a reason to use the old exclude if exclude_path existed.
Specifically I have a folder staticfiles I want to exclude which ends up keeping the staticfiles app out of my project completely.
I think the way forward with this is to officially adopt the gitignore (wildmatch) syntax. It's well-known and widely used and—unlike the current glob-style syntax—powerful enough to meet everyone's needs.
Unfortunately it's not backwards-compatible with the current exclude
, so I think we either need to accept the backwards-incompatibility or create a new setting and deprecate the old one.
In terms of implementation, it should be straightforward to do using the pathspec package. That allows you to turn the setting into a spec object and do, e.g., spec.match_tree('path/to/directory')
.
@Miserlou: I can provide a PR for this if we can reach consensus.
The AWS X-ray SDK includes a subdirectory named 'botocore'. With the current exclude implementation my only choice to get instrumentation of boto calls to work, is to specify a custom exclude value that doesn't include 'botocore' -- and thus results in the botocore package being deployed.
There are also plenty of test/tests/testing subdirectory variations that are hard to target without being able to specify more parts of the path. Again, excluding all directories named test/tests/testing would be too broad.
@Miserlou @marfire I'm currently blocked on this, so I'm forking and implementing a workaround that will get me going. If there's consensus on the way forward and @marfire hasn't gotten around to working on this, I'd be happy to submit a PR.
This is blocking for me as well. I have two applications in a single folder and I'd like to be able to share common libraries for instances.
myapp1
myapp2
common
zappa_myapp1_settings.json
zappa_myapp2_settings.json
The issue with just excluding myapp1
is that it's recursive and you might omit directors that are in the venv which I learned the hard when my application was deployed and missing dependencies.
Has someone requested a pull request on this? I agree going with the gitignore convention would be most beneficial.
@kim-10e: I haven't worked on it, so feel free to take it on. Were you thinking of implementing the approach I suggested above, or something else?
I think the approach I landed on would actually be backwards compatible, yet allow more flexibility.
The current implementation traverses the input directories recursively and applies the exclude patterns to the entries of each directory. That means e.g. botocore
ends up not only excluding the botocore package, but also any module directory within a package named botocore
.
The patch I'm implementing changes the filter mechanism to consider the full path name, changes the pattern matching to respect directory boundaries, and introduces **
as a way to match across directory boundaries.
Examples:
botocore
still excludes any file or directory with the name botocore any where in the input directory structure. No change from current behavior.
*.pyc
matches any file entry ending with .pyc. No change from current behavior.
site-packages/botocore
only matches botocore
entries in a parent directory named site-packages. This dramatically reduces the risk of false positives without having to know the full absolute path (which varies based on the virtual environment location).
site-packages/dlib/**/*.so
matches any .so files in the dlib package
I'm happy to discuss and adjust this approach before submitting a PR.
The problem I see with your approach is that you're inventing yet another new pattern-matching language that is similar to, but subtly different from, the existing ones (glob, fnmatch, wildmatch). I worry that that will be a source of confusion and bugs for users. And the implementation itself will add to the maintenance burden of the project for something that is quite outside the core purpose of Zappa.
That's why I prefer the wildmatch solution I described above. That uses a well-known, standard syntax that many projects (including Zappa) are already using, and outsources the implementation to an external library.
It's also equally backwards compatible, albeit at the cost of adding a new setting. That's definitely a wart, but it's one that can be ultimately be fixed through a deprecation policy.
Perhaps this is a silly question, but why do the exclude entries not simply use regex applied to the file paths, as advertised ("A list of regex patterns to exclude from the archive.")?
@pagreene: I haven't looked at the commit history, but I'm assuming that's just a documentation bug. As far I know the project has never used regular expressions for exclude
, and changing to that would have the same backwards compatibility concerns raised above.
And although regular expressions are certainly powerful enough for the job, I expect they would be more awkward to work with than languages designed specifically for specifying file paths (like wildmatch).
@marfire: That makes sense. I contemplated making regex some kind of optional method for checking patterns, but decided that was too complicated/too much work.
Instead, I created a fix on branch in my fork that simply uses glob to check matches to either the file name alone or the complete path. This suits my purposes, and it seems a simple enough solution that guarantees backwards compatibility and a fairly intuitive behavior. It's not the prettiest solution, being combinatorically problematic for a large number of patterns and files, but that won't be an issue for me, as this is a fairly light-weight use-case.
@pagreene: I'm glad you found a solution that's working for you. If I understand it properly, it sounds like the approach suggested by @michi88 above ("We could also choose to keep the current behaviour and add that we also match on the full path.")
That's not backwards compatible, though. Right now, if someone has static/
in their excludes
, nothing will be excluded. When they download this new version of Zappa, suddenly that directory will be missing and the behavior of their program will change. Of course, you could argue that they shouldn't have had that in their excludes
in the first place, since it doesn't do anything, but since the current behavior is completely mis-documented (as you noted), that's a bit harsh. I wouldn't be surprised if that particular mistake was reasonably common (as in the original post above).
That's why my preference is to accept the need for a new setting and implement something familiar like wildmatch rather than come up with an ad hoc solution. That said, if the powers that be decide that this degree of backwards incompatibility is close enough to just change the behavior of the existing setting, then I agree that that would be the simplest (if not the best) solution.
I agree with that gitignore
style is the ideal way to go. But, if it takes a long time to implement it or reach a consensus, How about adding an option like "exclude_match_fullpath": false
?
If the option is false
, it just works as before and otherwise, match the glob pattern against fullpaths.
it's easy to fix the problem quickly and, with making it false
by default, we also can keep backward compatibility.
This really needs to be fixed
Something has to be done about it. The current behaviour of exclude
is destructive, because it applies to all files regardless of depth. I want to exclude a directory from project root called model
. If I add it to exclude
list it will exclude ALL directories and files named model
from all directories and all libraries so I have no idea what it may break.
Ironically authors of zappa fell victims of the same problem: https://github.com/Miserlou/Zappa/blob/master/zappa/core.py#L202
ZIP_EXCLUDES = [
'*.exe', '*.DS_Store', '*.Python', '*.git', '.git/*', '*.zip', '*.tar.gz',
'*.hg', 'pip', 'docutils*', 'setuputils*', '__pycache__/*'
]
Note the __pycache__/*
- it does nothing because exclusion doesn't work on path names.
I ended up implementing this hack around using the zip callback which runs a python function after the deployment package is created and cleans up unwanted files that match any regex you provide.
Used this myself, one small issue is that you likely need to add the new excludes item to all stages, even if you used the extends feature, since this script just accesses the dict directly. Otherwise, works well!
It is now 2022. "We should fix this" is still true. onelogin has a subdirectory a couple levels in called 'boto3', so I can ship a redundant whole copy of boto3, or never use xmlsec.
Perhaps someone should point out how zip callbacks work, and that they should be used to exclude files until this functionality is fixed. An article: https://ianwhitestone.work/zappa-zip-callbacks/
@jjobermark this repository is no longer maintained. I'd recommend another solution for deploying, such as AWS CDK.
Thanks, I will look.