pyre-check icon indicating copy to clipboard operation
pyre-check copied to clipboard

The "django_sources_sinks.pysa" file updated for Django 3.2 LTS released on April, 2021

Open maximmasiutin opened this issue 3 years ago • 33 comments

I have updated the "django_sources_sinks.pysa" file to support latest Django 3.2 LTS, released on April, 2021, which is installed automatically by "pip install django".

maximmasiutin avatar Apr 19 '21 16:04 maximmasiutin

@onionymous has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot avatar Apr 19 '21 21:04 facebook-github-bot

Hello @maximmasiutin! Thank you very much for the PR. Would you mind adding a test plan describing how you tested your changes and verified that they worked?

onionymous avatar Apr 19 '21 21:04 onionymous

@maximmasiutin has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Apr 19 '21 22:04 facebook-github-bot

@onionymous -- I have just used "pip update django" to get the most recent version (3.2). I also copied the file django_sources_sinks.pysa from Pyre to a directory with all the .pysa files of my project, and the copied the directory "django" with the .pyi files from Pyre to a directory with all the typeshed stub directories (/home/max/.local/lib/pyre_check/typeshed/third_party/2and3) - and the parameter "typeshed" in the ".pyre_configuration" pointed to this directory.

I then ran "pyre analyze" and it gave me verification errors, because the existing django_sources_sinks.pysa file was for an earlier version of the Django library. I modified the file django_sources_sinks.pysa and run "pyre analyze" again. It still gave me errors like that:

taint/django_sources_sinks.pysa:11:0 `django.http.request.HttpRequest.headers` is not part of the environment!
taint/django_sources_sinks.pysa:32:0 Model signature parameters for `django.http.response.HttpResponseBase.__init__` do not match implementation `def __init__(self: HttpResponseBase, content_type: unknown = ..., status: unknown = ..., reason: unknown = ..., charset: unknown = ...) -> Any: ...`. Reason: unexpected named parameter: `headers`.
taint/django_sources_sinks.pysa:42:0 Model signature parameters for `django.http.response.HttpResponseBase.setdefault` do not match implementation `def setdefault(self: HttpResponseBase, header: str, value: Union[int, str]) -> None: ...`. Reason: unexpected named parameter: `key`.
taint/django_sources_sinks.pysa:45:0 Class `django.http.response.HttpResponseBase` has no attribute `headers`.
taint/django_sources_sinks.pysa:46:0 Class `django.http.response.HttpResponseBase` has no attribute `headers`.
taint/django_sources_sinks.pysa:62:0 `django.http.response.HttpResponseRedirectBase.__init__` is not part of the environment!
taint/django_sources_sinks.pysa:78:0 `django.db.models.query.QuerySet.raw` is not part of the environment!
taint/django_sources_sinks.pysa:118:0 `django.db.models.query.QuerySet.get` is not part of the environment!

I then updated the .pyi files for Django, and after that "pyre analyze" ran without errors.

So the test was on verification errors caused by changes in Django interface in version 3.2. After my updates, there were no more verification errors.

maximmasiutin avatar Apr 19 '21 22:04 maximmasiutin

@onionymous I did manage, however, for "pyra analyze" to find any SQL injections or XSS vulnerabilities in real Django applications after my modifications. Before my modifications, "pyra analyze" didn't run due to errors mentioned in my previous comments. After the modifications, it simply run and completed without any error, but didn't find any vulnerability in deliberately vulnerable applications.

I have been able to find vulnerabilities with Pyre in simple programs like https://github.com/fportantier/vulpy that have straightforward execution flow, but not yet with Django projects like the following:

https://github.com/lchsk/django-insecure/ https://github.com/hvnsweeting/django-insecure https://github.com/andresriancho/django-moth https://github.com/anxolerd/dvpwa https://github.com/nVisium/django.nV https://github.com/andresriancho/django-moth https://github.com/red-and-black/DjangoGoat https://github.com/Contrast-Security-OSS/DjanGoat

If you share me the hints on how to debug my pyre configuration to find out why it doesn't find any vulnerability in the above applications, I'd be grateful.

For instance, I took https://github.com/lchsk/django-insecure/

My "taint_models_path" folder contains the following files, all are standard which came from Pyre (except django_sources_sinks.pysa had my modifications I have pull-requested):

collection_propagation.pysa
django_sources_sinks.pysa
filesystem_other_sinks.pysa
filesystem_sinks.pysa
flask_sources_sinks.pysa
format_string_sinks.pysa
general.pysa
http_server.pysa
logging_sinks.pysa
protocols.pysa
rce_sinks.pysa
requests_api_sinks.pysa
sanitizers.pysa
skipped_overrides.pysa
sqlite3_sinks.pysa
taint.config
wsgi_ref.pysa
xss_sinks.pysa

The file .pyre_configuration containted the following lines:

{ "source_directories": ["/home/max/django-insecure/insecure/insecure", "/home/max/django-insecure/insecure/security", "/home/max/django-insecure/insecure/"], "taint_models_path": "/home/max/django-insecure/insecure/taint", "search_path": "/home/max/.local/lib/python3.8/site-packages", "typeshed": "/home/max/.local/lib/pyre_check/typeshed" }

The typeshed folder contained the Django .pyi files.

Still, "pyre analyze" didn't find any vulnerability.

How can I debug this config?

maximmasiutin avatar Apr 19 '21 22:04 maximmasiutin

Hi @maximmasiutin. I also ran Pysa on the first project you linked, but initially found no results. My setup was similar to yours (copying django-stubs into the typeshed/2and3 folder).

There could be several factors at play here:

  1. The project itself doesn't have type information. If you go into insecure/security/views.py, note that all the view functions have no type information. So Pysa won't understand that the request in those functions are of type HttpRequest, etc.
  2. We don't have models for the view functions. The arguments passed to those functions like user_id and filename are user-controlled, but there aren't models telling Pysa this. We would have to write some models like:
def security.views.unsafe_users(user_id: TaintSource[UserControlled]): ...
def security.views.read_file(filename: TaintSource[UserControlled]): ...
def security.views.copy_file(filename: TaintSource[UserControlled]): ...

Usually for larger projects with many view functions we automate generating these models with a model generator (e.g. RESTApiSourceGenerator, or you can try using the Pysa DSL).

Once I made the above changes (adding the models and annotating all the request arguments as HttpRequest), I was able to detect 4 issues, including SQLi and RCE: image image

I looked through a few of the other projects you linked, and they seem to be missing type information too, so these factors might be relevant for those projects too.

Other causes of false negatives (perhaps not relevant here, but good to keep in mind): 3. We might just not have coverage for it. Pysa ships with a pretty basic set of rules/models, so in more complicated repos I expect that Pysa will do a much better job of catching issues once users add their own models and rules. 4. Pysa has some weird behaviors, such as with globals, which can cause taint flows being missed sometimes. (https://pyre-check.org/docs/pysa-false-negatives#globals) There's also always the potential of bugs - if you find a false negative that seems like it shouldn't be happening, feel free to submit a bug report/open an issue!

As previously mentioned, you can always follow the debugging steps in our development guide (https://pyre-check.org/docs/pysa-false-negatives), and use reveal_taint() and reveal_type() to get a bit more idea of what is actually happening. A more advanced technique is to examine the taint-output.json that is output by Pysa to see what models are being generated, and what issues are actually being surfaced. (Unfortunately this is a little hard to parse, I recommend pretty-printing it with jq).

Thanks for being interested in Pysa and good luck getting everything to work! Please don't hesitate to reach out if you run into more issues. :)

onionymous avatar Apr 20 '21 03:04 onionymous

My goal is to have a fully automated toolkit to test third-party Python web applications with a minimum of preliminary steps, preferably in one click. Since most Django applications don’t have the type defined for the HttpRequest object, we can make a scanner that makes the Django app suitable for further processing with Pyre. For example, this scanner may search for all the view functions, e.g., in the “urlpatterns” list of the urls.py, and then create the .pyi files or modify the .py files to include the type. This would address factor nr. 1 in your list. To address factor nr. 2, the scanner can mark all other arguments as “UserControlled”, as you have suggested.

Another goal for this scanner may be to create a .py file that explicitly calls all the view functions. The first argument of each call will be an object of the HttpRequest class, and the other arguments, if any, will be an URL taken from this HttpRequest object. Therefore, Pyre will be able to infer the types of all the view functions by these explicit calls. Do you have any thoughts or suggestions on a fully automated toolkit to test third-party Python web applications, at least those written Django, first?

maximmasiutin avatar Apr 20 '21 07:04 maximmasiutin

Thank you for your suggestions!

  1. I first made a calls.py file to call all the view functions explicitly. Pyre reveal_type correctly revealed the types only in the calls.py file itself, but not in the files where the called functions are implemented.
  2. Then wrote a “views.pyi” file and put it near the views.py, in the same folder, but Pyre thus skipped this directory altogether and didn’t scan.
  3. After that, I put the “views.pyi” with the “init.pyi” to an appropriate location where the .pyi for all the libraries are stored, but it didn’t help Pyre reveal_type to reveal the types.
  4. I then wanted to create type annotations in a .pysa file, but it allows to annotate taints, not types.
  5. Then I modified the views.py file and specified the types explicitly. This time Pyre did understand the types correctly and caught an “Unsafe deserialization” in the following code:
    token = base64.b64decode(request.COOKIES.get('silly_token', ''))
    user = pickle.loads(token)
  1. Then I wrote a .pysa file to define string parameters as a taint source, as you have suggested. This time it caught almost all the vulnerabilities!

The only one remained is the following:

    query = request.GET.get('query', '')
    response = HttpResponse(f"Query: {query}")

Will debug this it and let you know.

Is there a way for Pyre to read somehow .pyi files for the code that I analyze, not just for the third-party modules?

maximmasiutin avatar Apr 20 '21 09:04 maximmasiutin

@maximmasiutin has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Apr 20 '21 10:04 facebook-github-bot

@maximmasiutin has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Apr 20 '21 10:04 facebook-github-bot

I have debugged the following code:

    query = request.GET.get('query', '')
    response = HttpResponse(f"Query: {query}")

I have further updated the Django .pysa and .pyi files for compatibility with the latest version.

maximmasiutin avatar Apr 20 '21 10:04 maximmasiutin

The problem is in the following definition in the django_sources_sinks.pysa file:

def django.http.response.HttpResponse.__init__(
    self,
    content: TaintSink[ReturnedToUser],
    *args: TaintSink[ReturnedToUser],
    **kwargs: TaintSink[ReturnedToUser],
): ...

The taint sink "ReturnedToUser" usually means, according to the default "taint.config", that (quote) "Exception message is returned to the user" or (quote) "Server secrets such as keys and access tokens are being returned to users".

However, the use of the word "return" is misleading here. An exception message appeared on the server, not on the user, so it is not "returned" but displayed. So is with the server secrets - they do not originate on the user's side. So, the server secrets are "transmitted", not "returned".

According to the m-w.com dictionary, the word "return" means "to pass back to an earlier possessor", "to send back".

If we use the “ReturnedToUser” for normal HTTP responses, we may mislead the users. A better term would have been “SentToUser”.

Anyway, this “ReturnedToUser” taint sink is currently only defined for the two rules: "Server secrets reach exit" and "Exception message is returned to the user".

There should probably be another rule in the default “taint.config” file that the UserControlled data is sent back without being sanitized. It would have helped Pyre find the vulnerability in the following code:

  query = request.GET.get('query', '')
   response = HttpResponse(f"Query: {query}")

But it may also bring many false positives. However, these scripting attacks are not as dangerous as SQL injections if headers are configured correctly on a web server like nginx.

Do you have any ideas on how to make Pyre by default find such scripting attacks without bringing new false positives?

maximmasiutin avatar Apr 20 '21 10:04 maximmasiutin

Is there a way for Pyre to read somehow .pyi files for the code that I analyze, not just for the third-party modules?

Yes, see https://pyre-check.org/docs/types-in-python#when-source-code-is-not-available. If you have .pyi files and .pyi in your source directory, the type information from the stub files will take precedence, so you can just put the relevant .pyi stub files next to their corresponding .py source files.

onionymous avatar Apr 20 '21 16:04 onionymous

@onionymous has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot avatar Apr 20 '21 16:04 facebook-github-bot

@maximmasiutin has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Apr 20 '21 17:04 facebook-github-bot

I have fixed some minor errors with "stubtest"

maximmasiutin avatar Apr 20 '21 17:04 maximmasiutin

Is there a way for Pyre to read somehow .pyi files for the code that I analyze, not just for the third-party modules?

Yes, see https://pyre-check.org/docs/types-in-python#when-source-code-is-not-available. If you have .pyi files and .pyi in your source directory, the type information from the stub files will take precedence, so you can just put the relevant .pyi stub files next to their corresponding .py source files.

I did this (put a .pyi file in the same directory as the .py file for a project that I test), but Pyre in this case skipped this directory entirely. I then removed the .pyi file and run Pyre again, and it checked that directory. Can you reproduce this behaviour? If not, how can I debug this and send you the logs?

maximmasiutin avatar Apr 20 '21 17:04 maximmasiutin

I did this (put a .pyi file in the same directory as the .py file for a project that I test), but Pyre in this case skipped this directory entirely. I then removed the .pyi file and run Pyre again, and it checked that directory. Can you reproduce this behaviour? If not, how can I debug this and send you the logs?

Ah, right - if the stub file exists, the corresponding .py file will not be analyzed. (https://www.python.org/dev/peps/pep-0484/#stub-files) There are some cases where you want to add stubs for files where you do have source code available, e.g. when you have generated code with complicated logic that is hard for the type checker, and want the type checker to assume a more sensible interface, but this isn't the case here. Sorry I misunderstood your earlier question.

If you just want to annotate the functions in views.py, just adding them in the file as follows (as you have done previously in 5.):

def unsafe_users(request: HttpRequest, user_id):
    ...

is the recommended approach.

onionymous avatar Apr 20 '21 18:04 onionymous

@maximmasiutin has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Apr 20 '21 18:04 facebook-github-bot

@maximmasiutin has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Apr 20 '21 19:04 facebook-github-bot

@maximmasiutin has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Apr 20 '21 19:04 facebook-github-bot

@maximmasiutin has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Apr 20 '21 21:04 facebook-github-bot

@maximmasiutin has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Apr 20 '21 21:04 facebook-github-bot

The taint sink "ReturnedToUser" usually means, according to the default "taint.config", that (quote) "Exception message is returned to the user" or (quote) "Server secrets such as keys and access tokens are being returned to users".

However, the use of the word "return" is misleading here. An exception message appeared on the server, not on the user, so it is not "returned" but displayed. So is with the server secrets - they do not originate on the user's side. So, the server secrets are "transmitted", not "returned".

You are correct - this naming could probably be better, but internally we use this name in a lot of places with other tools right now. I can see how it is a little misleading, but unfortunately we probably won't be changing this right now, since for the sake of consistency, we'd have to change it everywhere.

There should probably be another rule in the default “taint.config” file that the UserControlled data is sent back without being sanitized.

This would be an interesting rule to have :) So the main idea here is that we want to detect raw strings from the user being returned in an HTML response, i.e. reflected XSS vulnerabilities. However,ReturnedToUser is probably not the right sink to use here, since we also use this sink type for API responses in e.g. JSON, GraphQL responses, and other things, not only HTML-rendered responses. Perhaps you can consider adding a new rule for user-controlled sources flowing into a new sink type, like HTMLResponse? Then our model might look something like HttpResponse(content: TaintSink[ReturnedToUser, HTMLResponse]).

But it may also bring many false positives. However, these scripting attacks are not as dangerous as SQL injections if headers are configured correctly on a web server like nginx.

Do you have any ideas on how to make Pyre by default find such scripting attacks without bringing new false positives?

Unfortunately false positives are one of the more difficult aspects about working with Pysa, and it will be hard to get a high signal rule 'by default'. One idea is that you could mark template rendering functions such as https://docs.djangoproject.com/en/3.1/ref/template-response/ as sanitizers, or add a feature annotation so you can later create a filter to exclude certain flows from your results.

onionymous avatar Apr 21 '21 01:04 onionymous

My goal is to have a fully automated toolkit to test third-party Python web applications with a minimum of preliminary steps, preferably in one click. Since most Django applications don’t have the type defined for the HttpRequest object, we can make a scanner that makes the Django app suitable for further processing with Pyre. For example, this scanner may search for all the view functions, e.g., in the “urlpatterns” list of the urls.py, and then create the .pyi files or modify the .py files to include the type. This would address factor nr. 1 in your list. To address factor nr. 2, the scanner can mark all other arguments as “UserControlled”, as you have suggested.

Another goal for this scanner may be to create a .py file that explicitly calls all the view functions. The first argument of each call will be an object of the HttpRequest class, and the other arguments, if any, will be an URL taken from this HttpRequest object. Therefore, Pyre will be able to infer the types of all the view functions by these explicit calls. Do you have any thoughts or suggestions on a fully automated toolkit to test third-party Python web applications, at least those written Django, first?

You could also try some of the suggestions in our documentation about improving coverage: https://pyre-check.org/docs/pysa-coverage/

A combination of pyre infer and some custom scripting (e.g. modify the .py files like you mentioned) seems like it might be a good start to achieving the automation you want?

onionymous avatar Apr 21 '21 03:04 onionymous

@maximmasiutin has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Apr 21 '21 09:04 facebook-github-bot

@maximmasiutin has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Apr 21 '21 09:04 facebook-github-bot

I've just made some updates for the Django stubs and it finally correctly reveals a type within the https://github.com/red-and-black/DjangoGoat (branch called "broken"):

authentication.views:60:12-60:23: Revealed type for django.contrib.auth.models.User.objects: django.db.models.manager.Manager
authentication.views:32:12-32:23: Revealed type for authentication.models.UserProfile.objects: django.db.models.manager.Manager

maximmasiutin avatar Apr 21 '21 09:04 maximmasiutin

I have created a pull request at https://github.com/typeddjango/django-stubs/pull/601

maximmasiutin avatar Apr 23 '21 09:04 maximmasiutin

The developers of typeddjango/django-stubs have approved the pull request https://github.com/typeddjango/django-stubs/pull/601 , but not yet https://github.com/typeddjango/django-stubs/pull/598

maximmasiutin avatar Apr 23 '21 14:04 maximmasiutin