pysolr

Code Injection

Open · 0x96e63 opened this issue 2 years ago

Is there any way to prevent code injection when using the search function?

I was also wondering whether malicious code could be injected to modify the data set.

Thanks.

0x96e63, May 20 '23 18:05

I don't think so. If Solr has a code injection vulnerability, people can still exploit it by sending maliciously constructed requests directly (via curl, etc.), and pysolr cannot help with that.

ch2ohch2oh, Oct 10 '23 23:10

pysolr releases before January 2021 have an injection vulnerability because they do not correctly escape their parameters:

  • https://github.com/django-haystack/pysolr/issues/357

self.solr.delete(q='id:*</query><query> id:999 AND id:9999')

This call should not delete all documents.
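To see why that payload is dangerous, here is a minimal sketch assuming a client that builds Solr's XML delete-by-query body (`<delete><query>QUERY</query></delete>`) by plain string interpolation. The helpers `build_delete_xml_unsafe`, `build_delete_xml_safe` and `queries_in` are hypothetical and are not pysolr's code; the sketch only shows how unescaped input can close the `<query>` element and open a new one containing `id:*`, and how XML-escaping the value keeps it as a single query.

```python
from xml.etree import ElementTree as ET
from xml.sax.saxutils import escape as xml_escape


def build_delete_xml_unsafe(q: str) -> str:
    # Hypothetical: interpolates the query straight into the XML body.
    return f"<delete><query>{q}</query></delete>"


def build_delete_xml_safe(q: str) -> str:
    # Hypothetical: escapes &, < and > so the query stays one text node.
    return f"<delete><query>{xml_escape(q)}</query></delete>"


def queries_in(body: str) -> list[str]:
    """Return the text of every <query> element in a delete body."""
    root = ET.fromstring(body)
    return [el.text or "" for el in root.iter("query")]


payload = "id:*</query><query> id:999 AND id:9999"
print(queries_in(build_delete_xml_unsafe(payload)))  # ['id:*', ' id:999 AND id:9999']
print(queries_in(build_delete_xml_safe(payload)))    # ['id:*</query><query> id:999 AND id:9999']
```

In the unsafe case the body carries two delete queries, and the first one, `id:*`, matches every document. Escaping for the XML transport only keeps the value inside one `<query>` element; what the Solr query syntax inside that element is allowed to express is a separate question.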

rmayer-sst, May 09 '24 16:05

This is a somewhat complicated topic which pysolr can't easily help with. The classic Solr syntax has a variety of features, and pysolr doesn't know what context you're escaping things in, how you have configured Solr, which query parser you're using, or which features you want to expose to your users. For example, do you want to support boolean searches like "apples -bananas" by letting the user enter that minus sign directly, or does your application have a higher-level interface for expressing that concept, so that any hyphens in user-entered data should be treated as literal values? Do you let users control whether quotes are used to group words into phrases? You really have to decide what your public interface allows and validate that at input, rather than trying to escape it on the backend.
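One way to follow that advice is to define a small public syntax and reject everything else before anything reaches Solr. The sketch below assumes a made-up syntax (plain words, an optional leading minus to exclude a term, and double-quoted phrases of words); `parse_user_query` and `to_solr` are hypothetical helpers, not part of pysolr. Each accepted value is then wrapped in quotes, so words such as AND or OR are searched literally instead of acting as operators in the standard query parser.

```python
import re

# Hypothetical public search syntax, chosen only for illustration:
# bare words, optionally prefixed with "-" to exclude them, and
# double-quoted phrases of words. Each term must be followed by
# whitespace or the end of the input; anything else is rejected.
TERM_RE = re.compile(r'(-?)(?:"(\w+(?: \w+)*)"|(\w+))(?:\s+|$)')


def parse_user_query(text: str) -> list[tuple[bool, str]]:
    """Validate user input, returning (excluded, value) pairs."""
    text = re.sub(r"\s+", " ", text.strip())
    terms = []
    pos = 0
    while pos < len(text):
        m = TERM_RE.match(text, pos)
        if m is None:
            raise ValueError(f"Unsupported search syntax: {text[pos:]!r}")
        minus, phrase, word = m.groups()
        terms.append((minus == "-", phrase if phrase is not None else word))
        pos = m.end()
    return terms


def to_solr(terms: list[tuple[bool, str]]) -> str:
    """Build a query string; every value is quoted so AND/OR/NOT stay literal."""
    return " ".join(("-" if excluded else "") + f'"{value}"' for excluded, value in terms)


print(to_solr(parse_user_query('apples -"red bananas"')))  # "apples" -"red bananas"
```

Inputs such as `title:apples`, `id:*` or `apples-bananas` raise `ValueError` here, which is the point of the design: the application decides which features exist, and anything outside them is refused rather than escaped.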

django-haystack has a simple clean() method, but it's important to remember that Haystack operates in a context where it uses a Django ORM-style interface for all of the complicated features. Complex queries are generally constructed using chained methods which do not expose the Solr syntax directly, so there's no ambiguity between the query syntax and the values: e.g. in sqs.filter(title=x).exclude(subject=y) you know you can fully escape x and y as valid values on the right side of Solr's field:value syntax.
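That separation can be sketched with a tiny hypothetical builder. The `Query` class, `quote_value` and `FIELD_RE` are illustrations only, not Haystack's API or its clean() implementation. Because field names come from code and every user value lands only on the right of `field:value`, each value can be escaped completely. The sketch assumes the standard query parser and wraps each value in double quotes, inside which only the backslash and the double quote need escaping.

```python
import re

# Hypothetical ORM-style builder, loosely modelled on the chained
# filter(...).exclude(...) style; it is NOT Haystack's implementation.
FIELD_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def quote_value(value: str) -> str:
    """Render a value as a quoted phrase; escape only backslash and double quote."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


class Query:
    def __init__(self, clauses: tuple[str, ...] = ()):
        self._clauses = clauses

    def _add(self, prefix: str, field: str, value: str) -> "Query":
        # Field names should come from application code; check them anyway.
        if not FIELD_RE.match(field):
            raise ValueError(f"Invalid field name: {field!r}")
        return Query(self._clauses + (f"{prefix}{field}:{quote_value(value)}",))

    def filter(self, **kwargs: str) -> "Query":
        q = self
        for field, value in kwargs.items():
            q = q._add("+", field, value)  # "+" marks a required clause
        return q

    def exclude(self, **kwargs: str) -> "Query":
        q = self
        for field, value in kwargs.items():
            q = q._add("-", field, value)  # "-" marks a prohibited clause
        return q

    def build(self) -> str:
        return " ".join(self._clauses)


q = Query().filter(title='apples" OR id:*').exclude(subject="a-b")
print(q.build())  # +title:"apples\" OR id:*" -subject:"a-b"
```

Quoting every value turns multi-word values into phrase queries; a real builder might treat that case differently, but the user can no longer inject operators or extra fields.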

In a different project, I use the following code, customized so that users can use quotes to search for phrases. It simply rejects unpaired quotes, because we have no need to support searching for literal quote characters:

import re

# Source: https://solr.apache.org/guide/8_11/the-standard-query-parser.html#escaping-special-characters
# This is modified to allow the use of quotes for phrases with a check that
# they're paired (see escape).
SOLR_ESCAPE_RE = re.compile(
    r"""
    (
        [&]{2}|
        [|]{2}|
        [\\\+!(){}[\]^~:/]|
        \b-
    )
    """,
    flags=re.VERBOSE,
)


def escape(search_value: str) -> str:
    """
    Escape a user-provided value suitably for use as a Solr query term's value
    """

    # Matching unpaired quotes would need a regex engine that supports
    # variable-width negative lookbehind, and since we have no reason to
    # support literal quotes in normal usage, we simply confirm that quotes
    # are paired:
    if search_value.count('"') % 2 != 0:
        raise ValueError("Unpaired quotes are not allowed")

    search_value = re.sub(r"\s+", " ", search_value)

    return SOLR_ESCAPE_RE.sub(r"\\\1", search_value)
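Running a few inputs through escape() shows which parts of the syntax it leaves to the user. The pattern and function are repeated without their comments so the snippet runs on its own, and the sample inputs are illustrative. Note the `\b-` alternative: a minus at the start of a word stays an operator, while a hyphen inside a word is escaped. `*` and `?` are not in the pattern, so wildcards pass through unchanged.

```python
import re

# Same pattern and function as above, repeated so this snippet runs on its own.
SOLR_ESCAPE_RE = re.compile(
    r"""
    (
        [&]{2}|
        [|]{2}|
        [\\\+!(){}[\]^~:/]|
        \b-
    )
    """,
    flags=re.VERBOSE,
)


def escape(search_value: str) -> str:
    if search_value.count('"') % 2 != 0:
        raise ValueError("Unpaired quotes are not allowed")
    search_value = re.sub(r"\s+", " ", search_value)
    return SOLR_ESCAPE_RE.sub(r"\\\1", search_value)


print(escape("apples -bananas"))  # apples -bananas   (leading minus kept as an operator)
print(escape("apples-bananas"))   # apples\-bananas   (hyphen inside a word escaped)
print(escape("title:foo"))        # title\:foo
print(escape('"red apples"'))     # "red apples"      (paired quotes kept for phrases)
print(escape("app*"))             # app*              (* and ? are not escaped)
try:
    escape('say "hi')
except ValueError as exc:
    print(exc)                    # Unpaired quotes are not allowed
```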

acdha, May 09 '24 17:05