@match handling doesn't follow script standard

Open xentac opened this issue 8 months ago • 3 comments

According to Tampermonkey you must include a protocol when defining a @match meta (https://www.tampermonkey.net/documentation.php?locale=en#meta:match).

The current code (https://github.com/Manuito83/torn-pda/blob/5788c5c9e5677e4ea9ea58db7f5bb0666162baad/lib/models/userscript_model.dart#L211) doesn't handle globs very well either. If you define a * anywhere in the middle of a URL, it will not match any URL you actually want it to.
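To make the failure concrete, here is a minimal JavaScript sketch of that check, assuming the linked line is the substring test quoted later in this thread (`match == "*" || url.contains(match.replaceAll("*", ""))`). The function name `legacyMatches` and the example.com URL are hypothetical and used only for illustration.

```javascript
// JavaScript port (an assumption, not the app's Dart code) of the substring check.
function legacyMatches(matches, url) {
  return matches.some(
    (match) => match === '*' || url.includes(match.replace(/\*/g, ''))
  );
}

const url = 'https://www.example.com/profiles.php?id=1';

// A trailing wildcard works: stripping "*" leaves a real prefix of the URL.
console.log(legacyMatches(['https://www.example.com/*'], url)); // true

// A wildcard in the middle is stripped as well, so the check searches for the
// literal substring "https://www.example.com/.php", which is not in the URL.
console.log(legacyMatches(['https://www.example.com/*.php*'], url)); // false
```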

I asked ChatGPT to implement a @match handler and it suggested code like this:

function matchPattern(pattern, url) {
  // Escape regex special characters except * (we'll handle * ourselves)
  function escapeRegex(str) {
    return str.replace(/[$^+.?()[\]{}|\\]/g, '\\$&');
  }

  // Convert @match pattern to regex
  function patternToRegex(pattern) {
    const match = pattern.match(/^(https?|file|\*):\/\/([^\/]*)\/(.*)$/);
    if (!match) throw new Error('Invalid @match pattern: ' + pattern);

    const [, scheme, host, path] = match;

    // Scheme
    let schemeRegex = scheme === '*' ? 'https?' : escapeRegex(scheme);

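    // Host: strip a leading "*." before escaping, because once the dots are
    // escaped the prefix reads "*\." and the subdomain rule never applies
    const subdomainWildcard = host.startsWith('*.');
    let hostRegex = escapeRegex(subdomainWildcard ? host.slice(2) : host)
      .replace(/\*/g, '[^/]*'); // wildcard in host (unusual)
    if (subdomainWildcard) hostRegex = '(?:[^/]+\\.)?' + hostRegex; // *.example.com => optional subdomain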

    // Path
    let pathRegex = escapeRegex(path).replace(/\*/g, '.*');

    return new RegExp(`^${schemeRegex}://${hostRegex}/${pathRegex}$`);
  }

  const regex = patternToRegex(pattern);
  return regex.test(url);
}

Whether or not you want to break scripts that don't include the protocol in torn pda is up to you, but you could modify the final RegExp to make the scheme optional.

xentac avatar Apr 13 '25 22:04 xentac

We should be able to parse the Uri even with globs and with or without a scheme.

final g = Uri.parse('*.google.com/*');
print('hello ${g}'); // hello *.google.com/*

Then you can match on the individual components already split out for you and don't need to regex it. (And the internal implementation isn't a regex, I checked: https://api.flutter.dev/flutter/dart-core/Uri/parse.html)

That doesn't fix the lack of glob support, but that part gets easier once you're dealing with components (host/path). Also, there's a library for glob matching: https://pub.dev/packages/glob.
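A rough sketch of what component-wise matching could look like, written in JavaScript to stay consistent with the snippet above (the app itself is Dart, so this would need porting). It is an illustration under assumptions, not the app's code: `globMatch`, `splitPattern`, and `matchByComponents` are hypothetical names, the pattern is split by hand, the URL is parsed with the standard `URL` class, and the pattern is required to include a scheme and a path.

```javascript
// Tiny glob: "*" matches any run of characters; everything else is literal.
function globMatch(glob, text) {
  const parts = glob.split('*');
  if (parts.length === 1) return glob === text;
  const first = parts[0];
  const last = parts[parts.length - 1];
  if (!text.startsWith(first)) return false;
  let pos = first.length;
  // Each middle literal must appear, in order, after what was consumed so far.
  for (const part of parts.slice(1, -1)) {
    const idx = text.indexOf(part, pos);
    if (idx === -1) return false;
    pos = idx + part.length;
  }
  // The final literal must fit after everything consumed so far.
  return text.length - last.length >= pos && text.endsWith(last);
}

// Split "scheme://host/path" into its parts; returns null if a part is missing.
function splitPattern(pattern) {
  const schemeEnd = pattern.indexOf('://');
  if (schemeEnd === -1) return null;
  const rest = pattern.slice(schemeEnd + 3);
  const slash = rest.indexOf('/');
  if (slash === -1) return null;
  return {
    scheme: pattern.slice(0, schemeEnd),
    host: rest.slice(0, slash),
    path: rest.slice(slash),
  };
}

function matchByComponents(pattern, url) {
  const p = splitPattern(pattern);
  if (p === null) return false;
  let u;
  try {
    u = new URL(url);
  } catch {
    return false; // not a parseable absolute URL
  }
  const scheme = u.protocol.slice(0, -1); // "https:" -> "https"
  const schemeOk =
    p.scheme === '*' ? scheme === 'http' || scheme === 'https' : p.scheme === scheme;
  // "*.example.com" is treated as also covering the bare "example.com".
  const hostOk =
    (p.host.startsWith('*.') && u.hostname === p.host.slice(2)) ||
    globMatch(p.host, u.hostname);
  return schemeOk && hostOk && globMatch(p.path, u.pathname + u.search);
}

console.log(matchByComponents('*://*.example.com/foo/*', 'https://sub.example.com/foo/bar'));
console.log(matchByComponents('https://example.com/a/*/c', 'https://example.com/a/b/c'));
console.log(matchByComponents('https://example.com/*', 'http://example.com/x'));
```

The first two calls should report a match and the third should not, because the scheme differs. Matching each component separately keeps the host rules (such as the `*.` prefix) apart from the path rules, which is the main appeal over one large regex.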

TravisTheTechie avatar Apr 13 '25 22:04 TravisTheTechie

Thanks both. This has been on the todo list for a while, but I haven't had the chance to implement anything yet.

I was probably planning on porting the Violentmonkey code across. From memory they use regex as well, but I'm not 100% sure and won't be fishing through the codebase on my phone...

@TravisTheTechie does the Glob lib work with URLs? It's not a bad starting place, but bash handles globs differently from web extension match patterns... I'll have a look later at how difficult it will be.

note: @xentac whilst it's not difficult to port code across; for future reference, the app's code is in Flutter (Dart) rather than JavaScript

Kwack-Kwack avatar Apr 14 '25 00:04 Kwack-Kwack

To fix the issue while keeping backwards compatibility, we need to update the URL matching logic in lib/models/userscript_model.dart so that @match patterns are converted to regular expressions, similar to Tampermonkey/Greasemonkey standards. This includes support for protocols, subdomains (*.example.com), wildcards in host/path, and edge cases like missing protocols or missing paths in patterns (to avoid breaking existing scripts that may not follow strict standards).

No changes are needed to other files, as this is isolated to the matching logic in the userscript model.

Changes to lib/models/userscript_model.dart

  1. Add the following helper functions at the top of the file (outside any class, as top-level functions for simplicity):
String escapeRegex(String str) {
  return str.replaceAllMapped(RegExp(r'[$^+.?()[\]{}|\\]'), (Match m) => '\\${m[0]}');
}

RegExp patternToRegex(String pattern) {
  var regexMatch = RegExp(r'^(https?|file|\*):\/\/([^/]*)/(.*)$').firstMatch(pattern);
  if (regexMatch == null) {
    // Handle missing path (e.g., "http://example.com" -> "http://example.com/*")
    var noPathRegex = RegExp(r'^(https?|file|\*):\/\/([^/]*)$');
    if (noPathRegex.hasMatch(pattern)) {
      pattern += '/*';
      regexMatch = RegExp(r'^(https?|file|\*):\/\/([^/]*)/(.*)$').firstMatch(pattern);
    }
  }
  if (regexMatch == null) {
    // Prepend "*://" for backwards compatibility (e.g., "example.com/*" -> "*://example.com/*")
    var adjustedPattern = '*://$pattern';
    regexMatch = RegExp(r'^(https?|file|\*):\/\/([^/]*)/(.*)$').firstMatch(adjustedPattern);
    if (regexMatch == null) {
      // Handle missing path in adjusted pattern
      var noPathRegex = RegExp(r'^(https?|file|\*):\/\/([^/]*)$');
      if (noPathRegex.hasMatch(adjustedPattern)) {
        adjustedPattern += '/*';
        regexMatch = RegExp(r'^(https?|file|\*):\/\/([^/]*)/(.*)$').firstMatch(adjustedPattern);
      }
    }
    if (regexMatch == null) {
      throw Exception('Invalid @match pattern: $pattern');
    }
  }

  final scheme = regexMatch.group(1)!;
  final host = regexMatch.group(2)!;
  final path = regexMatch.group(3)!;

  String schemeRegex = scheme == '*' ? 'https?' : escapeRegex(scheme);
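  // Strip a leading "*." before escaping; once the dots are escaped the
  // prefix reads "*\." and the subdomain rule would never apply.
  final subdomainWildcard = host.startsWith('*.');
  String hostRegex = escapeRegex(subdomainWildcard ? host.substring(2) : host)
      .replaceAll('*', r'[^/]*');
  if (subdomainWildcard) hostRegex = r'(?:[^/]+\.)?' + hostRegex;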
  String pathRegex = escapeRegex(path).replaceAll('*', '.*');

  return RegExp('^$schemeRegex://$hostRegex/$pathRegex\$');
}

bool matchPattern(String pattern, String url) {
  if (pattern == '*') {
    return true; // Backwards compatibility for "*" matching everything
  }
  try {
    final regex = patternToRegex(pattern);
    return regex.hasMatch(url);
  } catch (e) {
    return false; // Invalid patterns silently fail to match (avoids breaking app)
  }
}
  2. Replace the existing matching logic (around line 211, based on commit 5788c5c):

    Original:

    matches.any((match) => (match == "*" || url.contains(match.replaceAll("*", ""))));
    

    New:

    matches.any((match) => matchPattern(match, url));
    

    Note: This assumes the line is part of a method like bool shouldRun(String url) or similar in the UserScript class. If the full method includes additional logic (e.g., if (matches.isEmpty) return true;), leave that unchanged—the change is only to the any condition.

Explanation of Changes

  • Backwards Compatibility:
    • If a pattern lacks a protocol (e.g., "www.example.com/"), we prepend "*://" (matching http/https).
    • If a pattern lacks a path after the host (e.g., "http://example.com"), we append "/*" to match subpaths.
    • Simple "*" continues to match all URLs, as in the original code.
    • Invalid patterns fail silently (return false) to avoid crashing on old scripts.
  • Improved Handling: Matches Tampermonkey-style patterns accurately, including subdomain wildcards (*.example.com), path wildcards (/*), and rare host wildcards (www.*.com).
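  • Mostly No Breaking Changes: The existing crude matching (a substring contains check after removing "*") is replaced, and the new logic covers most equivalent cases more robustly without requiring script updates. One caveat: patterns that relied on substring behaviour can stop matching. For example, "example.com" used to match https://www.example.com/page but now matches only the bare host example.com.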
  • Testing Recommendation: After applying, test with example patterns like:
    • "*": Should match any URL.
    • "http://example.com/*": Matches http://example.com/anything.
    • "*://*.example.com/foo/*": Matches https://sub.example.com/foo/bar.
    • "example.com/bar": Auto-becomes "*://example.com/bar", matches http/https://example.com/bar.
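
The recommended checks can be run as a small table. The sketch below is a compact JavaScript port that loosely follows the Dart helpers above, not the exact code proposed for the app: `normalizePattern` is a hypothetical helper that applies the two backwards-compatibility rules, and the `*.` prefix is stripped before escaping so the bare domain matches as well. The URLs are illustrative.

```javascript
// Escape regex special characters except "*", which is handled separately.
function escapeRegex(str) {
  return str.replace(/[$^+.?()[\]{}|\\]/g, '\\$&');
}

// Add what older scripts tend to omit: "*://" in front, "/*" when no path is given.
function normalizePattern(pattern) {
  let p = /^(https?|file|\*):\/\//.test(pattern) ? pattern : '*://' + pattern;
  const afterScheme = p.slice(p.indexOf('://') + 3);
  if (!afterScheme.includes('/')) p += '/*';
  return p;
}

function matchPattern(pattern, url) {
  if (pattern === '*') return true; // bare "*" keeps matching everything
  const m = normalizePattern(pattern).match(/^(https?|file|\*):\/\/([^/]*)\/(.*)$/);
  if (!m) return false; // invalid patterns simply do not match
  const [, scheme, host, path] = m;
  const schemeRe = scheme === '*' ? 'https?' : scheme;
  const wild = host.startsWith('*.');
  let hostRe = escapeRegex(wild ? host.slice(2) : host).replace(/\*/g, '[^/]*');
  if (wild) hostRe = '(?:[^/]+\\.)?' + hostRe; // optional subdomain
  const pathRe = escapeRegex(path).replace(/\*/g, '.*');
  return new RegExp(`^${schemeRe}://${hostRe}/${pathRe}$`).test(url);
}

// [pattern, url, expected result]
const cases = [
  ['*', 'https://anything.example/', true],
  ['http://example.com/*', 'http://example.com/anything', true],
  ['*://*.example.com/foo/*', 'https://sub.example.com/foo/bar', true],
  ['example.com/bar', 'https://example.com/bar', true],
  ['example.com/bar', 'http://example.com/bar', true],
  ['example.com', 'https://www.example.com/page', false],
];

for (const [pattern, url, expected] of cases) {
  const actual = matchPattern(pattern, url);
  console.log(actual === expected ? 'ok  ' : 'FAIL', pattern, url, actual);
}
```

The last row is worth keeping in any real test suite: the original substring check would have accepted it, so it marks a pattern whose behaviour changes under the stricter matching.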

Apply these changes via a pull request to the repo for review.

alxspiker avatar Jul 16 '25 23:07 alxspiker