revanced-patcher icon indicating copy to clipboard operation
revanced-patcher copied to clipboard

feat: Modernize APIs

Open oSumAtrIX opened this issue 1 month ago • 34 comments

About

This PR gets rid of the fingerprint system and replaces it with a simpler API to find methods and classes. It offers a DSL API for that purpose. On top it simplifies internal code handling mutable classes/proxies & adds new apis for creating patches with their name obtained from the property to reduce duplicating the name in the property and name parameter.

What to expect:

image

May supersede https://github.com/ReVanced/revanced-patcher/pull/329 or be merged with it.

Todo

  • Modernize more places of the patcher, e.g, the patcher apis.
  • Finish code
  • Implement tests
  • Implement caching (how?)
  • IMPORTANT: Right now, the matcher api is built in a way where it will be instantiated everytime the lambdas are called which is inefficient. The matchers need to be built once.

oSumAtrIX avatar Nov 18 '25 23:11 oSumAtrIX

Implemented caching elegantly:

image

The code speaks semantically. We look for a method that unconditionally has the required strings/types. (it can be extended if we add more caching stuff). And afterwards it applies the predicate to find the desired target. This is precisely what the code also internally does:

image

Internally, the caching code is modernized:

image

Not sure if this is good but keeping as a precaution:

image image

oSumAtrIX avatar Nov 19 '25 09:11 oSumAtrIX

This is particularly useful and very powerful in dynamic scenarios such as this:

image

You can utilize the power of the cache and do additional arbitrary filtering for partial strings

oSumAtrIX avatar Nov 19 '25 09:11 oSumAtrIX

A small optimization can be done which is late initializing the cache until a fingerprint needs it. Right now, the cache is lazy but the mergeExtensions function accesses it prematurely leading to initialization of the cache, even when no fingerprint werre to utilize it. Postponing the population of the cache by extension classes would accomplish this optimization

oSumAtrIX avatar Nov 19 '25 09:11 oSumAtrIX

Fingerprints should be declarative. They should not be a giant lambda of logic (with more embedded lambdas).

The prior fingerprints (and the changes in my PR) have structure and are equivalent to filling out a questionnaire form. You answer the questions in the form of parameters (With no code logic!) to describe the method and describe the indexes of of the method you want, and you get the method and the indices. All the logic is in classes, and not scattered across each fingeprint.

Using lambdas for everything is taking the fingerprint custom block (which was a kludge to satisfy situations that pure opcode filtering could not solve), and expanding that custom block out into the entire fingerprint. The structure is whatever anyone wants, at the expense of it now being a big block of code where the tools to search are dumped onto the table and the developer has to come up with a way to shoestring it all together. The potential to create messy convoluted fingerprints is limitless. The changes to my PR any messy convolution must be contained in custom InstructionFilter, but I'm going to stake that nobody is going to do that because InstructionFilter handles all use cases including if someone wants to dream up silly and unneeded ideas ("match a number literal if it's odd". "match a single instruction if it has an odd number of vowels.", or whatever weird use case nobody will use in practice).

I asked for a comment from an unnamed developer that writes some non ReVanced patches for non YouTube apps, and their comments were:

...I checked the lambda pull request in the revanced-patcher repository.

This change request is a truly stupid refactoring.

It might handle the rare exceptions that might occur in the 0.01% chance, but the code quality becomes absolutely terrible.

I cannot understand why this approach must be forced on all classes instead of being optional.

I am not alone in thinking what is here is not the right solution.

LisoUseInAIKyrios avatar Nov 19 '25 09:11 LisoUseInAIKyrios

They should not be a giant lambda of logic (with more embedded lambdas).

You dont have to use lamdbas. You can access fields directly via . or write an extension function to short it like I have shown above:

image

Fingerprints should be declarative.

No motivation provided. The API is already declarative, it is just expressions you can chain with operators: You are thus able to express things like name and anyInstruction or field for example.

In a FULL declarative design you would have to allow this dynamic with declarative operators such as and/or:

scopeStart() name("") and() anyInstruction("") scopeEnd() or()

or an alternative:

or(and(name(""), anyInstruction("")))

You initially claimed "fingerprints should be dynamic". Here you go (this btw is called the polish notation). How is that any better? Does the world use polish notation? No. They use the notation we see in the lambda A + B and not +A,B. And no "just remove the chains we dont need them" is not correct. You initially said fingerprints should be declarative and not "fingerprints should not have and/or operators". Both are not mutually exclusive, i can write declarative + operators, and non declarative + operators, just declarative and just non declarative without operators. Again, your statement is arbitrary and as proven here, inferior. Declarative makes the same feature set worse as you see above. It is also not better if you remove the operatos because then you remove features that are necessary to express everything fully.

That said, fingerprints should not be purely declarative. Feel free to counterargue, and not just state "fingerprints should be declarative" without providing a logical chain like i did here for my counterargument.

The prior fingerprints (and the changes in my PR) have structure and are equivalent to filling out a questionnaire form. You answer the questions in the form of parameters (With no code logic!) to describe the method and describe the indexes of of the method you want, and you get the method and the indices. All the logic is in classes, and not scattered across each fingeprint.

You are mentally stuck to a model similar to a tunnel vision. Let me explain. You want something akin to a form. Now how is that not akin that:

form { A() && b() && c() && d() }

Note, interpret && as just syntax sugar (it can syntactically be converted to declarative statements ending with ";")

Now translate this form to:

findMethod { name == "a" && anyFields { } && etc }

I will ask you again, how is this not a questionaire form. That said, my API fulfils your complete set. Now in addition to that, you are freely able to extend this feature set. Note, its not mandatory, its freely optional. You dont have to, you can:

findMethod { name.endsWith("") || anyFields { } && (a() || b()) }

See how it can do what you want, and specifically more? See how it can do everything you can, but more, and not just more, it can do everything while just && cant?

Note, in boolean arithmetic, any operation can be translated in two operators NOT, AND:

a || b == !(a && b)

That said, every operation on my api can be described with NOT, and AND. Boolish arithmetic requires 2 operators at MIN to describe any boolish state. Dont have those? Then you cant describe a full feature set competely. Now, you could add NOT to your declarative API, since you already have AND, which means your feature set would become as rich as mine (ignoring atomic operations like name.endWith), but at this point. Why not have OR to make things simpler? You dont have to describe an OR operation as !(a && b). That said, no, restricting the dynamics of the API is not beneficial.

Using lambdas for everything is taking the fingerprint custom block (which was a kludge to satisfy situations that pure opcode filtering could not solve), and expanding that custom block out into the entire fingerprint. The structure is whatever anyone wants, at the expense of it now being a big block of code where the tools to search are dumped onto the table and the developer has to come up with a way to shoestring it all together.

No, the API is complete. You arent restricting the developer into a specific way of how they want to use the code. You allow the developer to write the code however you want. In this case, the developer can choose:

findMethod { a && b } or findMethod { b && a } freely. Similarly they can in your code:

findMethod( a(); b(); } or findMethod { b(); a(); }

The difference is that a and b in my case are more granular than in your case. I dont just restrict the dev to a(), they can do everything with a. E.g name.endsWith. Everything else is exactly like in your API. That said, once again, my api is yours, but more.

The potential to create messy convoluted fingerprints is limitless

And so is it in yours. You offer a custom block, completely breaking out of your api, endless messy convoluted fingerprints. See how you just said "the potential" and this applies to you? "But I provide a conscice API as an alternative". And so do I, you can always write a conscice wrapper function over them however you please, to an extent where it would be as identical and limiting as to yours.

The changes to my PR any messy convolution must be contained in custom InstructionFilter, but I'm going to stake that nobody is going to do that because InstructionFilter handles all use cases including if someone wants to dream up silly and unneeded ideas ("match a number literal if it's odd". "match a single instruction if it has an odd number of vowels.", or whatever weird use case nobody will use in practice).

You said, "nobody is going to use custom filters" AND "InstructionFilter handles ALL use cases" implying that your existing APIs support "everything" without custom. No. They do not, but to spare a short novel like you described it earlier, i wont elaborate it again since you understandably know that.

"but they can back off to custom filter", back to inconsistency. "The API can do everything, because you can just not use the API to do your thing by using the API to not use the API". See how this sentence is logically flawed?

I asked for a comment from an unnamed developer that writes some non ReVanced patches for non YouTube apps, and their comments were:

I am discarding their comment for the following reasons without further consideration:

  1. No objectivity
  2. Uses words like "stupid" in a derogatory light
  3. Uses assumed cases like "0.01% chance" which is incorrect as a statement
  4. Uses terms like "absolutely terrible" which are informal and used in non-objective sentences, e.g, short novels for emotion

That said, the comment is informal, derogatory, factually incorrect, and not objective. The comment is of no use and does contribute negatively to your total argument.

oSumAtrIX avatar Nov 19 '25 10:11 oSumAtrIX

To clarify, my stance here is still very strong in favor of this API. I am telling you this to give you an insight of how strong your arguments should be and how strong they are. I factored in your arguments, I understood what you said, I don't think i missed anything you said so far, repeatedly. All this would be a different story if I wasnt so sure with this because:

  1. Its simpler maintainable code (way less LoC (currently around 200), consistent)
  2. Its feature rich and highly dynamic (as it should be for matching methods and classes arbitrarily)
  3. Its consistent for the developer (no backing away from the API)
  4. It doesnt rely on abstract custom concepts like "filters", "methodCall", "location", instead it relies on known APIs 1:1 (implementation. becomes implementation { } etc) so it doesnt add any "new" concepts
  5. It can describe your API fully without a time complexity of O(n²) (your custom block is O(n²)). Instead my API allows writing your API in constant time complexity (or in relation to finding methods, linear and at best constant thanks to cache, dont think you can even use cache in your custom blocks to begin with proving the convoluteness of the API)
  6. It allows to write project specific extensions with shorter APIs. E.g declarative APIs like yours, in the revanced-patches repo.

I am not your enemy.

oSumAtrIX avatar Nov 19 '25 10:11 oSumAtrIX

The last thing left is to construct the matcher out of scope. The API already allows it but its not elegant to write a val everytime like this:

image

Any scope is called multiple times so I am not sure how to do it properly. Perhaps the find scope could be stateful. This way the state can be persisted there such that this references the state of the find function scope

image

oSumAtrIX avatar Nov 19 '25 10:11 oSumAtrIX

Okay this was a tough one. Basically:

The first* functions had no instance. The nearest receiver instance was BytecodePatchContext. However saving state there was not viable because the first* functions would have to do cleanup after being called which is tricky. Instead, the lambdas have now a second instance receiver for the lifetime of the function call. It works like this:

image image

The method is callable in a bytecodecontext. The lambda in itself is a interface function which provides an operator function that both, takes a context object and the generic type and produces a boolean.

Now the firstMethod function can be called as a regular lambda inside a bytecodepatchcontext:

image

To implement the function the lamdba must be called through several deep receivers:

with(predicate) {
	with(MatchContext()) {  someInstance.match() }
}

First, the interface is exposed as a receiver, because the operator function is a member but takes a different type as a receiver. Then the object it requires a context for is supplied. And at last the operator function can be called by receiving the instance of the generic type.

This for example looks like this:

image

Now for the lifetime of the function call, we have an instance we can use to cache stuff such as the Matcher instances:

image

We provide a key to retrieve it from the instance:

image

Example:

image

I would prefer to not require a key but situations such as this require one:

image

As you can see, within one lambda theres two matchers. The lambda is called multiple times during the lifecycle of the first function call. In order to differentiate between the two matchers, a key is necessary as theres no other identity to rely on. I was initially attempting to build an identity based on the compiled class the nested lambdas would generate and this would work if there was one matcher per lambda, but since we can use multiple, a key is necessary. As before, instantiating a matcher doesnt require a key by itself and can be used on its own, so the key parameter can be removed at the cost of the matcher being instantiated every lambda call. I dont think theres a better way than using a key.

oSumAtrIX avatar Nov 19 '25 16:11 oSumAtrIX

The only alternative I think would be to use the builder pattern somehow. Each call would build a final object. During building the various elements could be placed in suitable scopes. But this would break the immersion of the API because then you wouldn't be able to do things like:

findMethod { name == "something" }

oSumAtrIX avatar Nov 19 '25 16:11 oSumAtrIX

Right now the APIs are consistent with Kotlin. E.g. firstMethod is internally calling methods.first { predicate }.

I was thinking of the by convention such as associateBy:

methodBy { name == "s" }

This would even be almost English "readable" syntax, but it would conflict with what internally happens (methods.first)

@LisoUseInAIKyrios what do you say

oSumAtrIX avatar Nov 19 '25 16:11 oSumAtrIX

So a regression/difference to fingerprints is finding the indices of instructions during fingerprinting. The matchers can support this and I'll also include an example implementation, but I dont know if i will ship that for the reason of responsibility. Finding a method is one thing. Finding an instruction is technically also a thing. But finding multiple instruction indices, is not the responsibility of finding methods. However the API allows this kind of abuse which I will demonstrate with the matcher implementation.

  1. Simple example with a Matcher

    image

    Usage: image

  2. Or doing it directly: image

    Usage: image

oSumAtrIX avatar Nov 19 '25 19:11 oSumAtrIX

However the API allows this kind of abuse which I will demonstrate with the matcher implementation.

You wrongly criticize using custom instruction filters as "punching a hole in the API" (instruction filter is the API!) to satisfy your silly non-existent use cases such as checking if a string has an odd number of letters, then you casual explain your PR here basically requires abusing itself to just get things done.

I have a simple solution that is proven to work. All versions of YT from 20.01 to 20.46 patch without using custom filters or the existing fingerprint kludge block. It's a simple yet flexible expansion of opcode matching with the introduction of variable spacing between the relevant parts of the method instructions.

We can merge what I have, then you can do v23 patcher that is your vision of blocks of code logic everywhere that provides no benefits (none!) over what I already have working right now.

I still have notifications turned off because anything I say is met with nonsense irrelevant points and idiotic use cases examples that don't exist. If you don't believe that instruction filters can solve all fingerprinting then make a new open issue of "SUBMISSION THREAD: POST ALL APK APP METHODS THAT CANNOT BE FINGERPRINTED WITH INSTRUCTION FILTERS WITHOUT USING CUSTOM CLASSES AND WITHOUT CUSTOM FINGERPRINT BLOCKS". Nobody will be able to post anything because instruction filter can handle it all. Or just point at a single instruction in a method and I will provide a concise instruction filter fingerprint without using any custom anything (I'm still waiting!!)

You are on your own with this lambda stuff that provides no benefit over what I have working well right now.

LisoUseInAIKyrios avatar Nov 19 '25 19:11 LisoUseInAIKyrios

You wrongly criticize using custom instruction filters as "punching a hole in the API" (instruction filter is the API!)

It is a correct criticism. Using a custom block goes beyond the defined API boundaries. Supplying the block itself is part of the API, however, the logic inside it bypasses the API entirely. That effectively creates a hole in the abstraction. Furthermore, relying on this escape mechanism for certain scenarios results in inconsistent API usage, which should not be necessary if the API were fully designed.

silly non-existent use cases such as checking if a string has an odd number of letters,

I have repeatedly explained that this is an example demonstrating the API’s inability to express all matching cases. You have continued to miss that point. The example is valid regardless of how trivial it seems to you. It represents a legitimate scenario that forces a fallback to the custom block.

then you casual explain your PR here basically requires abusing itself to just get things done.

You ignored the core of what I stated: determining string indices is not the task of method matching. Any API can be misused beyond its intended purpose, including both yours and mine. I showed that string matching can still be forced to work using the API, which is outside of its design. Your response misrepresents this and implies my API only functions through misuse for its intended purpose, which is incorrect.

I have a simple solution that is proven to work. All versions of YT from 20.01 to 20.46 patch without using custom filters or the existing fingerprint kludge block. It's a simple yet flexible expansion of opcode matching with the introduction of variable spacing between the relevant parts of the method instructions.

Possibly, but your approach introduces numerous regressions that you have omitted from your comparison. They are documented here: https://github.com/ReVanced/revanced-patcher/pull/385#issuecomment-3551916424. My solution addresses all listed issues, while yours does not. Your API appears “proven” only because you have had over a year to refactor patches around its limitations. My proposal has existed for only a short time, and I have already shown that it is capable of handling arbitrary matching scenarios. Suggesting otherwise is misleading.

We can merge what I have, then you can do v23 patcher that is your vision of blocks of code logic everywhere that provides no benefits (none!) over what I already have working right now.

Your current API is inconsistent and opinionated. Declaring that it provides “no” benefits compared to mine directly contradicts the detailed list of API improvements I already provided.

I'm still waiting

As stated multiple times, around five by now, the information you requested is already present in the other PR comments.

Nobody will be able to post anything because instruction filter can handle it all

You rely on the API’s escape hatch, the custom block, to achieve matching. That is not the API itself. The existence of this unrestricted raw lambda block proves the API does not handle every case. Nothing prevents placing all logic inside the custom block, skipping your API entirely.

METHODS THAT CANNOT BE FINGERPRINTED WITH INSTRUCTION FILTERS WITHOUT USING CUSTOM CLASSES AND WITHOUT CUSTOM FINGERPRINT BLOCKS

That said, if I provide an example APK in which your API fails to meet the following requirements, no additional workarounds should be considered acceptable, like we've seen being raised multiple times across the review. If you agree, send the commit of your PR that is the latest, and I'll send the conditions.

oSumAtrIX avatar Nov 19 '25 20:11 oSumAtrIX

This, functionally supersedes the other PR to act as a temporary compatibility to the patches PR

image

I was thinking of adding an "space(size=,optional=)" but after is probably good the way it is

oSumAtrIX avatar Nov 20 '25 18:11 oSumAtrIX

Now just this needs implementation lol

image

oSumAtrIX avatar Nov 20 '25 18:11 oSumAtrIX

fun main() {
    val haystack = listOf("x", "b", "c", "d", "b", "x", "d", "b", "d")

    val matcher = IndexedMatcher<String>()

    matcher.first { this == "b" }
    matcher.after(atLeast = 1, atMost = 2) { this == "d" }
    matcher.after(atLeast = 1, atMost = 3) { this == "d" }

    val matched = matcher(haystack)

    println("Matched: $matched")
    println("Indices: ${matcher.indices}")
}
image

oSumAtrIX avatar Nov 20 '25 18:11 oSumAtrIX

A couple of example Instruction. extensions:

image image

oSumAtrIX avatar Nov 21 '25 11:11 oSumAtrIX

So right now, I have no "builder" behind all these APIs. This leads to the issue of not being able to cache evaluations. Previously I solved this by provoding an instance to the outer APIs where you could cache your data. I have implemented an API "remember" which can be used to cache expressions. The .match APIs for example use this already so their builder scopes dont get reevaluated. I just noticed that this same API is useful in other scopes too. E.g. you have a very expensive computation you could otherwise cache:

image

The example shows that the expression inside the remember lambda can technically be written as a val outside of firstMethod, but say if you need the scope of a parent, (e.g. in your expensive computation you use the method scopes name) then remember is useful for caching here.

Caching is what has to be manually be done by the API user, so it is not ideal, but i think there is no way around it since the API cant know what to evaluate and what not (unless you write a small helper, like the match apis)

oSumAtrIX avatar Nov 21 '25 13:11 oSumAtrIX

Some wrapper examples about dynamics:

image

oSumAtrIX avatar Nov 21 '25 15:11 oSumAtrIX

image

Here are examples of the match api

oSumAtrIX avatar Nov 21 '25 18:11 oSumAtrIX

Things like this can be done with the API which is cool:

image image

Also added these so its easier to access these:

image

oSumAtrIX avatar Nov 21 '25 18:11 oSumAtrIX

Rewrote the other PRs string filter cleanly:

image

oSumAtrIX avatar Nov 21 '25 19:11 oSumAtrIX

Builds now, but no tests have been done yet if it works. I'll need to rebase some changes and then I can test for bugs

oSumAtrIX avatar Nov 22 '25 18:11 oSumAtrIX

Did some test runs on fixtures and it seems to work accordingly. Probably most edge cases will be revealed during a real world run.

oSumAtrIX avatar Nov 22 '25 19:11 oSumAtrIX

Thanks to the flexible API, now matcher lambdas can receive information about current indices:

image

On top they can optimize the matching by advancing the index or stopping the search (e.g. in the case of the first instruction not matching at the first index, then theres no point in searching further indices, or skipping indices such as for atLeast, or stopping if exeding atMost):

image

oSumAtrIX avatar Nov 24 '25 21:11 oSumAtrIX

This leaves us with an extremly capable, but still clean and consistent API that supersedes the filter API, all of which is a subset of the full API provided in this PR:

image

oSumAtrIX avatar Nov 24 '25 21:11 oSumAtrIX

Simple initial side-by-side comparison.

image

oSumAtrIX avatar Nov 25 '25 00:11 oSumAtrIX

I figured another great usecase of the API during developmenet is that you can inject any code such as a println for debugging purposes or even place breakpoints.

oSumAtrIX avatar Nov 25 '25 18:11 oSumAtrIX

A cool thing with this API, thanks to the introspective capabilities, is that it allows sophisticated optimizations such as

image

Here, we can optimize the pattern scan by skipping indices or backtracking early to find the following matching index:

image

oSumAtrIX avatar Nov 26 '25 13:11 oSumAtrIX

image

Discovered a bug, this does not handle labels.

image

This should fix it.

oSumAtrIX avatar Nov 26 '25 23:11 oSumAtrIX