Use string analysis results for reflective calls
Uses the results generated in #1 by:
- Integrating them into the
ClassForNamereflective call analysis - Adding a string analysis demo along with a demo package showcasing the ability of the now integrated analysis to resolve calls across the Apache Xerces Framework
This PR contains some compromises and bugfixes that still need to be ironed out.
This PR will still fail unit (and probably integration) tests since some computation for StringConstancy is missing (often times in the call graph computations that depend on the ClassForName analysis.
This PR will still fail unit (and probably integration) tests since some computation for StringConstancy is missing (often times in the call graph computations that depend on the ClassForName analysis.
Should we force users to schedule a string analysis or should we provide a fallback for the StringConstancy property?
Should we force users to schedule a string analysis or should we provide a fallback for the StringConstancy property?
Afaik the only sound fallback we can provide is the LB of the property (i.e. ANY string) which could potentially blow up the runtime of analyses that previously had more conservative fallbacks or computed the possible strings on the fly in a very limited scope.
I am not sure what the best solution is. Can we configure the framework such that, if not specified otherwise, e.g. L1String runs? We now effectively standardized the ways to obtain strings into different levels of detail, but it seems difficult to choose one over the over...
Well, the analyses use "ANY string" or "NO string" depending on the soundness mode now when they cannot determine a string's value, right? So that (maybe combined with a very, very trivial (would have to be bytecode based, not TAC) local analysis) could be a fallback. Yes, it would be very unsound. But so is the string analysis in regular mode.
Can we configure the framework such that, if not specified otherwise, e.g. L1String runs?
In general, no, that goes against the design principles of the framework. For clients based on the new CLI framework (#226), we can enforce that an analysis is chosen or provide a default value if none is picked. For call graphs acquired via a CallGraphKey, it would also be possible to force a particular (hard-coded or configurable) analysis to run. For all other situations, no, we can't. That's one of the reasons most analyses do provide fallback values.
That was also the reason for my question: If we can provide a fallback that has some actual value (in the sense of benefit, not result), that would be the easiest solution by far. If not, at least for the CLI I would enforce an analysis to be scheduled (although already that brings several difficult questions, such as whether this should be the case when a call graph is configured but without the reflection analyses that need string values) but I'm already not sure whether to also do it for the CallGraphKeys.
Well, the analyses use "ANY string" or "NO string" depending on the soundness mode now when they cannot determine a string's value, right? So that (maybe combined with a very, very trivial (would have to be bytecode based, not TAC) local analysis) could be a fallback. Yes, it would be very unsound. But so is the string analysis in regular mode.
I agree that these would be the reasonable fallbacks, and I am starting to think that they are the only reasonable ones. In case of a very rudimentary bytecode analysis (to at least derive constant string literals etc.), would this be performed inline in the fallback function that is passed at PropertyKey creation?
If we can provide a fallback that has some actual value (in the sense of benefit, not result)
Whether there is such a value depends on the consuming analysis, but in most cases even just analyzing string literals (as in the above) should suffice for some rudimentary results. Imo we should not necessarily force users to specify a string analysis level everywhere (since the call graphs are basically used everywhere), as that would require them to read into and try to distinguish the different levels of string analysis to be able to pick an appropriate one. Or we should, if we decide that we want every user to go to this level of detail.
Perhaps this is rather a documentation / usability issue in the end, but if there is a fallback that can to some degree approximate the results from the different local analyses before, we should supply it here.
would this be performed inline in the fallback function that is passed at PropertyKey creation?
Yes, exactly. I think we should have a look whether we can implement a simple bytecode analysis. Potential for finding constants would be to look at LDC instructions and the ConstantValue attribute for fields.
and the
ConstantValueattribute for fields
This reminds me of https://github.com/opalj/opal/pull/1#issuecomment-3228547244. Could we generalise reading constant-value attributes such that code is reused, or is it little enough code to be reasonable duplicated?
Further, I would suggest that this bytecode analysis stays private to the string package, and instead we encourage consuming analyses to actually deal with a StringConstancy property.
Well, it is not much more code than field.constantFieldValue.isDefined and field.constantFieldValue.get.
It would be the (open) fallback computation of the property, which cannot be restricted. And the framework wouldn't push you to using an actual analysis, since it wouldn't complain about no analysis being scheduled. I am not exactly sure what your concern is, though. Either we force users to schedule a string analysis when the property is needed or we don't, I don't see much of a middle ground there.
It would be the (open) fallback computation of the property, which cannot be restricted. And the framework wouldn't push you to using an actual analysis, since it wouldn't complain about no analysis being scheduled. I am not exactly sure what your concern is, though. Either we force users to schedule a string analysis when the property is needed or we don't, I don't see much of a middle ground there.
I am not suggesting something about end-users scheduling analyses, but rather users that are developing new analyses which directly require string information should not be tempted to use the fallback computation directly, but rather rely on StringConstancy (which then might fallback if needed). This implicitly standardizes the handling of "this string could be here" to use string trees, which is imo a good idea.
But in a sense, copying the code directly into the fallback function already scopes it such that what I propose is given, so it might not need further action.
That's the general expectation in OPAL anyway that properties are queried from the property store.
@maximilianruesch Is a fallback computation something you would like to look into?
I could have a look next week.
I had another look at this. There is some difficulty implementing the bytecode analysis, since the entities for which the property is queried is one of VariableDefinition, VariableContext or MethodParameterContext which are defined in the TAC package. The property key is defined in the BR package, and thus we cannot type-safely access fields from those entities to find the PCs.
How do you suggest we proceed @errt? Simply return LB if in high soundness mode and UB otherwise? Move the property key to TAC (even the trivial string analysis is in TAC)?
If the entities are all defined in tac, maybe the key should be as well? If I understand you correctly, it is impossible to query the key for any sensible entity without depending on tac anyway.