SootUp icon indicating copy to clipboard operation
SootUp copied to clipboard

Use information of operand value to determine type for java locals

Open oxisto opened this issue 1 year ago • 7 comments

This PR uses the already existing information of the operand's value type to determine the type for java locals. Previously, most of the stack / local variables were of "unknown" type.

I am not sure if this catches all the cases where type information about the operand are available, so feel free to add to this PR.

Fixes #635

oxisto avatar Dec 31 '23 19:12 oxisto

I am currently trying to fix the test, the actual outcome after by patch is:

CastCounterDemos $l0,
Sub1 $stack3,
Super1[] $#l0, $#l1, $l1, $l2,
$l0 := @this: CastCounterDemos,
$l1 = newarray (Super1)[10],
$stack3 = new Sub1,
specialinvoke $stack3.<Sub1: void <init>()>(),
$#l0 = (Super1[]) $l1,
$#l0[0] = $stack3,
$#l1 = (Super1[]) $l1,
$l2 = $#l1[2],
return

But shouldn't the type of $l2 then be of Super1 instead of Super1[]? I am not an expert on Simple but I suspect that $l2 = $#l1[2] should return an index of $#l1 (which is of Super1[]), so it should be its element type, not the array type.

oxisto avatar Jan 02 '24 13:01 oxisto

Update: This looks to me to be a bug in JArrayRef. The type is defined as the type of the base, but in my opinion it should be the base's element type. At least that would be consistent with other operands, for example in calls/invokes, where the "type" is the type of the returned function. Or is there another method I should use and the "type" property is something else entirely?

https://github.com/soot-oss/SootUp/blob/23e5a3e5a08a50ebb51d6d659cf27f8b3522278f/sootup.core/src/main/java/sootup/core/jimple/common/ref/JArrayRef.java#L88-L92

oxisto avatar Jan 02 '24 13:01 oxisto

  • TypeAssigner

Unfortunately, it seems it does. Especially with casting, since the casted variable (which was previously unknown), now still has the type pre-cast.

oxisto avatar Jan 02 '24 13:01 oxisto

OK it seems I misunderstood how things work. I was not using these body interceptors at all (they seem to be empty when using JavaClassPathAnalysisInputLocation with only the path argument).

After using BytecodeClassLoadingOptions.Default.bodyInterceptors I was able to get rid of all the unknowns after the type assigner, which is probably the better way of doing things rather than creating the variables with the types. But this is something you need to decide from a design point of view.

It would be good to add a remark to the doc about this or even specificy the BytecodeClassLoadingOptions.Default.bodyInterceptors in the default options when calling JavaClassPathAnalysisInputLocation without a body interceptor argument.

oxisto avatar Jan 02 '24 14:01 oxisto

1:

But shouldn't the type of $l2 then be of Super1 instead of Super1[]? I am not an expert on Simple but I suspect that $l2 = $#l1[2] should return an index of $#l1 (which is of Super1[]), so it should be its element type, not the array type.

yes $l2 should be the arrays element type ad not the array type itself. was the output from your change or from typeassigner?

2:

After using BytecodeClassLoadingOptions.Default.bodyInterceptors I was able to get rid of all the unknowns after the type assigner, which is probably the better way of doing things rather than creating the variables with the types. But this is something you need to decide from a design point of view.

direct assignment seems logical and maybe reduces the workload of the typpeassigner - that has to be investigated ;)

3:

It would be good to add a remark to the doc about this or even specificy the BytecodeClassLoadingOptions.Default.bodyInterceptors in the default options when calling JavaClassPathAnalysisInputLocation without a body interceptor argument.

as default option is/was the plan :)

swissiety avatar Jan 02 '24 14:01 swissiety

1:

But shouldn't the type of $l2 then be of Super1 instead of Super1[]? I am not an expert on Simple but I suspect that $l2 = $#l1[2] should return an index of $#l1 (which is of Super1[]), so it should be its element type, not the array type.

yes $l2 should be the arrays element type ad not the array type itself. was the output from your change or from typeassigner?

It was from my change. It seems that the type resolver does not use the getType function of the operand, but rather has its own type lookup logic located here: https://github.com/soot-oss/SootUp/blob/bc6f0bcc4b0fb1e38f2f3270ab7123ebe63c30eb/sootup.java.bytecode/src/main/java/sootup/java/bytecode/interceptors/typeresolving/TypeChecker.java#L87-L134

2:

After using BytecodeClassLoadingOptions.Default.bodyInterceptors I was able to get rid of all the unknowns after the type assigner, which is probably the better way of doing things rather than creating the variables with the types. But this is something you need to decide from a design point of view.

direct assignment seems logical and maybe reduces the workload of the typpeassigner - that has to be investigated ;)

I can continue working on this PR if you want - even though my original problem was actually "solved" by just activating the type assigner. The only issue the PR has remaining is the one described above with the casts - although I am not sure if I can solve this or someone from the soot team would need to take over.

The question would be: Is there really a valid reason not to use the type assigner?

3:

It would be good to add a remark to the doc about this or even specificy the BytecodeClassLoadingOptions.Default.bodyInterceptors in the default options when calling JavaClassPathAnalysisInputLocation without a body interceptor argument.

as default option is/was the plan :)

Sounds reasonable :)

oxisto avatar Jan 02 '24 15:01 oxisto

TODO

  • [ ] check if this (i.e. direct assignment of types) leads to a similar problem that arised in an earlier version of the typeassigner algorithm

swissiety avatar Jan 02 '24 16:01 swissiety