systemds icon indicating copy to clipboard operation
systemds copied to clipboard

[SYSTEMDS-3386] Refactor replacement of CP instructions

Open kev-inn opened this issue 2 years ago • 13 comments

This reworks our runtime replacement of CP instructions with FED versions when appropriate. SP replacement will be done in another PR.

This implementation was my second approach, the first one consisted of static methods in the FED instructions. I much prefer this one as it is quite clear which instructions are replaced, which constraints the CP imposes and we don't have to lead every check with an instanceof check. The downside of this approach is that it can get a bit confusing with inheritance, but it is---and will probably stay---manageable.

Note that not everything is replaced yet, I will work on it while the general approach can already be reviewed.

During the testing I found myself requiring to check if actual FED instructions are executed, is there a way to ensure this in the test framework, or is this already ensured in some way via privacy? Tests checking the heavy hitters do it implicitly.

NOTE: I deleted a fix in QuantilePickCPInstruction (bf19244c909ae8354bc8b3da0985eaf448aa5f0c), as the common layout is no longer necessary and null is cleaner. This is intentional and the fix is no longer necessary.

kev-inn avatar Jun 07 '22 14:06 kev-inn

As a side note, when I ran all the tests the following testcases fail, but work when executed in isolation:

  • FederatedWorkerIntegrationCRUDTest.testWorkerAddedForMonitoring
  • FederatedColAggregateTest.testTernaryColSumDenseMatrix
  • FederatedCtableTest.federatedCtableSinglenode (configuration [3])
  • FederatedFullAggregateTest.testTernarySumDenseMatrixCP (configuration [0] and [1])
  • FederatedWeightedCrossEntropy (configuration [0] and [1])
    • federatedWeightedCrossEntropySpark
    • federatedWeightedCrossEntropySparkEpsilon
    • federatedWeightedCrossEntropySparkNode
    • federatedWeightedCrossEntropySparkNodeEpsilon
  • FederatedWeightedDivMatrixMultTest all or almost all
  • FederatedWeightedSigmoidTest all
  • FederatedWeightedSquaredLossTest all
  • FederatedWeightedUnaryMatrixMultTest all (both configurations)

I remember that this is a known problem. Edit: I was behind a few commits, so this information might be outdated.

kev-inn avatar Jun 07 '22 14:06 kev-inn

As a side note, when I ran all the tests the following testcases fail, but work when executed in isolation:

* `FederatedWorkerIntegrationCRUDTest.testWorkerAddedForMonitoring`

* `FederatedColAggregateTest.testTernaryColSumDenseMatrix`

* `FederatedCtableTest.federatedCtableSinglenode` (configuration [3])

* `FederatedFullAggregateTest.testTernarySumDenseMatrixCP` (configuration [0] and [1])

* `FederatedWeightedCrossEntropy` (configuration [0] and [1])
  
  * `federatedWeightedCrossEntropySpark`
  * `federatedWeightedCrossEntropySparkEpsilon`
  * `federatedWeightedCrossEntropySparkNode`
  * `federatedWeightedCrossEntropySparkNodeEpsilon`

* `FederatedWeightedDivMatrixMultTest` all or almost all

* `FederatedWeightedSigmoidTest` all

* `FederatedWeightedSquaredLossTest` all

* `FederatedWeightedUnaryMatrixMultTest` all (both configurations)

I remember that this is a known problem. Edit: I was behind a few commits, so this information might be outdated.

Yes this is known.

You can run your tests such that they repeat if failing, there you will detect these "flaky" tests. If you use IntelliJ, you can also add a new execution mode where you repeat failing tests

mvn clean compile test -Dmaven.test.skip=false -Drerun.failingtests.count=1 -D automatedtestbase.outputbuffering=true -Dtest=org.apache.sysds.test.$@

Baunsgaard avatar Jun 08 '22 08:06 Baunsgaard

This implementation was my second approach, the first one consisted of static methods in the FED instructions. I much prefer this one as it is quite clear which instructions are replaced, which constraints the CP imposes and we don't have to lead every check with an instanceof check. The downside of this approach is that it can get a bit confusing with inheritance, but it is---and will probably stay---manageable.

I am not sure if this is the best location to edit. Would it not make more sense to change it when we construct the CP vs SP vs GPU instructions from the string versions of instructions themselves? This way if we for instance in the future add FPGA instructions it naturally also support changing these into federated instructions, and we do not have to maintain both all instances for each federated instruction which would lead to excessive testing requirements where we need to hit each instruction as either cases.

Baunsgaard avatar Jun 08 '22 09:06 Baunsgaard

You have a point there, I also realized this recently when trying to also replace SP Instructions without duplicating code. I will weigh the options, if I come up with more, but will most likely opt to replace it already during parsing.

Edit: this actually is not that simple, as we create our instructions before we get the runtime information if an object is federated AFAIK. Note that federated instruction compilation exists by now, but this is about a runtime decision. I will search for a smooth approach, currently only interfaces come to mind, but would like to find something better.

kev-inn avatar Jun 08 '22 09:06 kev-inn

You have a point there, I also realized this recently when trying to also replace SP Instructions without duplicating code. I will weigh the options, if I come up with more, but will most likely opt to replace it already during parsing.

Edit: this actually is not that simple, as we create our instructions before we get the runtime information if an object is federated AFAIK. Note that federated instruction compilation exists by now, but this is about a runtime decision. I will search for a smooth approach, currently only interfaces come to mind, but would like to find something better.

You are right, it is not that simple but try to prioritize a good extensible design and then we can expand upon it. If you cannot find a better way, then i still can live with the design proposed already.

Baunsgaard avatar Jun 08 '22 11:06 Baunsgaard

Sure, that is the number one priority of this rework, that is why I also would like to remove the dependency for the string representations of those instructions to be similar/matching. In the future we want to send only CP instructions to the workers and locally choose CP, SP or GPU (maybe even FED?). Keeping this in mind, we could add conversions for all kinds of instructions to CP (and require constructors like the ones I added for FED) or add interfaces with getters for specialized instructions like BinaryAggregationInstruction. The approach, which should be the nicest to work with, would be a complete rework of the hierarchy of instructions, adding the type of execution only at the leaf specializations. Instead of ComputationCPInstruction and ComputationSPInstruction we would have `ComputationInstruction. This would require a very intrusive change of a lot of core implementations and seems infeasable.

kev-inn avatar Jun 08 '22 11:06 kev-inn

After a lot of consideration I propose the following:

  1. Switch to the first implementation approach. The refactor makes the code more maintainable, because of less indentation and implementation in the corresponding FEDInstruction. The static methods perform checks for both CPInstruction and SPInstruction, which will necessarily have large overlaps. A constructor with the SPInstruction as input (like the ones I already added with CPInstruction) will be added where necessary.
  2. Add a method to most SPInstructions to convert it to a CPInstruction. This will be used to allow the federated worker to choose which execution type he wants to use. Most SPInstruction can be converted to CPInstruction (and vice versa) by changing the execution type in the instruction string before parsing, those we could simply tag with an interface. This would reduce code quite a bit, but is harder to maintain. Opinions are appreciated.
  3. Some SPInstructions like ReblockSPInstruction don't have an equivalent CPInstruction, those will be sent to the worker as is. A problem arises when the federated worker opts to use the CPInstruction for inputs instead, this would require us to ignore the SPInstruction. Not sure if checking for RDD handles on the inputs is enough(?).

kev-inn avatar Jun 08 '22 18:06 kev-inn

Thanks for getting started on this issue @kev-inn. Unfortunately, I think the PR goes a bit in the wrong direction. We first want to do a basic cleanup of the federated instructions, without modifying the CP or Spark instructions. The replace in FEDInstructionUtils should stay mostly as it is, and all federated instructions should have overloaded parseInstruction methods to consume either CP or Spark instructions (objects) directly and construct the FED instruction like [1] without replicating the string parsing logic. For the parseInstruction with strings, we can call the parse on the respective CP instruction (for consistency) and use again the above methods.

In a second step, we would remove the replicated instructions like MMFEDInstruction (redundant to AggregateBinaryFEDInstruction) and let all instruction send CP instruction strings to the federated workers.

In a third step, we then extend the federated workers to (under certain conditions) create a hop that represents the CP instruction and generate lops and instructions as needed (e.g., CP, Spark, GPU).

Sorry, if I was not clear during our offline discussions.

[1] https://github.com/apache/systemds/blob/main/src/main/java/org/apache/sysds/runtime/instructions/fed/CentralMomentFEDInstruction.java#L77

mboehm7 avatar Jun 08 '22 20:06 mboehm7

(Just because i regret deleting it -- i wrote this while you commented Matthias)

I suggest that you make the methods inside CP and SP instructions part of a interface that allows any instruction to change to FED, and then make the footprint of these functions as small as possible inside each of the CP and SP instructions. All of these instructions should call into the fitting FEDInstructions class that then handle the logic of converting the instructions to FEDInstructions though static constructors that can return null. With this we simply try to call this interface function on the instructions to change them into FEDInstructions when they can before execution. In the static construction it is easy to change the SP instructions (or GPU etc) to CP if possible, otherwise the static constructor would construct SP type.

This design addresses all 3 points with (i think) the least amount of code.

-- Comment to Matthias's point

We are missing one step where we can avoid the whole FEDInstructionUtils code with casting that Kevin i think correctly is addressing in some of in his changes, where we can remove the entire if else tree if we add a interface method that simply map to the static constructors in your step 1. If we set default for all methods to return null for this interface "toFedInstruction" then we can just overwrite all instances of instructions that can change into FEDInstructions and get much cleaner code from it, --- with the downside of having small overloaded functions that map to the static constructors for FEDInstructions in each CP and SP instruction---

Baunsgaard avatar Jun 08 '22 21:06 Baunsgaard

Thanks for getting started on this issue @kev-inn. Unfortunately, I think the PR goes a bit in the wrong direction. We first want to do a basic cleanup of the federated instructions, without modifying the CP or Spark instructions. The replace in FEDInstructionUtils should stay mostly as it is, and all federated instructions should have overloaded parseInstruction methods to consume either CP or Spark instructions (objects) directly and construct the FED instruction like [1] without replicating the string parsing logic. For the parseInstruction with strings, we can call the parse on the respective CP instruction (for consistency) and use again the above methods.

In a second step, we would remove the replicated instructions like MMFEDInstruction (redundant to AggregateBinaryFEDInstruction) and let all instruction send CP instruction strings to the federated workers.

In a third step, we then extend the federated workers to (under certain conditions) create a hop that represents the CP instruction and generate lops and instructions as needed (e.g., CP, Spark, GPU).

Sorry, if I was not clear during our offline discussions.

[1] https://github.com/apache/systemds/blob/main/src/main/java/org/apache/sysds/runtime/instructions/fed/CentralMomentFEDInstruction.java#L77

No worries, as @Baunsgaard stated, I think the majority of this PR is still a nice addition and it fits your first step nicely. I would like to have 2/3 methods:

  1. Interface method named toFedInstruction for CP and SP Instructions
  2. Static "constructor" method for FEDInstructions parseInstruction() for SP and CP

EDIT: Method names can be changed, please feel free to give your opinion

kev-inn avatar Jun 09 '22 11:06 kev-inn

I am sorry that this PR got so large. I believe it is better to do it in one go, but I can also break it down so the review is faster.

I recommend to open the old FEDInstructionUtils.java file on one screen and on the other screen go through all added toFedInstruction() methods. This will allows to easily match the old conditions with the new, more compact and (hopefully) easier to read conditions. There aren't many changes to the logic, except that I changed (or added, see AppendSPInstruction) some super-classes.

kev-inn avatar Jun 13 '22 13:06 kev-inn

Not ready for review, found some problems with the spark instructions.

kev-inn avatar Jun 13 '22 13:06 kev-inn

ConcurrentModificationException seems to be more frequent with this PR, I do not know what the reason is for that, but that is anyway a fix for the future. EDIT: this should be because of the PR 942a3a2a349cee2fcf3591e7850051538cc41fef

kev-inn avatar Jun 14 '22 13:06 kev-inn

Closing to begin the merging phase, Thanks for the contribution @kev-inn

Baunsgaard avatar Aug 18 '22 14:08 Baunsgaard