TableTransforms.jl icon indicating copy to clipboard operation
TableTransforms.jl copied to clipboard

Add ProjectionPursuit (new PR)

Open DianaPat opened this issue 3 years ago • 19 comments

DianaPat avatar Sep 30 '22 12:09 DianaPat

Codecov Report

Merging #141 (df7af29) into master (a075bb8) will increase coverage by 0.42%. The diff coverage is 96.80%.

@@            Coverage Diff             @@
##           master     #141      +/-   ##
==========================================
+ Coverage   94.71%   95.14%   +0.42%     
==========================================
  Files          26       26              
  Lines         795      885      +90     
==========================================
+ Hits          753      842      +89     
- Misses         42       43       +1     
Impacted Files Coverage Δ
src/transforms.jl 100.00% <ø> (ø)
src/transforms/projectionpursuit.jl 96.80% <96.80%> (ø)
src/transforms/parallel.jl 95.00% <0.00%> (-0.07%) :arrow_down:
src/transforms/identity.jl
src/transforms/scale.jl 100.00% <0.00%> (+5.55%) :arrow_up:

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

codecov-commenter avatar Sep 30 '22 13:09 codecov-commenter

Hi @juliohm I already changed to \cdot notation, checked that the diag function really gives a unitary vector and got rid of some outdated notation. But when I tried to change the plots in the visual test, they didn't get plotted all too well: projectionpursuit-1 projectionpursuit-2 One of the possible changes would be to modify the gr settings to change the size of the image, but even so the graphics at the diagonal keep coming out as lines (just like in the images above), so for now I just did the test with each corner separately and will continue checking the problem of the diagonals.

DianaPat avatar Oct 03 '22 01:10 DianaPat

I already changed to \cdot notation, checked that the diag function really gives a unitary vector and got rid of some outdated notation.

That is great @DianaPat , now we can remove the normalization of alpha inside the rscore function, and only normalize it in the neadermead step. I will propose the changes so that you can review and accept.

But when I tried to change the plots in the visual test, they didn't get plotted all too well:

Let's just plot the result of the corner plot after the transformation in one figure and the result of reverting the transformation in a separate figure. They don't need to be in the same figure. Also, you can pass the option size=(800,800) to the specific plot commands to overwrite the default size set in the gr command.

juliohm avatar Oct 03 '22 15:10 juliohm

Hi @DianaPat can you please double check that this GitHub option is enabled on this PR?

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/allowing-changes-to-a-pull-request-branch-created-from-a-fork

I fixed the problem locally and would like to push the changes to the PR.

juliohm avatar Oct 10 '22 22:10 juliohm

It is enabled

DianaPat avatar Oct 11 '22 09:10 DianaPat

Hi @DianaPat it is saying that permission is denied. Can you incorporate the fixes in your local copy and regenerate the figures?

  1. Remove the option aspectratio=:equal from the runtests.jl file and place it in every existing plot command that existed before in the test files. Basically, scale.jl, zscore.jl, eigenanalysis.jl, if I recall correctly.
  2. Delete all png files in the test/data directory and rerun the tests to generate new figures.
  3. You should see all corner plots correctly plotted now. The aspectratio option was the issue.

After you make the changes you will see that the projection pursuit result is not looking quite right yet. The final distribution is not spherical visually indicating that either the number of iterations wasn't enough or that some other bug is still hidden somewhere.

juliohm avatar Oct 11 '22 10:10 juliohm

Thank you @DianaPat. As you can see the first case is not working as expected given that one of the bivariate distributions doesn't look spherical. Can you please take a closer look at what may be happening? After the apply is fixed we we need to also plot the revert.

juliohm avatar Oct 12 '22 01:10 juliohm

Yes, I'm already looking for this bug and other ones I've found. I'll update as soon as I find something!

DianaPat avatar Oct 12 '22 01:10 DianaPat

Hi @juliohm, I think the error is fixed.

In fact the function was not comparing the data to a gaussian up to perc percentile, but with 1-perc. There was also an issue with just taking a higher percentile because the optim function raised an error. That part was just fixed with the float at line 93. But then the tests stopped passing (at this point I'm wondering how did they pass before), so I relaxed the conditions.

At this point I think maybe it would be better to change all occurrences of perc to 1-perc in the code so we can keep the previous value, as it is more intuitive.

DianaPat avatar Oct 15 '22 18:10 DianaPat

Hi @DianaPat thank you for checking it. Can you please move the float call to within the definition of the basis function? This function should return a floating point vector always to avoid type instability.

Regarding the percentile option, you mean that it is more intuitive to use the value .99? It would be great if you could make the appropriate changes. After that, it would be interesting to print the number of structure removal iterations that are happening in both test cases. It seems that the structure is not quite Gaussian still? Visually, the resulting point cloud has some borders that differ from those of a multivariate Gaussian.

juliohm avatar Oct 15 '22 20:10 juliohm

I already made some experiments. Without any modification of parameters it runs the structure removal cycle 23 times. By changing the parameters a little, it reached 101 iterations; the problem is that it took aprox. 300 seg, so the tests would take more time.

I saved the figures to do a comparison of improvement:

test2 test

DianaPat avatar Oct 15 '22 22:10 DianaPat

I am finding it strange that the the points near the "tails" of the resulting distribution are somewhat spread. If you plot the ideal standard Gaussian you will be able to see the difference. Do you know how to use the VSCode debugger? You could add a plot command inside the structure removal loop and then add a breakpoint to see the plot at each iteration. Another thing you can do is check the optimal alphas as we did in class to see if everything is working as expected. Finally, a good diagnostic would be to plot the q-q plot of the transformed columns against the standard Gaussian to see if they pass through the identity line.

juliohm avatar Oct 15 '22 23:10 juliohm

You are right! The tails diverge from a gaussian outside the interval [-3,3]. I double checked the removal structure function but it's working as expected. I went back to the bibliography (Friedman pg 255) and found this: Tails_under When I tried increasing the deg argument, there was a slight improvement, but it came at the cost of time.

DianaPat avatar Oct 16 '22 22:10 DianaPat

That is a very good catch. Can you comment on the time increase? Is it 2x slower? How the plots look like with higher degrees? Can we find a good compromise by default?

Em dom., 16 de out. de 2022 19:46, DianaPat @.***> escreveu:

You are right! The tails diverge from a gaussian outside the interval [-3,3]. I double checked the removal structure function but it's working as expected. I went back to the bibliography (Friedman pg 255) and found this: [image: Tails_under] https://user-images.githubusercontent.com/105749646/196061909-119106f3-f4c8-461f-a664-4cc8d7baff4b.PNG When I tried increasing the deg argument, there was a slight improvement, but it came at the cost of time.

— Reply to this email directly, view it on GitHub https://github.com/JuliaML/TableTransforms.jl/pull/141#issuecomment-1280074599, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZQW3MAVSGX4ULX2O5SD7TWDSAUBANCNFSM6AAAAAAQZXLHKQ . You are receiving this because you were mentioned.Message ID: @.***>

juliohm avatar Oct 16 '22 22:10 juliohm

The time increase is ~3.5x: With deg=5 it took around 16 sec to obtain the answer (image on the left), and with deg=30 it took 352 sec (image on the right) qq1 qq30 I think the improvement is not enough to offset the cost. An idea could be to do another process post-transformation to normalize the tails.

DianaPat avatar Oct 17 '22 12:10 DianaPat

Thank you @DianaPat for these careful checks. I agree with you that the time cost does not pay off, and we just need to be aware of this behavior near the tails.

I will try to review the PR one more time during the day. It would be important to plot the transformed and reversed data for both data sets if that is not already done.

juliohm avatar Oct 17 '22 12:10 juliohm

That is really awesome @DianaPat. I think we finally made it! 🎉 Congrats!

One additional change we could implement to avoid depending on the heavy Optim.jl package is replace it with the NelderMead.jl package, which only implements the algorithm are currently using: https://github.com/jwscook/NelderMead.jl

You can learn more about the algorithm in this lecture: https://youtu.be/vOYlVvT3W80 It is beautiful and simple idea.

In theory, we would just need to replace the Optim.optimize function by the NelderMead.optimise function and rerun the tests. If you feel that you have some time to try this out, please go ahead. Otherwise, I invite you to review the original PR you submitted to compare how much you improved your coding skills in the process 👏🏽

These picky reviews pay off in the long run, and I am really happy you went all the way to the end! 🥇

juliohm avatar Oct 17 '22 22:10 juliohm

Thank you, I learned a lot from this experience!

On the other hand, I tried to implement the change of package, but it seems that the autor have not registered the package, so I couldn't do it. For now will leave it as is while I look for other alternatives.

DianaPat avatar Oct 20 '22 00:10 DianaPat

I've opened an issue requesting registration: https://github.com/jwscook/NelderMead.jl/issues/5

juliohm avatar Oct 20 '22 10:10 juliohm

@DianaPat NelderMead.jl is now registered 👌🏽 would you like to give it a try here to see if the results are the same?

juliohm avatar Oct 26 '22 13:10 juliohm