TableTransforms.jl icon indicating copy to clipboard operation
TableTransforms.jl copied to clipboard

Adding a new transform ppmt

Open DianaPat opened this issue 3 years ago • 7 comments

Converts the distribution of a table to be more gaussian.

DianaPat avatar Aug 09 '22 15:08 DianaPat

Hi @DianaPat, my name is Elias. I'm one of the manteiners of the TableTranforms.jl package. I just left my comments on your PR. I hope it helps!

eliascarv avatar Aug 21 '22 19:08 eliascarv

Hi @DianaPat, my name is Elias. I'm one of the manteiners of the TableTranforms.jl package. I just left my comments on your PR. I hope it helps!

Hi Elias, I just finished correcting the details you commented. Tell me if you see something lacking!

DianaPat avatar Aug 22 '22 03:08 DianaPat

That is true, you are using the same projection matrix in the structural removal in the lopp and in the stopping criteria with the ideal Gaussian case. Notice that it is not required, but it is a good idea anyways to avoid recomputing scores again in the loop.

juliohm avatar Aug 30 '22 21:08 juliohm

I was trying to run the test, but realized that the ProjectionPursuit() can't be properly tested as I pulled out the Quantile() and EigenAnalysis() (so the covariance matrix is not approx. I). I was wondering if I should test it with images (as EigenAnalysis) or if it was unnecessary as the most used fuction would be PPMT().

DianaPat avatar Aug 30 '22 23:08 DianaPat

Given that the paper by Friedman defines the transform with these preprocessing steps, maybe we should reconsider adding them back to the apply function of ProjectionPursuit, and then eliminate the PPMT. I thought initially that it would be useful to split the two, but it is causing more challenges than helping us.

Can you concentrate on the ProjectionPursuit transform and make sure that all tests pass?

juliohm avatar Aug 30 '22 23:08 juliohm

I already ran both ProjectPursuit() and PPMT() and they work properly, and PPMT() passes the tests. So I want to confirm what to do with the ProjectPursuit() testing. Meanwhile, I'll commit my changes.

DianaPat avatar Aug 30 '22 23:08 DianaPat

Codecov Report

Merging #105 (2204a25) into master (8716cf6) will decrease coverage by 4.52%. The diff coverage is 54.45%.

@@            Coverage Diff             @@
##           master     #105      +/-   ##
==========================================
- Coverage   94.70%   90.17%   -4.53%     
==========================================
  Files          27       27              
  Lines         831      896      +65     
==========================================
+ Hits          787      808      +21     
- Misses         44       88      +44     
Impacted Files Coverage Δ
src/transforms.jl 100.00% <ø> (ø)
src/transforms/projectionpursuit.jl 54.45% <54.45%> (ø)
src/transforms/parallel.jl 95.06% <0.00%> (ø)
src/transforms/sequential.jl

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

codecov-commenter avatar Sep 01 '22 03:09 codecov-commenter

@DianaPat try to update your local version to the version in this PR before you continue submitting changes. Or edit it directly here on GitHub by pressing "."

juliohm avatar Sep 21 '22 11:09 juliohm

It is already nicer than the previous version. We don't need the isempty check though right?

The shorter the names of these variables in these small self-contained loops the better because we spend less time reading the names of the variables and more time understanding the big picture of the loop 👍🏻

Em sex., 23 de set. de 2022 08:53, DianaPat @.***> escreveu:

@.**** commented on this pull request.

In src/transforms/projectionpursuit.jl https://github.com/JuliaML/TableTransforms.jl/pull/105#discussion_r978554859 :

  •  @warn "Maximum number of iterations ($count) reached."
    
  •  break
    
  • end

  • push!(cache,(Q,colcache))

  • end

  • 𝒯 = (; zip(names, eachcol(Z))...)

  • newtable = 𝒯 |> Tables.materializer(table)

  • newtable, (cache,tcache)

+end

+function revertfeat(::ProjectionPursuit, newtable, fcache)

  • names = Tables.columnnames(newtable)

  • ppcache, tcache = fcache

  • coltable = newtable

What about this notation? [image: image] https://user-images.githubusercontent.com/105749646/191954467-65739ab2-9962-49ed-9b62-63dd7dc2d6c2.png I tried a more functional notation

— Reply to this email directly, view it on GitHub https://github.com/JuliaML/TableTransforms.jl/pull/105#discussion_r978554859, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZQW3OG26XGBAH6GEI5YBLV7WK4RANCNFSM56BGYZPQ . You are receiving this because you commented.Message ID: @.***>

juliohm avatar Sep 23 '22 11:09 juliohm

@DianaPat tests are failing, can you please double check?

juliohm avatar Sep 26 '22 13:09 juliohm

I have been trying to solve this, but the problem is that in windows the test passes. I'll try in a linux!

I tried and the test fails in the windows, and even the eigenanalysis plot test fails so I think its version of ubuntu is too old to try and solve the problem in it. I'll use wsl.

DianaPat avatar Sep 26 '22 14:09 DianaPat

@DianaPat I think the other issue in the stop criteria is more serious. Can you try to fix it first? The test image can be easily generated after the algorithm is working.

juliohm avatar Sep 27 '22 11:09 juliohm

I already solved it.

DianaPat avatar Sep 27 '22 14:09 DianaPat

@DianaPat I have left a few more suggestions. We are almost there.

juliohm avatar Sep 28 '22 13:09 juliohm

@DianaPat let's see if we can cleanup the code a bit more by introducing a helper function that creates the basis vectors:

# j-th basis vector in R^d
basis(d, j) = 1:d .== j

This function can be used everywhere to replace the creation of the matrix E in the code.

juliohm avatar Sep 28 '22 23:09 juliohm

@DianaPat we will need to rebase the PR again onto the latest master branch to introduce the new StatelessFeatureTransform type introduced this morning.

juliohm avatar Sep 29 '22 11:09 juliohm

Closing in favor of #141

juliohm avatar Sep 30 '22 12:09 juliohm