statsample
Trouble with Statsample::Bivariate.correlation_matrix
Hi, I'm having trouble using statsample to run PCA on a large dataset. Does anyone have any ideas?
I want to run PCA on a very large dataset (3000 variables, 50 samples), so I wrote this code:
data_raw = IO.readlines('data1.txt').map{|v| v.split }[1..-1]
hash_tmp = {}
data_raw[1..3000].each do |ary|
  hash_tmp[ary[0]] = ary[1..-1].map(&:to_i).to_scale
end
ds = hash_tmp.to_dataset
puts "Input data done!"
cor_matrix=Statsample::Bivariate.correlation_matrix(ds)
puts "cor_matrix was prepared."
pca=Statsample::Factor::PCA.new(cor_matrix)
binding.pry
But Ruby on my Mac never prints "cor_matrix was prepared.". To investigate the cause, I reopened the module and added some debugging output:
# Reopening the module to find where the bottleneck is
module Statsample
  module Bivariate
    class << self
      def covariance_matrix_optimized(ds)
        x=ds.to_gsl
        n=x.row_size
        m=x.column_size
        puts "calculating means..."
        means=((1/n.to_f)*GSL::Matrix.ones(1,n)*x).row(0)
        puts "centering matrix..."
        centered=x-(GSL::Matrix.ones(n,m)*GSL::Matrix.diag(means))
        puts "calculating covariance matrix..."
        ss=centered.transpose*centered
        puts "calculating n..."
        s=((1/(n-1).to_f))*ss
        puts "done!" # <= This line is executed
        s
      end

      def correlation_matrix(ds)
        vars,cases=ds.fields.size,ds.cases
        if !ds.has_missing_data? and Statsample.has_gsl? and prediction_optimized(vars,cases) < prediction_pairwise(vars,cases)
          binding.pry
          cm=correlation_matrix_optimized(ds)
          binding.pry # <= This line is never reached. :(
        else
          cm=correlation_matrix_pairwise(ds)
        end
        binding.pry
        cm.extend(Statsample::CovariateMatrix)
        binding.pry
        cm.fields=ds.fields
        binding.pry
        cm
      end
    end
  end
end
Ruby then prints everything up to "done!", but apparently never returns from Statsample::Bivariate.covariance_matrix_optimized. I have never seen a Ruby method that doesn't return.
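For reference, the four steps that the patched method announces (means, centering, cross-product, scaling by 1/(n-1)) can be checked on a tiny dataset without GSL. This is only a minimal sketch with plain Ruby arrays. `covariance_steps` is a hypothetical name, not a statsample method, and it makes no attempt to match the GSL version's performance.

```ruby
# Hypothetical stand-in for the GSL-based method: same steps, plain arrays.
# `rows` is an array of cases, each an array of variable values (n x m).
def covariance_steps(rows)
  n = rows.size
  m = rows.first.size

  # 1. Mean of each column (variable).
  means = Array.new(m) { |j| rows.sum { |r| r[j] }.to_f / n }

  # 2. Centre every value on its column mean.
  centered = rows.map { |r| r.each_with_index.map { |v, j| v - means[j] } }

  # 3. Cross-product centered' * centered, an m x m matrix.
  ss = Array.new(m) do |i|
    Array.new(m) { |j| centered.sum { |r| r[i] * r[j] } }
  end

  # 4. Scale by 1/(n-1) to get the sample covariance.
  ss.map { |row| row.map { |v| v / (n - 1) } }
end

rows = [[1, 2], [2, 4], [3, 6]]
p covariance_steps(rows) # prints [[1.0, 2.0], [2.0, 4.0]]
```

Note that the result is an m x m matrix, so with 3000 variables it holds 9,000,000 entries regardless of how few samples there are.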
If anyone knows how to solve this problem, or how to investigate the cause more deeply, please tell me.
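One way to tell a genuinely stuck call from one that is merely very slow might be to time the same step on progressively larger subsets of the variables and watch how the time grows. The sketch below is an assumption-laden illustration: `time_step` is a hypothetical helper, and `pairwise_correlations` is a plain-Ruby stand-in for whichever statsample call is under investigation.

```ruby
# Hypothetical helper (not part of statsample): runs a block, prints the
# elapsed wall-clock time, and returns the block's value.
def time_step(label, out: $stderr)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  out.puts format('%-16s %8.3f s', label, elapsed)
  out.flush # make sure the message appears even if a later step stalls
  result
end

# Stand-in workload: pairwise Pearson correlations in plain Ruby.
# In the real script this would be the call being investigated.
def pairwise_correlations(columns)
  n = columns.first.size
  centered = columns.map do |col|
    mean = col.sum.to_f / n
    col.map { |v| v - mean }
  end
  norms = centered.map { |col| Math.sqrt(col.sum { |v| v * v }) }
  centered.each_with_index.map do |a, i|
    centered.each_with_index.map do |b, j|
      a.zip(b).sum { |x, y| x * y } / (norms[i] * norms[j])
    end
  end
end

# Time the workload on 25, 50 and 100 random variables of 50 samples each.
rng = Random.new(1)
[25, 50, 100].each do |vars|
  data = Array.new(vars) { Array.new(50) { rng.rand(100) } }
  time_step("#{vars} variables") { pairwise_correlations(data) }
end
```

If the time roughly quadruples each time the number of variables doubles, the full 3000-variable run is probably slow rather than hung, and the measurements give a rough way to estimate how long it would take.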