MP-SPDZ

SGDLogistic for millions of samples

Open · Repo-Zhuang opened this issue 10 months ago · 1 comment

Hello, here is my code:

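    import numpy as np
    from Compiler import ml

    N = 10000
    # skiprows=1 skips the CSV header row
    data1 = np.loadtxt('./party0_train.csv', delimiter=',', skiprows=1)
    data2 = np.loadtxt('./party1_train.csv', delimiter=',', skiprows=1)
    # Drop the first data row and column 0; keeps at most N-1 rows
    data1 = data1[1:N, 1:]
    data2 = data2[1:N, 1:]
    # Party 1 (guest) inputs features and labels (label = first remaining column)
    X_train_guest = sfix.input_tensor_via(1, data1[:, 1:])
    Y_train_guest = sfix.input_tensor_via(1, data1[:, 0])
    # Party 0 (host) inputs features only
    X_train_host = sfix.input_tensor_via(0, data2)
    # Join the feature columns of both parties
    X_train = X_train_guest.concat_columns(X_train_host)
    # 3 epochs, batch size N-1 (the whole training set)
    log = ml.SGDLogistic(3, N-1)

    log.fit(X_train, Y_train_guest)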
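To see exactly what the preprocessing feeds into the computation, the cleartext slicing can be replayed on dummy NumPy arrays. This is a minimal sketch: the helper `split_shapes` and the row and column counts used with it are hypothetical, and `np.concatenate(..., axis=1)` stands in for `concat_columns` on the secret matrices.

```python
import numpy as np


def split_shapes(n, rows_in_file, cols_guest, cols_host):
    """Replay the cleartext slicing from the script on dummy arrays.

    Hypothetical helper: rows_in_file is the number of data rows left after
    skiprows=1; cols_guest/cols_host are the column counts of the two CSVs.
    Returns the shapes of the joined feature matrix and the label vector.
    """
    data1 = np.zeros((rows_in_file, cols_guest))  # stand-in for party0_train.csv
    data2 = np.zeros((rows_in_file, cols_host))   # stand-in for party1_train.csv
    data1 = data1[1:n, 1:]  # drop the first data row and column 0
    data2 = data2[1:n, 1:]
    x_guest = data1[:, 1:]  # guest features
    y_guest = data1[:, 0]   # guest labels
    x_train = np.concatenate([x_guest, data2], axis=1)  # join feature columns
    return x_train.shape, y_guest.shape


if __name__ == "__main__":
    # Hypothetical sizes: 20000 data rows, 12 and 11 columns per file.
    print(split_shapes(10000, 20000, 12, 11))  # prints ((9999, 20), (9999,))
```

With N = 10000 the model sees N-1 = 9999 samples. Assuming the second argument of `ml.SGDLogistic` is the batch size, as in MP-SPDZ's documented `SGDLogistic(n_epochs, batch_size=1, ...)`, `SGDLogistic(3, N-1)` processes the whole training set as a single batch in each step.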

I have a dataset with millions of samples that I want to use for logistic regression. I found that once the dataset reaches a certain size (e.g., N = 10000), the following message appears. What does it mean?

    tensor-0-begin-loop-1 blowing up rounds:  (2999 / 2999) ** 3 < 2999
    tensor-0-begin-loop-5 blowing up rounds:  (2999 / 2999) ** 3 < 2999

Also, training is very slow. Approximately how long does it take to train logistic regression on a dataset with millions of samples, and are there any optimizations available?

Repo-Zhuang, Apr 22 '24 16:04

You can safely ignore the message. What protocol are you using? Is the time more than linear in the number of samples?

mkskeller, Apr 23 '24 01:04
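One way to answer whether the running time is more than linear in the number of samples is to run the same program at a few sizes N, record the times, and fit the slope of log(time) against log(N): a slope near 1 indicates linear growth, while a clearly larger slope indicates super-linear growth. The sketch below does this with the standard library only. The function names are hypothetical, the timings must be ones you measure yourself, and extrapolating to millions of samples is only as reliable as the assumption that the fitted power law keeps holding at that scale.

```python
import math


def fit_power_law(sizes, seconds):
    """Least-squares fit of log(seconds) = b * log(size) + a.

    Returns (b, a); b is the estimated scaling exponent.
    """
    if len(sizes) != len(seconds) or len(set(sizes)) < 2:
        raise ValueError("need matching lists with at least two distinct sizes")
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in seconds]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return b, a


def extrapolate_seconds(sizes, seconds, target_size):
    """Predict the running time at target_size from the fitted power law."""
    b, a = fit_power_law(sizes, seconds)
    return math.exp(a + b * math.log(target_size))
```

For example, `fit_power_law([1000, 2000, 4000], [t1, t2, t3])` with your own measured times t1, t2, t3 returns the exponent, and `extrapolate_seconds(..., 1_000_000)` gives a rough estimate for one million samples. Fitting in log space keeps a single slow run at the largest N from dominating the estimate.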