index error in gwr prediction

Open: ljwolf opened this issue 6 years ago • 9 comments

Vinayaraj Poliyapam writes:

Thanks a lot for the pysal mgwr work!

I was using GWmodel in R earlier and tried your module in Python. I can fit the model, but I run into a problem when predicting on more samples than I used for training.

I get the following error.

IndexError: index 3699 is out of bounds for axis 0 with size 3699

Any comments on this would be greatly appreciated.

ljwolf, Jan 18 '19

Hi Vinayaraj,

I'll need a minimal working example to be able to help you.

Please send me the data you're using & the code you're running.

ljwolf, Jan 18 '19

In my case, the number of training records (X_train) must exceed the number of test records (X_test), i.e. a split of at least 51:49 for X_train:X_test. The problem is that when we apply this algorithm to raster data, the number of pixels will always be greater than the number of training records the model was fitted on.

This is the default as you know: X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75, test_size=0.25)

No problem here: X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.51, test_size=0.49)

But the problem will appear with: X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.49, test_size=0.51)

abdelrazekelnashar, Jan 31 '19

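I'm having the same issue. I can reproduce it with this code:

```python
import numpy as np
from mgwr.gwr import GWR

# 10 calibration (training) points
cal_coords = np.random.randn(10,2)
cal_y = np.random.randn(10,1)
cal_X = np.random.randn(10,2)

# 20 prediction points: more than the 10 calibration points
pred_coords = np.random.randn(20,2)
pred_y = np.random.randn(20,1)
pred_X = np.random.randn(20,2)

# Fit with a bandwidth of 7
model = GWR(cal_coords, cal_y, cal_X, 7)
gwr_results = model.fit()

# Raises IndexError because there are more prediction points than calibration points
pred_results = model.predict(pred_coords, pred_X)
```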
```
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-1-1d157a7d4a7a> in <module>
     14 gwr_results = model.fit()
     15 
---> 16 pred_results = model.predict(pred_coords, pred_X)

~/github/pysal/mgwr/mgwr/gwr.py in predict(self, points, P, exog_scale, exog_resid, fit_params)
    411             self.bw,
    412             points)
--> 413         gwr = self.fit(**fit_params)
    414 
    415         return gwr

~/github/pysal/mgwr/mgwr/gwr.py in fit(self, ini_params, tol, max_iter, solve, searching)
    353                                 max_iter, wi=wi)
    354                     params[i, :] = rslt[0].T
--> 355                     predy[i] = rslt[1][i]
    356                     w[i] = rslt[3][i]
    357                     S[i] = np.dot(self.X[i], rslt[5])

IndexError: index 10 is out of bounds for axis 0 with size 10
```
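The failing line is `predy[i] = rslt[1][i]`: the loop index `i` runs over the prediction points, while `rslt[1]` holds one value per calibration point. The sketch below reproduces just that indexing pattern with plain NumPy, without mgwr; `mimic_fit_loop` and its arrays are hypothetical stand-ins, not mgwr internals.

```python
import numpy as np

def mimic_fit_loop(n_cal, n_pred):
    """Hypothetical stand-in for the per-point loop in GWR.fit.

    Loops over n_pred prediction points but indexes an array sized by the
    n_cal calibration points, mirroring predy[i] = rslt[1][i].
    Returns n_pred on success; raises IndexError when n_pred > n_cal.
    """
    predy = np.zeros((n_pred, 1))
    for i in range(n_pred):
        # Stand-in for rslt[1]: fitted values for the calibration points only
        fitted_cal = np.zeros((n_cal, 1))
        # Indexing a calibration-sized array with a prediction index
        predy[i] = fitted_cal[i]
    return n_pred

print(mimic_fit_loop(10, 10))  # 10
try:
    mimic_fit_loop(10, 20)
except IndexError as err:
    print(err)  # index 10 is out of bounds for axis 0 with size 10
```

This matches the behaviour reported in the thread: the loop succeeds whenever the prediction set is no larger than the calibration set and fails otherwise.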

I tested this on the latest version from GitHub as of 2019-02-08:

```
commit 3bdfdf275716aefef4561decee6ec078da4259d4
Merge: f77e334 5a49150
Author: Wei Kang <[email protected]>
Date:   Fri Jan 4 21:03:43 2019 -0800

    Merge pull request #48 from pysal/version-bump

    update version in __init__.py
```

jpursell, Feb 08 '19

@ljwolf I have exactly the same problem. It happens when the test data (used for prediction) is larger than the training data (used for fitting). I hope the developers fix this soon.

ali1100, Nov 05 '19

I noticed that the variable 'self.P' isn't used in 'self.predict()', so I rewrote the function predict(). It runs, but I don't know whether the results are correct. The attached zip file contains the changed code: gwr.zip
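For reference, the standard GWR prediction at a new location shows where P enters and why nothing sized by the number of prediction points m should index arrays sized by the number of calibration points n. The notation is an assumption for illustration: X (n × k) and y are the calibration data, p_j is row j of P (with a leading 1 when the model has a constant), and W(u_j, v_j) is the n × n diagonal matrix of kernel weights between prediction location j and each calibration location.

```latex
\hat{\boldsymbol\beta}(u_j, v_j) = \left(X^\top W(u_j, v_j)\, X\right)^{-1} X^\top W(u_j, v_j)\, \mathbf{y},
\qquad
\hat{y}_j = \mathbf{p}_j^\top \hat{\boldsymbol\beta}(u_j, v_j),
\qquad j = 1, \dots, m
```

Each prediction point needs its own set of n weights, but only the k local coefficients are kept per point, so the loop over j never needs the n fitted calibration values.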

WilliamZcy, Apr 14 '21

I'm getting the same IndexError when my training data is smaller than my test/prediction data. I can work around it by splitting the test/prediction data into subsets and iterating through them, but I'm unsure whether this is the best way to do it.
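The subsetting workaround can be written as a small generic helper. This is a sketch under two assumptions: that predicting a chunk no larger than the calibration set avoids the IndexError (as the traceback above suggests), and that `model.predict(points, P).predictions` returns an (m, 1) array. `predict_in_chunks` is a hypothetical name, not part of mgwr.

```python
import numpy as np

def predict_in_chunks(predict_fn, points, P, chunk_size):
    """Hypothetical helper: call predict_fn on row slices and stack the results.

    predict_fn(points_chunk, P_chunk) must return an array of shape (len(chunk), 1).
    Assumption: chunk_size <= number of calibration points avoids the IndexError.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    pieces = []
    for start in range(0, len(points), chunk_size):
        stop = start + chunk_size
        # Each call sees at most chunk_size prediction points
        pieces.append(predict_fn(points[start:stop], P[start:stop]))
    # Reassemble the chunks in their original order
    return np.vstack(pieces)

# Hypothetical usage with the reproducer above (not run here):
# preds = predict_in_chunks(
#     lambda pts, X: model.predict(pts, X).predictions,
#     pred_coords, pred_X, chunk_size=len(cal_y))
```

Keeping the model call behind a function argument makes the helper easy to test with a stand-in and easy to drop once the underlying bug is fixed.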

jack-tuna, Jun 14 '21

I had to resample my training data so that the two sets are the same size. What problems does that cause for the results?

plo97, Mar 08 '22

Is there an effective solution?

frong0824, May 01 '22

The problem arises because prediction fully reuses the fitting function used for training, which unfortunately triggers the part that indexes training-data arrays (using the index of the test data) to compute the fit statistics. The solution is similar to what @WilliamZcy proposed, but there is no need to add 'self.P', since it is already used correctly in 'predictions()' through the line 'P = self.model.P'. Below is my quick remedy for this issue. I have tested that the results are consistent with the original version.

gwr.zip
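To make the idea behind the fix concrete without depending on mgwr internals, here is an independent sketch of what GWR prediction needs: for each of the m prediction points, compute kernel weights to the n calibration points, solve a weighted least-squares problem, and combine the local coefficients with the matching row of P. This is not the code in the attached zip or in mgwr; the kernel (adaptive bisquare with bandwidth equal to the distance to the k-th nearest calibration point) is an assumption that may not match mgwr's exact conventions, and `gwr_predict_sketch` is a hypothetical name.

```python
import numpy as np

def gwr_predict_sketch(cal_coords, cal_y, cal_X, pred_coords, pred_X, n_neighbors):
    """Independent GWR prediction sketch (hypothetical, not mgwr's code).

    Assumes an adaptive bisquare kernel and a constant column prepended
    to both the calibration design X and the prediction design P.
    """
    n = cal_X.shape[0]
    X = np.hstack([np.ones((n, 1)), cal_X])               # n x k design
    P = np.hstack([np.ones((len(pred_X), 1)), pred_X])    # m x k design
    params = np.empty_like(P)                             # one coefficient row per prediction point
    for j, point in enumerate(pred_coords):
        d = np.linalg.norm(cal_coords - point, axis=1)     # distances to calibration points
        bw = np.sort(d)[n_neighbors - 1]                   # adaptive bandwidth
        w = np.where(d < bw, (1 - (d / bw) ** 2) ** 2, 0.0)  # bisquare weights
        XtW = X.T * w                                      # X^T W without forming the n x n matrix
        params[j] = np.linalg.solve(XtW @ X, XtW @ cal_y).ravel()
    # Only k coefficients per prediction point are used; no n-length array is indexed by j
    return np.sum(P * params, axis=1).reshape(-1, 1)

# Demo: y is exactly linear in X, so every local fit should recover beta
rng = np.random.default_rng(0)
beta = np.array([1.0, 2.0, -3.0])
cal_coords = rng.normal(size=(10, 2))
cal_X = rng.normal(size=(10, 2))
cal_y = (beta[0] + cal_X @ beta[1:]).reshape(-1, 1)
pred_coords = rng.normal(size=(20, 2))   # more prediction points than calibration points
pred_X = rng.normal(size=(20, 2))
pred = gwr_predict_sketch(cal_coords, cal_y, cal_X, pred_coords, pred_X, n_neighbors=7)
print(pred.shape)  # (20, 1)
```

Because the loop only writes `params[j]`, it runs for any number of prediction points; the exactly linear demo data gives a simple check that the local fits are correct.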

tjleizeng, Dec 09 '23