
links in setup.sh not working

tahaceritli opened this issue Jul 02 '19 · 4 comments

Hi,

The following links are not working: https://storage.googleapis.com/ainstein_text_normalization/test_data.zip https://storage.googleapis.com/ainstein_text_normalization/dnc_model.zip

When I run setup.sh, I get this output:

```
Downloading and extracting required files
--2019-07-02 15:41:52--  https://storage.googleapis.com/ainstein_text_normalization/test_data.zip
Resolving storage.googleapis.com... 216.58.210.240, 2a00:1450:4009:80f::2010
Connecting to storage.googleapis.com|216.58.210.240|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2019-07-02 15:41:52 ERROR 404: Not Found.

--2019-07-02 15:41:52--  https://storage.googleapis.com/ainstein_text_normalization/dnc_model.zip
Resolving storage.googleapis.com... 216.58.210.240, 2a00:1450:4009:80f::2010
Connecting to storage.googleapis.com|216.58.210.240|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2019-07-02 15:41:52 ERROR 404: Not Found.

unzip:  cannot find or open test_data.zip, test_data.zip.zip or test_data.zip.ZIP.
unzip:  cannot find or open dnc_model.zip, dnc_model.zip.zip or dnc_model.zip.ZIP.
rm: test_data.zip: No such file or directory
rm: dnc_model.zip: No such file or directory
Finished
```
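As a side note, the script keeps going after the failed downloads, which is why unzip and rm produce the follow-on errors above. A defensive variant of the download step could fail fast instead. This is only a sketch, not the repo's actual setup.sh; curl is substituted for wget here so that its `-f` flag aborts cleanly on HTTP errors:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Fetch a URL, failing loudly on HTTP errors instead of leaving a
# missing archive for the later unzip/rm steps to trip over.
download() {
  local url="$1" out="$2"
  if ! curl -fsSL -o "$out" "$url"; then
    echo "ERROR: download failed for $url" >&2
    return 1
  fi
}

# Download an archive, extract it, and remove it -- but only if each
# preceding step actually succeeded.
fetch_and_unzip() {
  local url="$1"
  local archive
  archive="$(basename "$url")"
  download "$url" "$archive" || return 1
  unzip -oq "$archive" && rm -f "$archive"
}

# Usage (with whatever the corrected URLs turn out to be):
# fetch_and_unzip "https://storage.googleapis.com/ainstein_text_normalization/test_data.zip"
# fetch_and_unzip "https://storage.googleapis.com/ainstein_text_normalization/dnc_model.zip"
```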

Could you update this?

Thanks, Taha

tahaceritli avatar Jul 02 '19 14:07 tahaceritli

Hi, thanks for opening this issue. The model URLs were moved and have now been fixed in the latest commit. Please check and confirm.

subho406 avatar Jul 02 '19 16:07 subho406

Thanks for the reply. I'm able to download them now. But I think there's a mismatch with the model version, which prevents me from reproducing the notebook. The error occurs when I run the Text Normalization Demo notebook at the following line:

```python
raw_data['class'] = xgb.predict(data=raw_data)
```

```
Processed 100%

AttributeError                            Traceback (most recent call last)
in
      1 # Class of tokens in the data
----> 2 raw_data['class'] = xgb.predict(data=raw_data)
      3 # Raw to Classified Data
      4 classified_data = raw_data.copy(deep=False)

Text-Normalization-Demo/src/XGBclassify.py in predict(self, data)
     69
     70         # classify as RemainSelf or ToBeNormalized
---> 71         y = self.model.predict(X)
     72         y_labels = [self.labels[int(i)] for i in y]
     73         return y_labels

~/anaconda3/lib/python3.7/site-packages/xgboost/sklearn.py in predict(self, data, output_margin, ntree_limit, validate_features)
    783         prediction : numpy array
    784         """
--> 785         test_dmatrix = DMatrix(data, missing=self.missing, nthread=self.n_jobs)
    786         if ntree_limit is None:
    787             ntree_limit = getattr(self, "best_ntree_limit", 0)

AttributeError: 'XGBClassifier' object has no attribute 'n_jobs'
```

To confirm this, I tried the following code, which gives the same error:

```python
import sys
sys.path.append("../src")

from XGBclassify import XGB

xgb_path = '../models/english/en_xgb_tuned-trained.pk'
xgb = XGB(xgb_path)

print(xgb.model.n_jobs)
```

```
AttributeError                            Traceback (most recent call last)
in
----> 1 print(xgb.model.n_jobs)

AttributeError: 'XGBClassifier' object has no attribute 'n_jobs'
```
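For what it's worth, this is the classic symptom of unpickling a model saved under an older library version: pickle restores the object's saved `__dict__` without calling `__init__`, so attributes the class gained later (like `n_jobs` here) simply never get set. A minimal, self-contained illustration with a toy class standing in for XGBClassifier:

```python
import pickle

class Model:
    """Stand-in for the 'old' class the .pk file was saved with."""
    def __init__(self):
        self.nthread = 1  # the only attribute the old version had

blob = pickle.dumps(Model())

class Model:  # redefined: simulates upgrading to a 'newer' library version
    """The new class definition sets an n_jobs attribute in __init__."""
    def __init__(self):
        self.nthread = 1
        self.n_jobs = 1  # new attribute; old pickles never saw it

restored = pickle.loads(blob)
print(hasattr(Model(), "n_jobs"))   # a fresh object has it: True
print(hasattr(restored, "n_jobs"))  # the unpickled old object does not: False
```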

Thanks,

tahaceritli avatar Jul 02 '19 18:07 tahaceritli

Are you using the provided deep-tf conda environment when running the notebook? Generally, the `n_jobs` parameter not being found is related to the installed version of the xgboost library.
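One quick sanity check is to print which xgboost version the active environment actually resolves. The small helper below is illustrative, not part of the repo; it reports a package's version string, or None when the package is missing:

```python
import importlib

def library_version(name):
    """Return a package's __version__ string, or None if the package
    is not importable or exposes no version attribute."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)

# Compare this against the version pinned by the deep-tf environment:
print(library_version("xgboost"))
```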


subho406 avatar Jul 03 '19 10:07 subho406

Yes, and according to the source code I have, n_jobs should exist. When I create a new object of that class, n_jobs is accessible with a default value. But I don't think the model you've uploaded, which is extracted to '../models/english/en_xgb_tuned-trained.pk', contains this parameter.
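If pinning xgboost to the version the model was trained with is not an option, one possible workaround is to backfill the attributes the newer code expects before calling predict. This is only a sketch; the helper name and the default value of 1 are my own guesses, not anything from this repo:

```python
def patch_missing_attrs(model, defaults):
    """Set each attribute in `defaults` on `model` only if it is absent,
    leaving any value the pickle did carry untouched."""
    for name, value in defaults.items():
        if not hasattr(model, name):
            setattr(model, name, value)
    return model

# e.g. before the failing call in the notebook:
# patch_missing_attrs(xgb.model, {"n_jobs": 1})
# raw_data['class'] = xgb.predict(data=raw_data)
```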

tahaceritli avatar Jul 03 '19 14:07 tahaceritli