
XGBoost Port Code requires access to Temp Files

Open CiprianFlorin-Ifrim opened this issue 2 years ago • 0 comments

As the title says, the XGBoost port code creates a temporary JSON file in the user's temp folder under APPDATA/LOCAL (AppData\Local\Temp). This is not documented anywhere. On the 3 systems I tested, the file was not generated because the Jupyter Notebook did not have access to that folder. Even with admin rights, or after trusting the notebook, it still could not create the file.

This is the type of error generated: XGBoostError: [14:36:23] C:\Users\Administrator\workspace\xgboost-win64_release_1.0.0\dmlc-core\src\io\local_filesys.cc:209: Check failed: allow_null: LocalFileSystem::Open "C:\Users\ZW\AppData\Local\Temp\tmp_mu9qwkg": Permission denied

I have checked the xgboost.py file. The original code is:

def port_xgboost(clf, tmp_file=None, **kwargs):
    if tmp_file is None:
        with NamedTemporaryFile('w+', suffix='.json', encoding='utf-8') as tmp:
            clf.save_model(tmp.name)
            tmp.seek(0)
            decoded = json.load(tmp)
    else:
        clf.save_model(tmp_file)

        with open(tmp_file, encoding='utf-8') as file:
            decoded = json.load(file)

    trees = [format_tree(tree) for tree in decoded['learner']['gradient_booster']['model']['trees']]

    return jinja('xgboost/xgboost.jinja', {
        'n_classes': int(decoded['learner']['learner_model_param']['num_class']),
        'trees': trees,
    }, {
        'classname': 'XGBClassifier'
    }, **kwargs)

SOLUTION: Remove the None default from the signature def port_xgboost(clf, tmp_file=None, **kwargs): so that tmp_file has to be passed explicitly.

The user can then pass tmp_file=None in their Python script if they prefer a temp file in APPDATA/LOCAL (and if it works), or they can specify the full path of a file ending in .json: print(port(xgb, tmp_file = "C:\\Users\\*username*\\Desktop\\test.json"))

They can then use the code shown for DecisionTree/RandomForest to create a .h file:

with open('XGBoostClassifier.h', 'w') as file:
    file.write(port(xgb, tmp_file = "C:\\Users\\*username*\\Desktop\\test.json"))
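A possible alternative that avoids hard-coding a path is sketched below. According to the Python documentation, a file created by NamedTemporaryFile cannot be opened a second time by name on Windows while it is still open, which would match the "Permission denied" error above. That this is the actual cause here is an assumption and has not been confirmed. The sketch writes the model to a regular file inside a temporary directory instead, so Python holds no open handle on the file while save_model writes it. The names load_model_json and FakeModel are hypothetical and not part of micromlgen; FakeModel only stands in for an XGBoost classifier so the example runs without xgboost installed.

```python
import json
import os
import tempfile


def load_model_json(clf, tmp_file=None):
    """Save `clf` as JSON and return the decoded dict.

    Hypothetical helper: `clf` is assumed to expose save_model(path),
    as XGBoost models do.
    """
    if tmp_file is not None:
        # Same behaviour as the original code when a path is given.
        clf.save_model(tmp_file)
        with open(tmp_file, encoding='utf-8') as file:
            return json.load(file)

    # Create a private temporary directory and write a regular file in it.
    # No handle on the file is open while save_model() writes to it.
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, 'model.json')
        clf.save_model(path)
        with open(path, encoding='utf-8') as file:
            return json.load(file)


class FakeModel:
    """Hypothetical stand-in for an XGBoost classifier (demo only)."""

    def save_model(self, path):
        payload = {'learner': {'learner_model_param': {'num_class': '3'}}}
        with open(path, 'w', encoding='utf-8') as file:
            json.dump(payload, file)


decoded = load_model_json(FakeModel())
n_classes = int(decoded['learner']['learner_model_param']['num_class'])
```

With a real classifier, the returned dict could be used in place of decoded in port_xgboost; the temporary directory and its contents are removed when the with block exits.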

Please update the library and add the documentation for the temp file/specified location.

Furthermore, please list all classes in the documentation, so that users know exactly how to use the namespace. The example given is: Eloquent::ML::Port::RandomForestRegressor regressor;

Correct namespace call for other ML types:

Eloquent::ML::Port::SVM name_to_be_used_in_code;
Eloquent::ML::Port::OneClassSVM name_to_be_used_in_code;
Eloquent::ML::Port::SEFR name_to_be_used_in_code;
Eloquent::ML::Port::DecisionTreeClassifier name_to_be_used_in_code;
Eloquent::ML::Port::DecisionTreeRegressor name_to_be_used_in_code;
Eloquent::ML::Port::RandomForestClassifier name_to_be_used_in_code;
Eloquent::ML::Port::GaussianNB name_to_be_used_in_code;
Eloquent::ML::Port::LogisticRegression name_to_be_used_in_code;
Eloquent::ML::Port::PCA name_to_be_used_in_code;
Eloquent::ML::Port::PrincipalFFT name_to_be_used_in_code;
Eloquent::ML::Port::LinearRegression name_to_be_used_in_code;
Eloquent::ML::Port::XGBClassifier name_to_be_used_in_code;

Thank you and take care!

CiprianFlorin-Ifrim, Apr 17 '22 21:04