m2cgen
Any limit in export_to_c? I am getting a Python crash with a large XGBoost model
xgb is my trained XGBoost model, fitted with the .fit method. The model is rather large, with 3000 trees and a depth of 8.
I save the model as a txt file and it is rather large 28.5 MB using xgb.get_booster().dump_model('xgb_model_rules.txt', with_stats = True)
then I want to convert to some c code (I am trying some post-editing to make it a MATLAB code)
code = m2c.export_to_c(xgb)
print("write the code to a text file")
text_file = open("xgb_c.txt", "w")
text_file.write("C code for the XGB structure: %s" % code)
text_file.close()
my python (version 3.8.10) crashes during code = m2c.export_to_c(xgb) ...
Any suggestions ? I tried with a smaller boston data set resulting in a smaller model and it works fine.
I can share the xgb_model_rules.txt if needed
My apologies ... when I try with a smaller number of trees for XGB it is fine so it is most probably some memory issues, my laptop has only 32MB ... when I try to scale up to 500 it runs ok !
Hey @bhamadicharef ! Thanks for using m2cgen. Please provide error logs.
my laptop has only 32MB
Yeah, that might be the root cause! Is it possible to run the export code on a more powerful machine? If your model is not sensitive, you can post it here and I'll try to export it for you.
I have servers with more memory up to 1TB RAM (Dell R815) so will try to test further. Very novice in Python, I do not know how to debug code, logger does not say anything. Is there some overall Python log file ?
To be honest, in a 10-fold cross-validation setting, the regression model has an RMSE of about 10.95 with an R2 of 0.88 with only 100 trees and a depth of 8, so it is not too bad. The best tuning gave us 3000 trees with a depth of 8, improving the RMSE to 7.33 and the R2 to 0.94, which is why I wanted to stretch to 3000.
I have servers with more memory up to 1TB RAM (Dell R815) so will try to test further.
This is nice to hear. Looking forward to the test results.
Very novice in Python, I do not know how to debug code, logger does not say anything. Is there some overall Python log file ?
I think you can start by reading https://docs.python.org/3/library/debug.html.
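On the question of a log file: Python does not keep an overall one, but the standard-library faulthandler module can write a traceback to a file of your choice when the interpreter dies from a fatal error. A minimal sketch follows; the helper name enable_crash_log and the file name crash.log are made up for illustration, and whether a given crash is caught depends on the platform and the kind of failure.

```python
import faulthandler


def enable_crash_log(path):
    """Ask Python to dump tracebacks of all threads to `path` on a fatal error.

    `enable_crash_log` is a hypothetical helper name. The returned file
    object must stay open for as long as the handler is enabled.
    """
    log_file = open(path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


if __name__ == "__main__":
    # 'crash.log' is an arbitrary file name chosen for this sketch.
    crash_log = enable_crash_log("crash.log")
    # ... run the code that crashes here, e.g. the export call ...
```

If the process is killed, crash.log may then show which Python call was running at the time.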
I hope I've managed to find out what's going on. At first I thought you had hit a recursion error: https://github.com/BayesWitnesses/m2cgen#faq. But it was odd that you didn't get any error stack trace. Then I wrote a script with the default XGBoost estimator and the Boston dataset. With the number of trees set to 3000, it killed both my Jupyter Notebook and an ordinary Python process executing the .py script, without any message.
After that I checked the exit code of my script, and it was -1073741571. This value led me to the following great StackOverflow question: https://stackoverflow.com/questions/20629027/process-finished-with-exit-code-1073741571. And this answer helped me run my code successfully: https://stackoverflow.com/a/31902018.
Here is the code that can be executed without any problems and generates a valid xgb_c.txt file at the end. The thing is that setting the recursion limit (sys.setrecursionlimit(np.iinfo(np.intc).max)) is sometimes not enough, but increasing the stack size with threading.stack_size(200000000) helped.
import sys
import threading

import numpy as np
import m2cgen as m2c
import xgboost as xgb
from sklearn.datasets import load_boston


def generate_code():
    # Train a 3000-tree model on the Boston dataset and export it to C.
    X, y = load_boston(return_X_y=True)
    est = xgb.XGBRegressor(n_estimators=3000).fit(X, y)
    code = m2c.export_to_c(est)
    with open('xgb_c.txt', 'wt') as f:
        f.write(code)


if __name__ == '__main__':
    # Raise the recursion limit, then run the export in a new thread
    # that is created with a larger stack.
    sys.setrecursionlimit(np.iinfo(np.intc).max)
    threading.stack_size(200000000)
    thread = threading.Thread(target=generate_code)
    thread.start()
Please try to wrap your real code in a similar construction, with an increased recursion limit and stack size for a new thread, and let me know whether it helped.
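If it is easier, the same construction can be packed into a reusable helper that also returns the result and restores the previous settings. This is only a sketch: run_with_big_stack, its default values and the depth function are hypothetical, and the stack size you actually need depends on the model.

```python
import sys
import threading


def run_with_big_stack(func, *args, stack_bytes=128 * 1024 * 1024,
                       recursion_limit=1_000_000, **kwargs):
    """Call func(*args, **kwargs) in a new thread with a larger stack.

    Hypothetical helper; the defaults are guesses, not tuned values.
    """
    outcome = {}

    def target():
        try:
            outcome["value"] = func(*args, **kwargs)
        except BaseException as exc:  # hand errors back to the caller
            outcome["error"] = exc

    old_limit = sys.getrecursionlimit()
    old_stack = threading.stack_size()
    sys.setrecursionlimit(recursion_limit)
    threading.stack_size(stack_bytes)  # applies to threads created from now on
    try:
        thread = threading.Thread(target=target)
        thread.start()
        thread.join()
    finally:
        # Put the interpreter-wide settings back the way they were.
        sys.setrecursionlimit(old_limit)
        threading.stack_size(old_stack)

    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


def depth(n):
    """Toy recursive function, deeper than the default recursion limit."""
    return 0 if n == 0 else 1 + depth(n - 1)


print(run_with_big_stack(depth, 10000))
```

With your model it would be called as run_with_big_stack(m2c.export_to_c, xgb), assuming m2cgen is imported as m2c.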
It is working well now, thank you very much. For n_estimators=1000, the resulting c-code is 13.4 MB. I have coded some MATLAB script to make the c-code into MATLAB. I will share it soon.
Question... why does the output need to have so many parentheses in the final summation?
return ((( ... (var0) + (var1)) + (var2)) + (var3)) + (var4)) + ... + (var997)) + (var998)) + (var999));
It is working well now, thank you very much.
Glad to hear! 🎉 Thanks a lot for your feedback.
Question... why does the output need to have so many parentheses in the final summation?
Each operand of a binary operator is wrapped in parentheses. This is done to handle complex operands and to deal with operator precedence and negative values with a unary minus sign. I guess this example can be helpful for understanding why the parentheses are needed: https://github.com/BayesWitnesses/m2cgen/blob/master/generated_code_examples/c/regression/linear.c
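To illustrate the idea with a toy example (this is not m2cgen's actual code generator, only a sketch with made-up Num and BinOp classes): if a generator prints expression trees without knowing anything about operator precedence, wrapping every operand is the simplest way to keep the meaning intact.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    value: float


@dataclass
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Num, BinOp]


def render(expr, wrap=True):
    """Turn an expression tree into source text.

    With wrap=True every operand is parenthesized, so the text is right
    regardless of operator precedence or negative literals.
    """
    if isinstance(expr, Num):
        return repr(expr.value)
    left = render(expr.left, wrap)
    right = render(expr.right, wrap)
    if wrap:
        left, right = f"({left})", f"({right})"
    return f"{left} {expr.op} {right}"


# (1.0 + 2.0) * -3.0
tree = BinOp("*", BinOp("+", Num(1.0), Num(2.0)), Num(-3.0))

print(render(tree))              # ((1.0) + (2.0)) * (-3.0)
print(render(tree, wrap=False))  # 1.0 + 2.0 * -3.0
```

The wrapped form evaluates to -9.0 as intended, while the unwrapped one evaluates to -5.0 because the multiplication binds first. The price is that a long left-nested sum gets one extra nesting level per term.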
The problem with so many parentheses is that, for example, when I try to compile the C code in Qt, the editor complains about a maximum nesting level of 256! I guess that when the model is very big, like the XGBoost one I work on with 3000 trees and 8 levels, size starts to matter. I wrote some MATLAB code to convert the C code into MATLAB code (see below); it still requires some minimal manual editing. Be warned that the MATLAB editor also becomes very slow with very large files, so I am trying to split the 3000-tree code into 30 x 100 trees. Instead of individual variables, I rename them as indexed entries of a vector, which also makes it easier for MATLAB to do the sum.
Code (experimental) to process the c-code into MATLAB, hope it helps.
clc
clear
close all

tic

strFilenameIn = 'xgb_c_8_500.txt';
strFilenameOut = 'xgb_c_8_500.m';

fid = fopen(strFilenameIn, 'r');
f = fread(fid, '*char')';
fclose(fid);

%f = regexprep(f, ' ', '');
%f = regexprep(f, ' ', ' .');
% Better
f = strrep(f, 'C code for the XGB structure: ', '');
f = strrep(f, 'double score(double * input) {', 'function output = xgb_c_8_500(input)');
f = strrep(f, 'double var0;', 'input = 100*rand(80,1);');
%f = strrep(f, 'double var0;', 'input = 100*zero(80,1);');
%f = strrep(f, 'double *', '');
% MATLAB no need to declare variables
f = strrep(f, 'double var', '%double var');
%f = strrep(f, 'void', 'function');
f = strrep(f, '#', '%');
f = strrep(f, '{', '');
f = strrep(f, '}', 'end');
f = strrep(f, 'end else', 'else');
% Fix if starting
f = strrep(f, 'if (', 'if(');
f = strrep(f, 'memcpy', '%memcpy');
f = strrep(f, '(input', 'input');
% C needs [] MATLAB needs ()
f = strrep(f, ']', ')');
f = strrep(f, '[', '(');
% Output variables
f = strrep(f, '))', ')');
f = strrep(f, '>= (', '>= ');
f = strrep(f, '(((', '');
%f = strrep(f, 'double * ', '');
f = strrep(f, 'return ', 'output = ');

for k = 500:-1:1 % 500 down to 1, step of -1
    %disp(['change input(' num2str(k-1) ' to input(' num2str(k) ')'])
    strSrc = ['input(' num2str(k-1) ')'];
    strDst = ['input(' num2str(k) ')'];
    f = strrep(f, strSrc, strDst);
end

% var0 = becomes var(0)
% Need to change 0 ... N-1 (C/C++) to 1 ... N (MATLAB)
f = strrep(f, 'var', 'var(');
f = strrep(f, ' = ', ') = ');

for k = 500:-1:1 % 500 down to 1, step of -1
    %disp(['change var(' num2str(k-1) ' to var(' num2str(k) ')'])
    strSrc = ['var(' num2str(k-1) ')'];
    strDst = ['var(' num2str(k) ')'];
    f = strrep(f, strSrc, strDst);
end

%f = strrep(f, 'input)', '(input)');
f = strrep(f, 'output)', 'output');
f = strrep(f, '(var(', 'var(');

disp('Done ...')

%f = strrep(f, ' ', '');
%f = strrep(f, ' ', ' .');

fid = fopen(strFilenameOut, 'w');
fprintf(fid, '%s', f);
fclose(fid);

toc
I'd suggest changing the code editor, then.
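If switching tools is not an option, another possible workaround for the 256-level nesting limit is to post-process the generated C text so that the final sum becomes a flat accumulation. The sketch below is not an m2cgen feature: flatten_return_sum and the accumulator name total are made up, and it assumes the generated function has a single return statement that is a plain sum of varN variables or simple numeric literals (it raises an error otherwise).

```python
import re

# Only plain varN names or simple numeric literals are accepted as summands.
_TERM = re.compile(r"var\d+|-?\d+(?:\.\d+)?")


def flatten_return_sum(c_source, acc_name="total"):
    """Rewrite `return (((var0) + (var1)) + ...);` as a flat accumulation.

    Hypothetical post-processing step; assumes `acc_name` is not already
    used as an identifier in the generated code.
    """
    match = re.search(r"^([ \t]*)return\s+([^;]*);", c_source, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no return statement found")
    indent, expression = match.group(1), match.group(2)

    # Drop all parentheses, then split the remaining text on '+'.
    terms = [t.strip() for t in re.sub(r"[()]", "", expression).split("+")]
    if not all(_TERM.fullmatch(t) for t in terms):
        raise ValueError("return expression is not a plain sum")

    lines = [f"{indent}double {acc_name} = 0.0;"]
    lines += [f"{indent}{acc_name} += {t};" for t in terms]
    lines.append(f"{indent}return {acc_name};")
    return c_source[:match.start()] + "\n".join(lines) + c_source[match.end():]


example = (
    "double score(double * input) {\n"
    "    double var0;\n"
    "    double var1;\n"
    "    double var2;\n"
    "    var0 = 1.0;\n"
    "    var1 = 2.0;\n"
    "    var2 = 3.0;\n"
    "    return (((var0) + (var1)) + (var2));\n"
    "}\n"
)
print(flatten_return_sum(example))
```

The terms are added in the same left-to-right order as in the nested expression, so the rewrite only changes how the sum is written.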