sherlock-project
Wrong predictions while testing new data
I have trained a Sherlock model and it performs well on the test data. But when I test the model by passing data to it as in the 'Sherlock out-of-the-box' notebook, it gives wrong predictions (even passing the training data in the same way results in wrong predictions). Does a separate approach need to be taken for testing the data? Note: I have created my own paragraph vectors with respect to the data I have, and I am using those for training the Sherlock model as well.
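For reference, this is roughly the flow I am following (a sketch paraphrasing the out-of-the-box notebook; exact imports and signatures may differ between versions):

```python
import numpy as np
import pandas as pd

from sherlock.features.preprocessing import extract_features
from sherlock.deploy.model import SherlockModel

# Each entry is one column of values to classify.
data = pd.Series(
    [
        ["Jane Smith", "Lute Ahorn", "Anna James"],
        ["Amsterdam", "Haarlem", "Zwolle"],
    ],
    name="values",
)

# Write the extracted features to a temporary CSV, then read them back.
extract_features("../temporary.csv", data)
feature_vectors = pd.read_csv("../temporary.csv", dtype=np.float32)

# Load the trained weights and predict semantic types.
model = SherlockModel()
model.initialize_model_from_json(with_weights=True, model_id="sherlock")
predicted_labels = model.predict(feature_vectors, "sherlock")
```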
Hi @SivaNagendra-sn,
Thanks for reporting your problem here. Did you change the identifiers when initializing the model and making inferences with it? The "sherlock" identifiers in the respective parts of the notebook should be replaced with the identifier that you gave to the newly trained model.
Madelon
Hi @madelonhulsebos, thanks for the reply. I have replaced the paragraph vector file (.pkl) used for extracting features and training the Sherlock model. By identifiers, do you mean the feature column identifiers (the .tsv files)? If so, we have not changed anything in those .tsv files. Can you elaborate on what needs to be changed there? If not, can you explain what those identifiers actually are?
Hi @SivaNagendra-sn,
To use the model retrained with the new paragraph vector files, the model_id occurrences in the notebook ("sherlock" in the attached screenshot) should be replaced with the identifier of the new model. No changes should be made to the feature identifiers in the .tsv files. I hope that helps.
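Concretely, the replacement looks something like this (a sketch; "retrained_sherlock" stands in for whatever identifier you gave your model):

```python
from sherlock.deploy.model import SherlockModel

model = SherlockModel()

# The out-of-the-box notebook loads the pretrained weights:
#   model.initialize_model_from_json(with_weights=True, model_id="sherlock")
#   predicted_labels = model.predict(feature_vectors, "sherlock")

# For a retrained model, *both* occurrences must use the new identifier
# (feature_vectors is the DataFrame produced by extract_features):
model.initialize_model_from_json(with_weights=True, model_id="retrained_sherlock")
predicted_labels = model.predict(feature_vectors, "retrained_sherlock")
```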
Yeah, I have actually done that. While training the Sherlock model I set the model_id to "retrained_sherlock", and when calling the predict function I also pass model_id as "retrained_sherlock". On the test split it gives results with good accuracy. But when testing with new data (extracting features with the extract_features function and then calling predict with model_id set to "retrained_sherlock"), the predictions are totally wrong ☹️.
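On the training side, what I do looks roughly like this (a sketch from memory of the train notebook, so treat the exact fit/store signatures as assumptions):

```python
from sherlock.deploy.model import SherlockModel

model = SherlockModel()
# X_train/X_validation are extracted feature vectors, y_train/y_validation
# the corresponding semantic type labels.
model.fit(X_train, y_train, X_validation, y_validation, model_id="retrained_sherlock")
model.store_weights(model_id="retrained_sherlock")
```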
I have retraining and prediction working on new data, but only if the fields are mostly text. For numeric fields with a length of 12 or more it does not work well: the prediction vector returned is null, even though the classification score and the output for the test data look good. Do you have any suggestions? @madelonhulsebos
OK, that should be alright then, @SivaNagendra-sn. Is your training data formatted exactly like the original training data (as downloaded through the data download)? The feature extraction pipeline expects "stringified" lists. The input data may be wrong in your case as well, @iganand.
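For illustration, an input series in the expected format would look something like this (a sketch; the column name and example values are illustrative, and note that numeric values are stringified too):

```python
import pandas as pd

# Each entry is one column's values, serialized as a *stringified* Python
# list -- the same format as the downloaded training data.
data = pd.Series(
    [
        "['Jane Smith', 'Lute Ahorn', 'Anna James']",
        "['Amsterdam', 'Haarlem', 'Zwolle']",
        # Numeric columns should also be lists of strings; raw numerics
        # (especially long ones) may trip up the feature extractors.
        "['601784565721', '439827465012', '920034176584']",
    ],
    name="values",
)
```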
I am getting null in the prediction vector, even though the classification report for that specific field shows an F1 score of 0.87. What might be the reason?