MatchZoo

Preprocessor.fit_transform does not initialise preprocessor.context

Status: Open • MathildaSu opened this issue 6 years ago • 1 comment

Describe the bug

When calling

preprocessor = mz.preprocessors.DSSMPreprocessor()
train_processed = preprocessor.fit_transform(train_pack)

the preprocessor does not initialise preprocessor.context, whereas calling

preprocessor.fit(train_pack)

does initialise it.
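The report assumes that fit_transform should leave the preprocessor in the same state as fit followed by transform. Below is a minimal sketch of that contract using a hypothetical ToyPreprocessor (not MatchZoo's actual code): fit stores learned state in context and returns self, and fit_transform simply chains the two calls, so context cannot stay empty.

```python
class ToyPreprocessor:
    """Hypothetical stand-in for a MatchZoo-style preprocessor."""

    def __init__(self):
        self.context = {}

    def fit(self, data):
        # Learn state from the training texts and store it in context.
        vocab = sorted({tok for text in data for tok in text.split()})
        self.context["vocab"] = vocab
        self.context["vocab_size"] = len(vocab)
        return self  # returning self allows fit(...).transform(...)

    def transform(self, data):
        # Use only the fitted state; tokens not seen during fit are dropped.
        index = {tok: i for i, tok in enumerate(self.context["vocab"])}
        return [[index[tok] for tok in text.split() if tok in index] for text in data]

    def fit_transform(self, data):
        # Expected contract: same side effects as fit, same output as transform.
        return self.fit(data).transform(data)


pp = ToyPreprocessor()
out = pp.fit_transform(["a b", "b c"])
print(pp.context)  # {'vocab': ['a', 'b', 'c'], 'vocab_size': 3}
print(out)         # [[0, 1], [1, 2]]
```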

To Reproduce

import matchzoo as mz

import pandas as pd
path = "/results/DPH_3.res"  # any tab-separated file
table = pd.read_csv(path, sep='\t')
df = pd.DataFrame({  # any source works as long as these columns exist
        'text_left': table['q'],
        'text_right': table['doc'],
        'id_left': table['q_id'],
        'id_right': table['doc_id'],
        'label': table['label']
})

pack = mz.pack(df)

train_pack = pack[:10000]
valid_pack = pack[10000:15000]
predict_pack = pack[15000:20000]

preprocessor = mz.preprocessors.DSSMPreprocessor()
preprocessor.fit_transform(train_pack)
print(preprocessor.context) #output is {}

preprocessor.fit(train_pack)
print(preprocessor.context) #output is not empty, all params are initialised

train_processed = preprocessor.transform(train_pack)
valid_processed = preprocessor.transform(valid_pack)
predict_processed = preprocessor.transform(predict_pack)

Describe your attempts

  • [x] I checked the documentation and found no answer
  • [x] I checked to make sure that this is not a duplicate issue

Current workaround: call preprocessor.fit() and preprocessor.transform() separately.

Context

  • OS: macOS 10.13
  • Hardware: CPU only
  • MatchZoo version: 2.1.0

MathildaSu, Jul 09 '19 10:07

Since I don't have your data, I tested it with our toy data. I could not reproduce the bug you are reporting.

Here's what I tried:

import matchzoo as mz
pp = mz.preprocessors.DSSMPreprocessor()
dp = mz.datasets.toy.load_data()
pp.fit_transform(dp)
print(pp.context)  # prints the correctly fitted context
pp.fit(dp)
print(pp.context)  # prints the same thing

uduse, Jul 10 '19 17:07