
Avazu_x4 unexpectedly requires an extremely large amount of GPU memory.

Open Isuxiz opened this issue 2 years ago • 3 comments

I directly downloaded the preprocessed Avazu_x4 dataset, discarded the id feature, and processed all other features as string categorical features. Strangely, it needs a huge amount of GPU memory just to be loaded (about 31G!), and I can't even use this dataset on an Nvidia V100 32G because of OOM. Is this normal? Is there any way to fix it?

Isuxiz avatar Jun 07 '23 17:06 Isuxiz

We ran the benchmark experiments on Avazu_x4 on a 16G GPU. Can you provide more details? Which experiment steps did you follow?

xpai avatar Jun 08 '23 00:06 xpai

I use the DeepFM implementation from FuxiCTR (2.0+). The dataset was downloaded from the link in this section, without any further preprocessing (I did not apply the x4_001 or x4_002 preprocessing below either).

My settings are below.

dataset_config.yaml

Avazu:
  data_format: csv
  data_root: ../../../data/
  feature_cols:
    [
      {active: False, dtype: str, name: ["id"], type: categorical},
      {
        active: True,
        dtype: str,
        name:
          [
            "C1",
            "hour",
            "banner_pos",
            "site_id",
            "site_domain",
            "site_category",
            "app_id",
            "app_domain",
            "app_category",
            "device_id",
            "device_ip",
            "device_model",
            "device_type",
            "device_conn_type",
            "C14",
            "C15",
            "C16",
            "C17",
            "C18",
            "C19",
            "C20",
            "C21",
          ],
        type: categorical,
      },
    ]
  label_col: { dtype: int, name: "click" }
  min_categr_count: 1
  test_data: ../../../data/Avazu/test.csv
  train_data: ../../../data/Avazu/train.csv
  valid_data: ../../../data/Avazu/valid.csv

model_config.yaml

DeepFM_Avazu:
  batch_norm: True
  batch_size: 4096
  dataset_id: Avazu
  early_stop_patience: 2
  embedding_dim: 32
  embedding_regularizer: 0.01
  epochs: 100
  hidden_activations: relu
  hidden_units: [400, 400, 400]
  learning_rate: 1.e-3
  loss: "binary_crossentropy"
  metrics: ["logloss", "AUC"]
  model: DeepFM
  monitor: AUC
  monitor_mode: max
  net_dropout: 0.3
  net_regularizer: 0
  optimizer: adam
  seed: 2023
  shuffle: True
  task: binary_classification
  verbose: 1

One more piece of information that may be useful: GPU memory usage is not significantly affected by batch_size. I tried training with batch_size = 2, but the OOM error still occurs.
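
If it helps, here is a rough back-of-the-envelope sketch of why batch_size barely matters: the embedding table and its optimizer states scale with total vocabulary size times embedding_dim, which does not depend on the batch at all. The vocabulary total below is a hypothetical placeholder, not a measured value:

# Back-of-the-envelope estimate, not a measurement: embedding memory scales
# with total vocabulary size x embedding_dim, independent of batch_size.
embedding_dim = 32             # as in model_config.yaml
bytes_per_float = 4            # float32
total_vocab_size = 10_000_000  # hypothetical placeholder; with min_categr_count: 1,
                               # device_ip / device_id alone contribute millions

params = total_vocab_size * embedding_dim
gib = params * bytes_per_float * 3 / 1024 ** 3  # weights + Adam m and v states
print(f"~{gib:.1f} GiB for embedding weights plus optimizer state")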

Isuxiz avatar Jun 08 '23 13:06 Isuxiz

I figured out a way to avoid this problem. After some experiments I found that the OOM is caused by setting dtype to str for every feature, so I changed dataset_config.yaml:

Avazu:
  data_format: csv
  data_root: ../../data/
  feature_cols:
    [
      {
        active: False,
        dtype: str,
        name: ["id"],
        type: categorical,
      },
      {
        active: True,
        dtype: str,
        name:
          [
            "site_id",
            "site_domain",
            "site_category",
            "app_id",
            "app_domain",
            "app_category",
            "device_id",
            "device_ip",
            "device_model",
          ],
        type: categorical,
      },
      {
        active: True,
        dtype: int,
        name: [    
            "hour",
            "C1",
            "banner_pos",
            "device_type",
            "device_conn_type",
            "C14",
            "C15",
            "C16",
            "C17",
            "C18",
            "C19",
            "C20",
            "C21",
        ],
        type: categorical,
      },
    ]
  label_col: { dtype: int, name: "click" }
  min_categr_count: 1
  test_data: ../../data/Avazu/test.csv
  train_data: ../../data/Avazu/train.csv
  valid_data: ../../data/Avazu/valid.csv
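
As a side note, a quick standalone pandas check (not part of FuxiCTR; the path follows train_data above) can confirm which columns parse cleanly as integers and how large each vocabulary is:

import pandas as pd

# Standalone sanity check: read a sample of the raw CSV and report the inferred
# dtype and cardinality of each column, to decide which fields can be declared
# as int in dataset_config.yaml. Adjust the path and sample size as needed.
df = pd.read_csv("../../data/Avazu/train.csv", nrows=1_000_000)
for col in df.columns:
    print(f"{col}: dtype={df[col].dtype}, unique={df[col].nunique()}")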

But under this setting, FuxiCTR seems to have a bug; it reports the following error:

Traceback (most recent call last):
  File "xxx/FuxiCTR/model_zoo/MY_MODEL/run_expid.py", line 65, in <module>
    params["train_data"], params["valid_data"], params["test_data"] = build_dataset(
  File "xxx/FuxiCTR/fuxictr/preprocess/build_dataset.py", line 104, in build_dataset
    feature_encoder.fit(train_ddf, **kwargs)
  File "xxx/FuxiCTR/fuxictr/preprocess/feature_processor.py", line 139, in fit
    self.save_vocab(self.vocab_file)
  File "xxx/FuxiCTR/fuxictr/preprocess/feature_processor.py", line 334, in save_vocab
    fd.write(json.dumps(vocab, indent=4))
  File "xxx/anaconda3/envs/py39/lib/python3.9/json/__init__.py", line 234, in dumps
    return cls(
  File "xxx/anaconda3/envs/py39/lib/python3.9/json/encoder.py", line 201, in encode
    chunks = list(chunks)
  File "xxx/anaconda3/envs/py39/lib/python3.9/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "xxx/anaconda3/envs/py39/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "xxx/anaconda3/envs/py39/lib/python3.9/json/encoder.py", line 376, in _iterencode_dict
    raise TypeError(f'keys must be str, int, float, bool or None, '
TypeError: keys must be str, int, float, bool or None, not int64

It looks like it does not convert NumPy integer keys to Python's built-in int before JSON serialization. I fixed the problem by modifying the source code:

# in file FuxiCTR/fuxictr/preprocess/feature_processor.py
def save_vocab(self, vocab_file):
    logging.info("Save feature_vocab to json: " + vocab_file)
    vocab = dict()
    for feature, spec in self.feature_map.features.items():
        if spec["type"] in ["categorical", "sequence"]:
            vocab[feature] = OrderedDict(
                sorted(self.processor_dict[feature + "::tokenizer"].vocab.items(),
                       key=lambda x: x[1]))

    # Debug output: key types per feature before/after the conversion.
    print("before:", [str(k) + ": " + str(set(str(type(kk)) for kk in vocab[k])) for k in vocab])
    # json.dumps only accepts str/int/float/bool/None keys, so cast NumPy
    # integer keys to built-in int before serialization.
    for sub_dict in vocab.values():
        for k in list(sub_dict.keys()):
            if isinstance(k, np.integer):
                sub_dict[int(k)] = sub_dict.pop(k)
    print("after:", [str(k) + ": " + str(set(str(type(kk)) for kk in vocab[k])) for k in vocab])

    with open(vocab_file, "w") as fd:
        fd.write(json.dumps(vocab, indent=4))
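
For reference, a minimal standalone snippet (not FuxiCTR code) that reproduces the TypeError above and shows that casting the keys to built-in int resolves it:

import json
import numpy as np

# json.dumps rejects NumPy integer keys, while plain Python int keys are
# accepted and serialized as strings.
vocab = {np.int64(1005): 0, np.int64(1002): 1}
try:
    json.dumps(vocab, indent=4)
except TypeError as e:
    print("fails:", e)  # keys must be str, int, float, bool or None, not int64

fixed = {int(k): v for k, v in vocab.items()}
print("ok:", json.dumps(fixed))  # {"1005": 0, "1002": 1}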

Please consider fixing this officially in the next update.

Now, GPU memory usage is satisfactory:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:32:00.0 Off |                    0 |
| N/A   46C    P0    68W / 300W |   5188MiB / 32768MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     55632      C   python                           5159MiB |
+-----------------------------------------------------------------------------+

Isuxiz avatar Jun 10 '23 17:06 Isuxiz

Closing as fixed.

zhujiem avatar Jun 08 '24 02:06 zhujiem