
API + UK-DALE

Open klemenjak opened this issue 5 years ago • 24 comments

Hi,

I tried to use the dataset UK-DALE in some experiments. Could it be that there are some problems with that particular dataset?

This is the error message I get for DAE:

Using TensorFlow backend.
Started training for  DAE
Joint training for  DAE
............... Loading Data for training ...................
Loading data for  UK-DALE  dataset
Loading building ...  1
Dropping missing values
Train Jointly
(4812480, 1) (4812480, 1) MultiIndex([('power', 'active')],
           names=['physical_quantity', 'type']) MultiIndex([('power', 'active')],
           names=['physical_quantity', 'type'])
Doing Preprocessing
Traceback (most recent call last):
  File "ukstudies.py", line 275, in <module>
    api_results = API(experiments[experiment_name])
  File "/home/users/chklemen/anaconda3/envs/mirum/lib/python3.6/site-packages/nilmtk/api.py", line 59, in __init__
    self.experiment(params)
  File "/home/users/chklemen/anaconda3/envs/mirum/lib/python3.6/site-packages/nilmtk/api.py", line 104, in experiment
    self.train_jointly(clf,d)            
  File "/home/users/chklemen/anaconda3/envs/mirum/lib/python3.6/site-packages/nilmtk/api.py", line 257, in train_jointly
    clf.partial_fit(self.train_mains,self.train_submeters)
  File "/home/users/chklemen/anaconda3/envs/mirum/lib/python3.6/site-packages/nilmtk_contrib/dae.py", line 61, in partial_fit
    app_df = pd.concat(app_df,axis=0).values
  File "/home/users/chklemen/anaconda3/envs/mirum/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 258, in concat
    return op.get_result()
  File "/home/users/chklemen/anaconda3/envs/mirum/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 473, in get_result
    mgrs_indexers, self.new_axes, concat_axis=self.axis, copy=self.copy
  File "/home/users/chklemen/anaconda3/envs/mirum/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 2044, in concatenate_block_managers
    values = values.copy()
MemoryError: Unable to allocate array with shape (99, 4812391) and data type float64
Closing remaining open files:/home/users/chklemen/ukdale.h5...done

I use the latest public version of UK-DALE.

Thanks!

klemenjak avatar Dec 12 '19 16:12 klemenjak

Hello @klemenjak, the experiments we ran used two months of data. I think the experiment you are trying to run needs more GPU memory. The experiments we conducted used 8 GB GPUs. If you reduce the timeframe of the experiment, I think you will be able to produce the results.

Rithwikksvr avatar Dec 12 '19 16:12 Rithwikksvr

Hi,

I see. So it's related to memory limits and not a bug in the software? Okay, that's another topic then. However, here is the experiment I was planning to run:

uk_1 = {
    'power': {
        'mains': ['active'],
        'appliance': ['active']
    },
    'sample_rate': 10,

    'appliances': ['fridge', 'dish washer', 'kettle', 'microwave', 'washing machine'],

    'methods': {
        'FHMMExact': FHMMExact({}),
        'DAE': DAE({'n_epochs': 30, 'batch_size': 1024})
    },
    'train': {
        'datasets': {
            'UK-DALE': {
                'path': '{}ukdale.h5'.format(ddir),
                'buildings': {
                    1: {
                        'start_time': '2013-04-12',
                        'end_time': '2014-10-21'
                    }
                }
            }
        }
    },
    'test': {
        'datasets': {
            'UK-DALE': {
                'path': '{}ukdale.h5'.format(ddir),
                'buildings': {
                    1: {
                        'start_time': '2014-10-22',
                        'end_time': '2014-12-15'
                    }
                }
            }
        },
        'metrics': ['mae', 'f1score']
    }
}

klemenjak avatar Dec 12 '19 16:12 klemenjak

I see, the experiment covers about 18 months of training data. That explains why it is unable to run. Can you try an algorithm such as Mean? (It doesn't depend much on memory.)

Rithwikksvr avatar Dec 12 '19 16:12 Rithwikksvr

I am currently testing which NILMTK algorithms can be used for such a long duration. I will provide feedback on that. Do you see any chance for me to fix this problem so that I can use DAE?

By the way, I haven't had such issues on REFIT for the same durations.

klemenjak avatar Dec 12 '19 16:12 klemenjak

@klemenjak Give me some time. I will get back to you with a code snippet, which should help you train models over a larger timeframe, though the training process will be much slower. I'll explain briefly how it works.

Assume the data can be split into k parts:

for i in 1 to n_epochs:
    for j in 1 to k:
        batch-fit the model on the j-th chunk of data
        # i.e., fit for one epoch on that chunk

This is already a part of NILMTK-contrib!
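For concreteness, here is a minimal runnable sketch of that loop in Keras. The load_chunk helper and the toy network are hypothetical stand-ins for illustration, not the actual nilmtk-contrib code:

import numpy as np
from tensorflow import keras

def load_chunk(j, chunk_size=1000, seq_len=99):
    """Hypothetical loader: pretend to read the j-th chunk of windowed data."""
    rng = np.random.default_rng(j)
    x = rng.random((chunk_size, seq_len, 1)).astype("float32")
    y = rng.random((chunk_size, seq_len, 1)).astype("float32")
    return x, y

# A toy sequence-to-sequence model standing in for the DAE.
model = keras.Sequential([
    keras.Input(shape=(99, 1)),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(99),
    keras.layers.Reshape((99, 1)),
])
model.compile(optimizer="adam", loss="mse")

n_epochs, k = 10, 5  # k = number of chunks the dataset is split into
for i in range(n_epochs):
    for j in range(k):
        x, y = load_chunk(j)
        # Fit for exactly one epoch on this chunk; the weights carry over,
        # so only one chunk ever needs to be resident in memory.
        model.fit(x, y, batch_size=1024, epochs=1, verbose=0)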

Rithwikksvr avatar Dec 12 '19 16:12 Rithwikksvr

Thanks! You guys are amazing. Are you talking about that train_chunk_wise function? I see that you can set a chunk_size. Question 1: What unit is that, seconds or number of samples? Question 2: As a NILMTK expert, what chunk_size would you recommend?

klemenjak avatar Dec 12 '19 16:12 klemenjak

Exactly, that is the function!

Q1: The number of samples.

Q2: I am not an expert on NILM, but I think a chunk size that covers a month should do the trick.

Rithwikksvr avatar Dec 12 '19 16:12 Rithwikksvr

@nipunbatra What do you think should be the ideal chunk_size parameter for training a model?

Rithwikksvr avatar Dec 12 '19 16:12 Rithwikksvr

Hi,

Good news: I was able to get chunk-wise training running by setting a chunk_size in the experiment and enabling chunk_wise_training in the DAE line.

uk_1 = {
    'power': {
        'mains': ['active'],
        'appliance': ['active']
    },
    'sample_rate': 10,
    'chunk_size': 2592,

    'appliances': ['fridge', 'dish washer', 'kettle', 'microwave', 'washing machine'],

    'methods': {
        'DAE': DAE({'n_epochs': 30, 'batch_size': 1024, 'chunk_wise_training':True})
    },

Unfortunately, I ran into the next issue:


....
.....
=================================================================
Total params: 831,801
Trainable params: 831,801
Non-trainable params: 0
_________________________________________________________________
None
Started Retraining model for  microwave
Train on 2188 samples, validate on 387 samples
Epoch 1/1
2188/2188 [==============================] - 0s 190us/step - loss: 1.7490e-06 - val_loss: 1.3154e-06

Epoch 00001: val_loss improved from inf to 0.00000, saving model to dae-temp-weights-63250.h5
First model training for  washing machine
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv1d_9 (Conv1D)            (None, 99, 8)             40        
_________________________________________________________________
flatten_5 (Flatten)          (None, 792)               0         
_________________________________________________________________
dense_13 (Dense)             (None, 792)               628056    
_________________________________________________________________
dense_14 (Dense)             (None, 128)               101504    
_________________________________________________________________
dense_15 (Dense)             (None, 792)               102168    
_________________________________________________________________
reshape_5 (Reshape)          (None, 99, 8)             0         
_________________________________________________________________
conv1d_10 (Conv1D)           (None, 99, 1)             33        
=================================================================
Total params: 831,801
Trainable params: 831,801
Non-trainable params: 0
_________________________________________________________________
None
Started Retraining model for  washing machine
Train on 2188 samples, validate on 387 samples
Epoch 1/1
2188/2188 [==============================] - 0s 212us/step - loss: 2.9596e-07 - val_loss: 2.1699e-07

Epoch 00001: val_loss improved from inf to 0.00000, saving model to dae-temp-weights-75771.h5
Starting enumeration..........
Dropping missing values
Doing Preprocessing
Started Retraining model for  fridge
Traceback (most recent call last):
  File "ukstudies.py", line 276, in <module>
    api_results = API(experiments[experiment_name])
  File "/home/users/chklemen/anaconda3/envs/mirum/lib/python3.6/site-packages/nilmtk/api.py", line 59, in __init__
    self.experiment(params)
  File "/home/users/chklemen/anaconda3/envs/mirum/lib/python3.6/site-packages/nilmtk/api.py", line 95, in experiment
    self.train_chunk_wise(clf,d) 
  File "/home/users/chklemen/anaconda3/envs/mirum/lib/python3.6/site-packages/nilmtk/api.py", line 166, in train_chunk_wise
    clf.partial_fit(self.train_mains,self.train_submeters)
  File "/home/users/chklemen/anaconda3/envs/mirum/lib/python3.6/site-packages/nilmtk_contrib/dae.py", line 74, in partial_fit
    train_x,v_x,train_y,v_y = train_test_split(train_main,power,test_size=.15,random_state=10)  
  File "/home/users/chklemen/anaconda3/envs/mirum/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 2100, in train_test_split
    default_test_size=0.25)
  File "/home/users/chklemen/anaconda3/envs/mirum/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 1782, in _validate_shuffle_split
    train_size)
ValueError: With n_samples=1, test_size=0.15 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.
Closing remaining open files:/home/users/chklemen/ukdale.h5...done
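The final ValueError is straightforward to reproduce in isolation; a chunk that reaches train_test_split with a single sample cannot be split 85/15. A minimal snippet with the same split parameters as in dae.py above:

import numpy as np
from sklearn.model_selection import train_test_split

# One sample, same parameters as the failing call in dae.py.
X = np.zeros((1, 99))
y = np.zeros((1, 99))
train_test_split(X, y, test_size=0.15, random_state=10)
# ValueError: With n_samples=1, test_size=0.15 and train_size=None,
# the resulting train set will be empty.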

Have you seen that before?

klemenjak avatar Dec 12 '19 17:12 klemenjak

This is just my thought; I am not sure what chunk_size means here. From the previous pseudocode, I guess it means the input size of the network. If that is the case, choosing a chunk size is a research question, as it is a hyperparameter of the model. If the chunk size is instead the amount of training data fed into the model, bigger is better, because a larger chunk is more representative of the distribution. If the whole training set does not fit in memory, we need a pipeline that reads tractable chunks into memory on the fly, so that the algorithm can still go through all the data.

MingjunZhong avatar Dec 12 '19 17:12 MingjunZhong

@klemenjak Give me some time to reproduce the issue. I will get back to you.

Rithwikksvr avatar Dec 12 '19 17:12 Rithwikksvr

@MingjunZhong chunk_size means the number of samples to be fed into the network. You are right, it is indeed a hyperparameter. But what should be an ideal number for training models such as neural networks?

In the case of neural networks, loading all the data at once or loading it chunk-wise doesn't make a difference. The algorithms and the API were developed that way.

Chunk-wise training algorithm:

for i in 1 to n_epochs:
    for j in 1 to k:
        batch-fit the model on the j-th chunk of data
        # i.e., fit for one epoch on that chunk

Normal neural-network training algorithm:

for i in 1 to n_epochs:
    train for one epoch on all the data
Rithwikksvr avatar Dec 12 '19 17:12 Rithwikksvr

> This is just my thought; I am not sure what chunk_size means here. From the previous pseudocode, I guess it means the input size of the network. [...]

OK, maybe I wrote that comment too fast. Here is what I mean. Let's have a look at line 60 of api.py:


self.chunk_size = params.get('chunk_size',self.chunk_size)

From this it follows that users can set a chunk_size for every experiment, just as I did. For instance:

experiment_for_API = {
    'power': {
        'mains': ['active'],
        'appliance': ['active']
    },
    'sample_rate': 10,
    'chunk_size': 2592,

    'appliances': ['fridge', 'dish washer', 'kettle', 'microwave', 'washing machine'],

    'methods': {
        'DAE': DAE({'n_epochs': 30, 'batch_size': 1024, 'chunk_wise_training':True})
    ...
....}

api_res = API(experiment_for_API)

If you scroll down to line 85, you can see that if chunk_size is defined, the API does chunk-wise training. I was wondering what value for this chunk_size you, as the developers of NILMTK, would recommend.
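Conceptually, that dispatch boils down to the following sketch. The loader function and dummy model here are hypothetical stand-ins for illustration, not the actual api.py source:

def iter_chunks(data, chunk_size):
    """Hypothetical loader: yield successive chunk_size-sample slices."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]

def train(clf, data, params):
    chunk_size = params.get('chunk_size')
    if chunk_size is not None:
        # Chunk-wise: partial_fit once per chunk, so only one chunk
        # is ever resident in memory.
        for chunk in iter_chunks(data, chunk_size):
            clf.partial_fit(chunk)
    else:
        # Joint: hand the whole timeframe to the model at once.
        clf.partial_fit(data)

class DummyModel:
    def partial_fit(self, data):
        print(f"fitting on {len(data)} samples")

train(DummyModel(), list(range(10_000)), {'chunk_size': 2592})  # 4 partial_fit calls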

klemenjak avatar Dec 12 '19 17:12 klemenjak

Yes, the same chunk_size will be used for all the algorithms in the experiment. I am not expert enough to speak about the ideal value for chunk_size.

I request @MingjunZhong and @nipunbatra to discuss choosing an optimal value for chunk_size.

Rithwikksvr avatar Dec 12 '19 17:12 Rithwikksvr

OK, then chunk size is a hyperparameter. The best approach is to run experiments to choose this number; cross-validation could be one way. Heuristically, I would expect the chunk to at least cover the duration for which the appliance was in use. For example, a kettle cycle could be 5 minutes, so I would try 10 minutes; a dish washer could run for more than 140 minutes, so its chunk size should be larger.
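Since chunk_size is given in samples, those duration-based heuristics have to be divided by the sample period. A quick back-of-the-envelope conversion at the 10 s sample period used in the configs above:

sample_period = 10  # seconds, as in the experiment configs in this thread

# Convert the suggested chunk durations into chunk_size values (in samples).
kettle_chunk = 10 * 60 // sample_period             # 10 minutes  ->     60 samples
dishwasher_chunk = 140 * 60 // sample_period        # 140 minutes ->    840 samples
month_chunk = 30 * 24 * 3600 // sample_period       # ~1 month    -> 259200 samples

print(kettle_chunk, dishwasher_chunk, month_chunk)  # 60 840 259200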

MingjunZhong avatar Dec 12 '19 17:12 MingjunZhong

@Rithwikksvr you were right about the memory. As soon as I change the sampling interval from 10s to 60s, the experiment runs without any problems.

klemenjak avatar Dec 12 '19 18:12 klemenjak

@klemenjak Also, remember that sample_period simply samples the data at the specified interval. It doesn't downsample it.
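To see the distinction in plain pandas terms (a generic illustration, not nilmtk's actual resampling code): keeping one reading per interval is not the same as averaging all readings in the interval.

import numpy as np
import pandas as pd

# One hour of 10-second power readings.
idx = pd.date_range("2014-01-01", periods=360, freq="10s")
power = pd.Series(np.random.rand(360) * 100, index=idx)

sampled = power.resample("60s").asfreq()    # sampling: keep the reading at each minute mark
downsampled = power.resample("60s").mean()  # downsampling: average the six 10 s readings

print(sampled.head())
print(downsampled.head())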

Are you still facing the issue?

Rithwikksvr avatar Dec 12 '19 18:12 Rithwikksvr

Thanks for asking, @Rithwikksvr. Well, it depends.

That chunk_wise_training doesn't work at all for me, even at a sampling interval of 60s.

uk_1 = {
    'power': {
        'mains': ['active'],
        'appliance': ['active']
    },
    'sample_rate': 60,
    'chunk_size': 360000,

    'appliances': ['fridge', 'dish washer', 'kettle', 'microwave', 'washing machine'],
    'methods': {
        'DAE': DAE({'n_epochs': 10, 'batch_size': 1024, 'chunk_wise_training':True}),
    },

The default train_jointly training works if I change the sampling interval from 10s to 30s. I guess I will accept this trade-off and continue with 30s.

klemenjak avatar Dec 12 '19 21:12 klemenjak

Hello @klemenjak, you need to change one line of code in dae.py.

Change

self.chunk_wise_training = params.get('chunk_wise_training',False)

to

self.chunk_wise_training = params.get('chunk_wise_training',True)

With this change, even the testing will be done on a chunk-by-chunk basis.

I'll make the changes public once I confirm they are bug-free.

Rithwikksvr avatar Dec 13 '19 16:12 Rithwikksvr

Hi,

thanks for your message. I already did that in the definition of the experiment:

    'methods': {
        'DAE': DAE({'n_epochs': 30, 'batch_size': 1024, 'chunk_wise_training':True})
    },
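For what it's worth, passing the key in the method's parameter dict should make the default in dae.py irrelevant, since dict.get only falls back to the default when the key is missing:

params = {'n_epochs': 30, 'batch_size': 1024, 'chunk_wise_training': True}

# The default argument to .get() is only used when the key is absent,
# so an explicitly passed value always wins.
print(params.get('chunk_wise_training', False))  # True
print({}.get('chunk_wise_training', False))      # False (falls back to the default)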

I really appreciate your efforts. I should tell you, however, that I have given up on experiments on UK-DALE and moved on to REFIT. Anyway, I think others will encounter this issue as well.

klemenjak avatar Dec 13 '19 16:12 klemenjak

@Rithwikksvr Hi sir, I have a question about chunk_size. What is the purpose of using chunk_size? Is it for parallel training?

Hessen525 avatar Dec 15 '19 14:12 Hessen525

Hi @Hessen525 ,

Sometimes the dataset can be really large and might not fit into RAM, in which case we can't train the model. So we load a subset of samples from the original set, train on it, then load another subset and train on that, and the process goes on.

Let's say chunk_size=1000. Then the algorithm loads the first 1000 samples, trains on them, then loads the next 1000 samples and trains on them, and so on, until it has read the whole dataset. This is useful when the dataset is too big to train on as a whole.
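As a concrete illustration of that loading pattern (a toy example with pandas/PyTables, not nilmtk's actual data pipeline; the real ukdale.h5 layout may differ):

import numpy as np
import pandas as pd

# Build a toy table-format HDF5 store standing in for a NILM dataset.
df = pd.DataFrame({"power": np.random.rand(10_000)})
df.to_hdf("toy.h5", key="mains", format="table")

# Stream it back chunk_size samples at a time instead of loading it whole.
with pd.HDFStore("toy.h5", mode="r") as store:
    for chunk in store.select("mains", chunksize=1000):
        print(len(chunk))  # each chunk holds at most 1000 samples; train here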

No need to call me sir :)

I hope this clears up your doubt.

Rithwikksvr avatar Dec 15 '19 14:12 Rithwikksvr

> I tried to use the dataset UK-DALE in some experiments. Could it be that there are some problems with that particular dataset? [...]
>
> MemoryError: Unable to allocate array with shape (99, 4812391) and data type float64

WKKO avatar Apr 06 '22 12:04 WKKO

Sorry to bother you. Did you encounter this error when running the code? [image]

WKKO avatar Apr 06 '22 12:04 WKKO