nilmtk-contrib
API + UK-DALE
Hi,
I tried to use the dataset UK-DALE in some experiments. Could it be that there are some problems with that particular dataset?
This is the error message I get for DAE:
Using TensorFlow backend.
Started training for DAE
Joint training for DAE
............... Loading Data for training ...................
Loading data for UK-DALE dataset
Loading building ... 1
Dropping missing values
Train Jointly
(4812480, 1) (4812480, 1) MultiIndex([('power', 'active')],
names=['physical_quantity', 'type']) MultiIndex([('power', 'active')],
names=['physical_quantity', 'type'])
Doing Preprocessing
Traceback (most recent call last):
File "ukstudies.py", line 275, in <module>
api_results = API(experiments[experiment_name])
File "/home/users/chklemen/anaconda3/envs/mirum/lib/python3.6/site-packages/nilmtk/api.py", line 59, in __init__
self.experiment(params)
File "/home/users/chklemen/anaconda3/envs/mirum/lib/python3.6/site-packages/nilmtk/api.py", line 104, in experiment
self.train_jointly(clf,d)
File "/home/users/chklemen/anaconda3/envs/mirum/lib/python3.6/site-packages/nilmtk/api.py", line 257, in train_jointly
clf.partial_fit(self.train_mains,self.train_submeters)
File "/home/users/chklemen/anaconda3/envs/mirum/lib/python3.6/site-packages/nilmtk_contrib/dae.py", line 61, in partial_fit
app_df = pd.concat(app_df,axis=0).values
File "/home/users/chklemen/anaconda3/envs/mirum/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 258, in concat
return op.get_result()
File "/home/users/chklemen/anaconda3/envs/mirum/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 473, in get_result
mgrs_indexers, self.new_axes, concat_axis=self.axis, copy=self.copy
File "/home/users/chklemen/anaconda3/envs/mirum/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 2044, in concatenate_block_managers
values = values.copy()
MemoryError: Unable to allocate array with shape (99, 4812391) and data type float64
Closing remaining open files:/home/users/chklemen/ukdale.h5...done
I use the latest public version of UK-DALE.
Thanks!
Hello @klemenjak, the experiments we ran used data for 2 months. I think the experiment you are trying to run needs more GPU memory; the experiments we conducted used 8 GB GPUs. If you reduce the timeframe of the experiment, I think you will be able to produce the results.
Hi,
I see. So it's related to memory limits and not a bug in the software? Okay, that's a different topic then. However, here is the experiment I was planning to run:
uk_1 = {
    'power': {
        'mains': ['active'],
        'appliance': ['active']
    },
    'sample_rate': 10,
    'appliances': ['fridge', 'dish washer', 'kettle', 'microwave', 'washing machine'],
    'methods': {
        'FHMMExact': FHMMExact({}),
        'DAE': DAE({'n_epochs': 30, 'batch_size': 1024})
    },
    'train': {
        'datasets': {
            'UK-DALE': {
                'path': '{}ukdale.h5'.format(ddir),
                'buildings': {
                    1: {
                        'start_time': '2013-04-12',
                        'end_time': '2014-10-21'
                    }
                }
            }
        }
    },
    'test': {
        'datasets': {
            'UK-DALE': {
                'path': '{}ukdale.h5'.format(ddir),
                'buildings': {
                    1: {
                        'start_time': '2014-10-22',
                        'end_time': '2014-12-15'
                    }
                }
            }
        },
        'metrics': ['mae', 'f1score']
    }
}
I see, the training window spans about 18 months. That explains why it is unable to run. Can you try algorithms such as Mean? (It doesn't depend much on memory.)
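For example, something like this should work (assuming Mean can be imported from nilmtk.disaggregate in your environment, the same way FHMMExact is used above; adjust the import if your install differs):

# Hypothetical variant: add the memory-light Mean benchmark to the experiment above
from nilmtk.disaggregate import Mean

uk_1['methods']['Mean'] = Mean({})  # Mean only keeps per-appliance averages, so it needs very little memory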
I am currently testing which NILMTK algorithms can handle such a long duration. I will provide feedback on that. Do you see any way for me to fix this problem so that I can use DAE?
Btw, I haven't had such issues on REFIT for the same durations.
@klemenjak Give me some time. I will get back to you with a code snippet which should help you train models over a larger timeframe, though the training process will be much slower. I'll explain briefly how it works.
Assume the data can be made into k parts:
for i in 1 to n_epochs:
    for j in 1 to k:
        Batch fit a model on the j-th chunk of data.
        # This simply means fitting for one epoch on that chunk
This is already a part of NILMTK-contrib!
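In Python, the idea looks roughly like this (an illustrative sketch only, not the actual nilmtk-contrib implementation; it assumes the training data has been stored as HDF5 tables and that model is any object with a Keras-style fit() method):

import pandas as pd

# Sketch of chunk-wise training: only one chunk is held in memory at a time.
# h5_path, mains_key, app_key and model are placeholders for your own setup.
def train_chunk_wise(model, h5_path, mains_key, app_key, n_rows, n_epochs=10, chunk_size=100000):
    for epoch in range(n_epochs):
        for start in range(0, n_rows, chunk_size):
            stop = min(start + chunk_size, n_rows)
            mains_chunk = pd.read_hdf(h5_path, mains_key, start=start, stop=stop).values
            app_chunk = pd.read_hdf(h5_path, app_key, start=start, stop=stop).values
            # "batch fit": one epoch on the current chunk only
            model.fit(mains_chunk, app_chunk, epochs=1)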
Thanks! You guys are amazing. Are you talking about the train_chunk_wise function? I see that you can set a chunk_size. Question 1: What unit is that, seconds or number of samples? Question 2: As NILMTK experts, what chunk_size would you recommend?
Exactly, that is the function!
Q1: The number of samples.
Q2: I am not an expert on NILM, but I think a chunk size that covers a month should do the trick.
@nipunbatra What do you think should be the ideal chunk_size parameter for training a model?
Hi,
good news: I was able to get chunk-wise training running by setting a chunk size and enabling it in the DAE line.
uk_1 = {
    'power': {
        'mains': ['active'],
        'appliance': ['active']
    },
    'sample_rate': 10,
    'chunk_size': 2592,
    'appliances': ['fridge', 'dish washer', 'kettle', 'microwave', 'washing machine'],
    'methods': {
        'DAE': DAE({'n_epochs': 30, 'batch_size': 1024, 'chunk_wise_training': True})
    },
Unfortunately, I ran into the next issue:
....
.....
=================================================================
Total params: 831,801
Trainable params: 831,801
Non-trainable params: 0
_________________________________________________________________
None
Started Retraining model for microwave
Train on 2188 samples, validate on 387 samples
Epoch 1/1
2188/2188 [==============================] - 0s 190us/step - loss: 1.7490e-06 - val_loss: 1.3154e-06
Epoch 00001: val_loss improved from inf to 0.00000, saving model to dae-temp-weights-63250.h5
First model training for washing machine
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1d_9 (Conv1D) (None, 99, 8) 40
_________________________________________________________________
flatten_5 (Flatten) (None, 792) 0
_________________________________________________________________
dense_13 (Dense) (None, 792) 628056
_________________________________________________________________
dense_14 (Dense) (None, 128) 101504
_________________________________________________________________
dense_15 (Dense) (None, 792) 102168
_________________________________________________________________
reshape_5 (Reshape) (None, 99, 8) 0
_________________________________________________________________
conv1d_10 (Conv1D) (None, 99, 1) 33
=================================================================
Total params: 831,801
Trainable params: 831,801
Non-trainable params: 0
_________________________________________________________________
None
Started Retraining model for washing machine
Train on 2188 samples, validate on 387 samples
Epoch 1/1
2188/2188 [==============================] - 0s 212us/step - loss: 2.9596e-07 - val_loss: 2.1699e-07
Epoch 00001: val_loss improved from inf to 0.00000, saving model to dae-temp-weights-75771.h5
Starting enumeration..........
Dropping missing values
Doing Preprocessing
Started Retraining model for fridge
Traceback (most recent call last):
File "ukstudies.py", line 276, in <module>
api_results = API(experiments[experiment_name])
File "/home/users/chklemen/anaconda3/envs/mirum/lib/python3.6/site-packages/nilmtk/api.py", line 59, in __init__
self.experiment(params)
File "/home/users/chklemen/anaconda3/envs/mirum/lib/python3.6/site-packages/nilmtk/api.py", line 95, in experiment
self.train_chunk_wise(clf,d)
File "/home/users/chklemen/anaconda3/envs/mirum/lib/python3.6/site-packages/nilmtk/api.py", line 166, in train_chunk_wise
clf.partial_fit(self.train_mains,self.train_submeters)
File "/home/users/chklemen/anaconda3/envs/mirum/lib/python3.6/site-packages/nilmtk_contrib/dae.py", line 74, in partial_fit
train_x,v_x,train_y,v_y = train_test_split(train_main,power,test_size=.15,random_state=10)
File "/home/users/chklemen/anaconda3/envs/mirum/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 2100, in train_test_split
default_test_size=0.25)
File "/home/users/chklemen/anaconda3/envs/mirum/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 1782, in _validate_shuffle_split
train_size)
ValueError: With n_samples=1, test_size=0.15 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.
Closing remaining open files:/home/users/chklemen/ukdale.h5...done
Have you seen that before?
This is just my thought, as I am not sure what chunk_size means here. From the previous pseudocode, I guess it means the input size of the network. If that is the case, choosing a chunk size is a research question, as it is a hyperparameter of the model. If the chunk size is instead the amount of training data fed into the model, bigger is better because it is more representative of the distribution. If the whole training set does not fit into memory, we need a pipeline that reads tractable chunks into memory on the fly so that the algorithm can go through all the data.
@klemenjak Give me some time to reproduce the issue. I will get back to you.
@MingjunZhong chunk_size means the number of samples to be fed into the network. You are right, it is indeed a hyperparameter. But what should be an ideal number for training models such as neural networks?
In the case of neural networks, loading all the data at once and loading it chunk-wise make no difference to the result; the algorithms and the API were developed that way.
Chunk-wise training algorithm:
for i in 1 to n_epochs:
    for j in 1 to k:
        Batch fit a model on the j-th chunk of data.
        # This simply means fitting for one epoch on that chunk
Normal neural network training algorithm:
for i in 1 to n_epochs:
    train for an epoch on all data
OK, maybe I wrote my earlier comment too fast. Here is what I mean by chunk_size. Let's have a look at line 60 of api.py:
self.chunk_size = params.get('chunk_size',self.chunk_size)
It follows that users can set a chunk_size for every experiment, just like I did. For instance:
experiment_for_API = {
    'power': {
        'mains': ['active'],
        'appliance': ['active']
    },
    'sample_rate': 10,
    'chunk_size': 2592,
    'appliances': ['fridge', 'dish washer', 'kettle', 'microwave', 'washing machine'],
    'methods': {
        'DAE': DAE({'n_epochs': 30, 'batch_size': 1024, 'chunk_wise_training': True})
        ...
    ...
}
api_res = API(experiment_for_API)
If you scroll down to line 85, you can see that if chunk_size is defined, the API will do chunk-wise training. I was wondering what value for this chunk_size you, as the developers of NILMTK, would recommend.
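For reference, my reading of that dispatch in api.py, paraphrased (not the literal source, just how I understand it from the lines and tracebacks above):

# Paraphrased sketch of how the API chooses the training mode
def run_training(api, clf, d):
    if api.chunk_size:                # 'chunk_size' was set in the experiment dictionary
        api.train_chunk_wise(clf, d)  # load and train on chunk_size samples at a time
    else:
        api.train_jointly(clf, d)     # load the whole training window into memory at once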
Yes, the same chunk_size will be used for all the algorithms in the experiment. I am not the right person to speak about the ideal value for the chunk_size.
I request @MingjunZhong and @nipunbatra to discuss further how to choose an optimal value for chunk_size.
OK, then chunk size is a hyperparameter. The best approach is to run experiments to choose this number, and cross-validation could be one way to do it. Heuristically, I would assume the chunk should at least cover the duration for which the appliance was in use. For example, a kettle run could be 5 minutes, so I would try 10 minutes; a dish washer cycle could last more than 140 minutes, so its chunk size should be larger.
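For example, converting those durations into sample counts at the 10 s sample rate used above (just the arithmetic, as an illustration):

# Rough conversion from appliance run time to number of samples at a 10 s sample period
sample_period = 10                               # seconds per sample
kettle_samples = 10 * 60 // sample_period        # 10-minute window  -> 60 samples
dishwasher_samples = 140 * 60 // sample_period   # 140-minute cycle  -> 840 samples
print(kettle_samples, dishwasher_samples)        # 60 840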
@Rithwikksvr you were right about the memory. As soon as I change the sampling interval from 10s to 60s, the experiment runs without any problems.
@klemenjak Also, remember that sample_period simply samples the data at the specified value. It doesn't downsample it.
Are you still facing the issue?
Thanks for asking, @Rithwikksvr. Well, it depends.
That chunk_wise_training doesn't work at all for me, even for a sampling interval of 60s.
uk_1 = {
    'power': {
        'mains': ['active'],
        'appliance': ['active']
    },
    'sample_rate': 60,
    'chunk_size': 360000,
    'appliances': ['fridge', 'dish washer', 'kettle', 'microwave', 'washing machine'],
    'methods': {
        'DAE': DAE({'n_epochs': 10, 'batch_size': 1024, 'chunk_wise_training': True}),
    },
The default train_jointly training works if I change the sample interval from 10s to 30s. I guess I will accept this trade-off and continue with 30s.
Hello @klemenjak, you need to change one line of code in dae.py.
Make the following change:
From
self.chunk_wise_training = params.get('chunk_wise_training',False)
To
self.chunk_wise_training = params.get('chunk_wise_training',True)
Even the testing will be done on a chunk by chunk basis.
I'll make the changes public, once I confirm that it is bug free.
Hi,
thanks for your message. I already did that in the definition of the experiment:
'methods': {
    'DAE': DAE({'n_epochs': 30, 'batch_size': 1024, 'chunk_wise_training': True})
},
I really appreciate your efforts. I should tell you, however, that I have given up on experiments with UK-DALE and moved on to REFIT. Anyway, I think others will encounter this issue as well.
@Rithwikksvr Hi sir, I have a question about chunk_size. What is its purpose? Is it for parallel training?
Hi @Hessen525 ,
Sometimes the dataset can be really large and might not fit into RAM, in which case we can't train the model. So we load a subset of samples from the original set and train on it, then load another subset and train on that, and this process goes on.
Let's say chunk_size=1000. Then the algorithm loads the first 1000 samples, trains on them, then loads the next 1000 samples and trains on them, and so on, until it has read the whole dataset. It is useful when the dataset is too big to train on as a whole.
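As a tiny illustration of that indexing with chunk_size=1000 (just the slicing idea, not the actual implementation):

import numpy as np

# With chunk_size = 1000, the data is consumed in consecutive slices
data = np.arange(3500)                       # stand-in for the full set of samples
chunk_size = 1000
for start in range(0, len(data), chunk_size):
    chunk = data[start:start + chunk_size]   # samples 0-999, 1000-1999, 2000-2999, 3000-3499
    print(start, len(chunk))                 # the final chunk is simply shorter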
No need to call me sir :)
I hope this clears your doubt
Sorry to bother you: did you also encounter the MemoryError above when running the code?