RTSGAN

Generating variable-length time series without missing data

Open enthusiasmYuan opened this issue 3 years ago • 8 comments

Hello, I understand that your code works for datasets with fixed-length time series and for datasets with missing data. Does it also work for datasets whose time series have variable lengths but no missing data? For example, the first time series in the dataset has length 70, the second has length 80, and so on; relative to the second, the first series only has its last 10 values as null and contains no other missing data. I have tried both RTSGAN and RTSGAN-M, but both produce errors. If such a modification is feasible, what changes need to be made?

enthusiasmYuan avatar Nov 01 '22 02:11 enthusiasmYuan

It should work. You can just remove the missing-data part. I use seq_len to control the length of each time series. During decoding, we first obtain the generated synthetic features, one dimension of which is seq_len; you can then use this value to decide the length of the generated time series.
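As a rough illustration of this idea (not the repository's actual code), the sketch below trims padded, fixed-shape output down to per-sample lengths. The function name `trim_to_generated_length` and the array layout are assumptions made for the example; only `seq_len` comes from the thread.

```python
import numpy as np

def trim_to_generated_length(dynamics, seq_len, max_len):
    """Cut each padded sequence down to its own generated length.

    dynamics: array of shape (batch, max_len, n_features), padded output.
    seq_len:  array of shape (batch,), generated lengths (may be fractional).
    """
    # Round to integers and clamp to the valid range [1, max_len].
    lengths = np.clip(np.rint(seq_len).astype(int), 1, max_len)
    # Return a list, because the trimmed sequences no longer share one shape.
    return [seq[:n] for seq, n in zip(dynamics, lengths)]

# Hypothetical generator output: 3 sequences padded to length 80.
rng = np.random.default_rng(0)
padded = rng.normal(size=(3, 80, 5))
generated_len = np.array([70.2, 80.0, 12.6])

series = trim_to_generated_length(padded, generated_len, max_len=80)
print([s.shape for s in series])  # [(70, 5), (80, 5), (13, 5)]
```

Because the trimming is done per sample, the result does not depend on the batch size used during generation.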

acphile avatar Nov 02 '22 04:11 acphile

Sorry to bother you again. In the stock_energy folder, I modified the synthesize function in aegan.py to pass max_len, the maximum value in seq_len, as an argument to generate_dynamics for data generation. The result is not what I expected: all sequences within a batch have the same shape, and the shape only differs between batches. Since we choose batch_size ourselves, the generated time series do not have different lengths unless batch_size is set to 1. Is there a way to solve this problem?

enthusiasmYuan avatar Nov 11 '22 02:11 enthusiasmYuan

Yes. Note that the stock_energy part does not include static features, which is where the intended seq_len would be stored. If you look at the general part, you will find that we first generate static features, which include a generated seq_len, and then use it to select the intended part of the dynamic features. So what you can do is add a 1-D static feature denoting seq_len to the stock_energy code (if you do not have any other static features for a time series).
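A minimal sketch of what such a static feature could look like, assuming the training data is a list of arrays of shape (length, n_features). The helper name `seq_len_static_feature` is hypothetical, and how the column is fed into the stock_energy pipeline is not shown here.

```python
import numpy as np

def seq_len_static_feature(series_list):
    """Return a (n_series, 1) array holding the length of each series."""
    lengths = np.array([len(s) for s in series_list], dtype=np.float32)
    return lengths.reshape(-1, 1)

# Hypothetical training set with two distinct lengths.
series_list = [np.zeros((70, 5)), np.zeros((80, 5)), np.zeros((80, 5))]
sta = seq_len_static_feature(series_list)

print(sta.shape)                # (3, 1)
print(np.unique(sta).tolist())  # [70.0, 80.0]
```

Checking the distinct values is a cheap sanity test: if only one length appears in the training data, the generated seq_len can be expected to collapse to that single value.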

acphile avatar Nov 11 '22 03:11 acphile

Yes, I have written seq_len into sta. After further debugging, I found that in the synthesize function in stock_energy/aegan.py, for the output of this code block

statics = self.ae.decoder.generate_statics(hidden)
df_sta = self.static_processor.inverse_transform(statics.cpu().numpy())

the generated seq_len values are all the same. The data here comes from a single batch, and the generated seq_len is identical across the batch regardless of batch_size. As a result, the value of cur_len is as follows: image

Am I forgetting other steps? Or would it be more appropriate to modify the general part of the code to generate variable-length sequences?

JimmyZhan1213 avatar Nov 13 '22 07:11 JimmyZhan1213

Does your training data contain different seq_len values?

acphile avatar Nov 14 '22 19:11 acphile

Sorry to bother you again. I have solved the problem of generating variable-length time series, and I now want to evaluate the generated data using the metrics from TimeGAN. However, dimension errors keep appearing during evaluation: arrays of different shapes cannot be concatenated (the error message is shown in the figure). Have you encountered this problem during evaluation? How did you resolve it? image

enthusiasmYuan avatar Mar 09 '23 12:03 enthusiasmYuan

I think the TimeGAN repository only supports evaluation on fixed-length time series. You may need to modify their code.
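One possible workaround, sketched under the assumption that the evaluation code expects arrays of a single shape, is to pad both the real and the synthetic sequences to a common length before they are stacked. The helper `pad_to_length` is hypothetical and is not part of either repository.

```python
import numpy as np

def pad_to_length(series_list, target_len, pad_value=0.0):
    """Pad each (length, n_features) array at the end up to target_len."""
    n_features = series_list[0].shape[1]
    out = np.full((len(series_list), target_len, n_features), pad_value)
    for i, s in enumerate(series_list):
        # Copy the real values; the tail keeps pad_value.
        out[i, :len(s)] = s
    return out

# Hypothetical real and synthetic sets with differing lengths.
real = [np.ones((70, 5)), np.ones((80, 5))]
synthetic = [np.ones((64, 5)), np.ones((75, 5))]

target_len = max(len(s) for s in real + synthetic)
real_arr = pad_to_length(real, target_len)
synth_arr = pad_to_length(synthetic, target_len)

# With a shared shape, concatenation along the sample axis succeeds.
combined = np.concatenate([real_arr, synth_arr], axis=0)
print(combined.shape)  # (4, 80, 5)
```

Padding values do enter any metric computed on the padded arrays, so the scores are not directly comparable to those reported for fixed-length data; masking the padded steps inside the metric would be a more careful, but more invasive, change.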

acphile avatar Mar 19 '23 05:03 acphile