
data format of UCR2018

Open guoxishu opened this issue 4 years ago • 8 comments

In utils.py there is "pd.read_csv(..._TRAIN.tsv)", but only data in the .ts format is now provided on the official website. There is then an obvious error at "y_train = df_train.values[:,0]" if data in the .ts format is used. Can you add some comment about the expected data shape? I'm really quite confused.
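For reference, a minimal sketch of the layout the code appears to expect, assuming the tab-separated .tsv files from the 2018 archive (the file name is only an example, not confirmed by the repository):

import pandas as pd

# UCR2018 .tsv layout: one series per row, no header,
# column 0 is the class label, the remaining columns are the series values.
df_train = pd.read_csv('Coffee_TRAIN.tsv', sep='\t', header=None)
y_train = df_train.values[:, 0]   # shape: (n_series,)
x_train = df_train.values[:, 1:]  # shape: (n_series, series_length)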

guoxishu avatar May 11 '20 04:05 guoxishu

I am not quite sure which format is now available; I will get back to you once I re-check the UCR archive.

hfawaz avatar May 11 '20 11:05 hfawaz

Hi Sir, I have the same question here. Do you have any updates about the data format? Thanks!

oceanfly avatar Jun 03 '20 20:06 oceanfly

More specifically, the error I see is:

python3 main.py TSC Coffee fcn _itr_8
Method: TSC Coffee fcn _itr_8
Traceback (most recent call last):
  File "main.py", line 150, in <module>
    datasets_dict = read_dataset(root_dir, archive_name, dataset_name)
  File "/Users/taosun/Documents/GitHub/dl-4-tsc/utils/utils.py", line 105, in read_dataset
    x_train, y_train = readucr(file_name + '_TRAIN')
  File "/Users/taosun/Documents/GitHub/dl-4-tsc/utils/utils.py", line 33, in readucr
    data = np.loadtxt(filename, delimiter=',')
  File "/Users/taosun/anaconda3/lib/python3.6/site-packages/numpy/lib/npyio.py", line 1146, in loadtxt
    for x in read_data(_loadtxt_chunksize):
  File "/Users/taosun/anaconda3/lib/python3.6/site-packages/numpy/lib/npyio.py", line 1074, in read_data
    items = [conv(val) for (conv, val) in zip(converters, vals)]
  File "/Users/taosun/anaconda3/lib/python3.6/site-packages/numpy/lib/npyio.py", line 1074, in <listcomp>
    items = [conv(val) for (conv, val) in zip(converters, vals)]
  File "/Users/taosun/anaconda3/lib/python3.6/site-packages/numpy/lib/npyio.py", line 781, in floatconv
    return float(x)
ValueError: could not convert string to float: '@problemName Coffee'

oceanfly avatar Jun 03 '20 20:06 oceanfly

I had the same problem and solved it by using Coffee_TRAIN.txt and Coffee_TEST.txt instead of the ones in the .ts format. Then, if you go to line 33 of utils.py, you can see [data = np.loadtxt(filename, delimiter=',')]. Here you just need to replace the "," with whitespace, since the .txt file uses a different separator for parsing. Now I have a different error, but at least you can solve this one XD
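A minimal sketch of that change, assuming readucr keeps its original label-first layout and that the .txt files are whitespace-delimited (np.loadtxt splits on any run of whitespace when no delimiter is passed):

import numpy as np

def readucr(filename):
    # .txt files from the archive are whitespace-delimited, so the
    # delimiter=',' argument is dropped here.
    data = np.loadtxt(filename)
    y = data[:, 0]   # first column: class labels
    x = data[:, 1:]  # remaining columns: time series values
    return x, y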

L-Medici avatar Jun 23 '20 13:06 L-Medici

I think the problem is that the UCR dataset was updated in 2018, which changed the formatting (and added new datasets). I found the old dataset through this link, which appears to work. Hope this helps anyone running into this!

andrew128 avatar Oct 28 '20 06:10 andrew128

I'm glad to hear from you. Thanks a lot!

Best wishes, Dan

On 10/28/2020 14:52, Andrew wrote:

I think the problem is that the UCR dataset was updated in 2018, which changed the formatting. I tried out the old dataset, which appears to work. Here is the link to download the old dataset (the password is "attempttoclassify"). Hope this helps anyone running into this issue!


guoxishu avatar Oct 28 '20 06:10 guoxishu

I managed to run the baselines on the UCR dataset in the arff data format with the following modifications:

  • Install liac-arff: pip install liac-arff;
  • Add import arff at the top of utils.py;
  • Add the following function for reading arff data to utils.py:
def load_data(datapath):
    """ Load .arff dataset on univariate time series classification """
    # The dataset name is the last directory component of datapath
    # (assumes datapath ends with a trailing '/').
    trainfile = datapath.split('/')[-2] + '_TRAIN.arff'
    testfile = datapath.split('/')[-2] + '_TEST.arff'

    train = arff.load(open(os.path.join(datapath, trainfile), 'r'))['data']
    test = arff.load(open(os.path.join(datapath, testfile), 'r'))['data']

    # Post-processing: each row holds the time series values followed by the class label
    x_train, y_train = [], []
    for row in train:
        x_train.append(row[:-1])
        y_train.append(row[-1])
    x_train = np.vstack(x_train)
    enc = LabelEncoder()
    y_train = enc.fit_transform(y_train)

    x_test, y_test = [], []
    for row in test:
        x_test.append(row[:-1])
        y_test.append(row[-1])
    x_test = np.vstack(x_test)
    # Reuse the encoder fitted on the training labels so class indices stay consistent
    y_test = enc.transform(y_test)

    return x_train, y_train, x_test, y_test
  • Replace the code snippet from lines 76-90 of utils.py with x_train, y_train, x_test, y_test = load_data(root_dir_dataset)
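For context, a minimal sketch of how that replacement might sit inside read_dataset (the names datasets_dict and dataset_name follow the surrounding code in utils.py and are assumptions here, not verified against lines 76-90):

# replaces the original file-reading block inside read_dataset()
x_train, y_train, x_test, y_test = load_data(root_dir_dataset)
datasets_dict[dataset_name] = (x_train.copy(), y_train.copy(),
                               x_test.copy(), y_test.copy())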

Hope this could be helpful for someone else ;-)

xuyxu avatar Dec 28 '20 09:12 xuyxu

Hello, I have found the data in .tsv format. This is the website: https://www.cs.ucr.edu/~eamonn/time_series_data_2018/ Hope this can help you~

Besides, I also tried to change the code to run on the data in .arff format by using "arff.loadarff" and I succeeded, but running on the .txt format failed.

My device is an RTX 3060 with CUDA 11.5 and cuDNN 8.4, and I ran into this problem:

Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node model/conv1d/conv1d (defined at C:\Users\11642\Desktop\科研\第四周\dl-4-tsc-master_origin\classifiers\fcn.py:73) ]] [Op:__inference_train_function_1608]

Don't worry! This is not a version mismatch but insufficient GPU memory. What you should do is add the code below at the top of main.py:

import tensorflow as tf

physical_devices = tf.config.list_physical_devices('GPU')
for dev in physical_devices:  # in case multiple GPUs are used
    tf.config.experimental.set_memory_growth(dev, True)

This can limit your GPU memory usage (TensorFlow then allocates memory as needed rather than all at once).
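For anyone who wants to follow the arff.loadarff route mentioned above, a minimal sketch using scipy.io.arff (the file name, the label-in-last-column assumption, and the use of pandas are assumptions; the actual change may differ):

import pandas as pd
from scipy.io import arff

# Load one split of a univariate UCR dataset stored as .arff
data, meta = arff.loadarff('Coffee_TRAIN.arff')
df = pd.DataFrame(data)

x_train = df.iloc[:, :-1].values.astype('float64')    # time series values
y_train = df.iloc[:, -1].str.decode('utf-8').values   # class labels (last column, stored as bytes)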

YHY-10 avatar May 09 '22 04:05 YHY-10