The problem with converting dataframe to ts?

Open papaloveray opened this issue 3 years ago • 1 comments

I have two questions. My dataframe looks like this: sample, date, feature1, feature2, ..., target

For some reason, some samples are missing records for some days. For example, sample #1 has 1000 date rows, but sample #2 has only 500. I have found that the current version of tsai can't handle this kind of situation. In my opinion this situation is quite common; would it be possible to upgrade tsai to support it?

Second, I really recommend that you pay more attention to the quality of the documentation and tutorials. The more examples the better, especially for data preparation. To be frank, I have spent more than 2 days reading the data preparation section and I am still confused. I have 1 sample, 5 features and 2257 rows, and I used the df2xy function to get X.shape=(1,5,2257) and y.shape=(2257,). But how do I get X_train, X_test, y_train and y_test? Lots of people are familiar with pandas and sklearn, so the transition is quite important for us. Thanks.
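For reference, one common way to get from a single long series to train/test arrays is to cut it into overlapping windows and split them chronologically. The following is only a minimal NumPy sketch of that idea, not tsai's own API: the random data is a stand-in for the df2xy output described above, and `window_len`, `horizon` and the 80/20 ratio are assumptions chosen for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Hypothetical stand-in for the arrays described above:
# 1 sample, 5 features, 2257 time steps.
n_features, n_steps = 5, 2257
rng = np.random.default_rng(0)
X_long = rng.normal(size=(1, n_features, n_steps))
y_long = rng.normal(size=n_steps)

window_len = 60  # assumed look-back length
horizon = 1      # assumed: predict the target 1 step after the window ends

# Cut the long series into overlapping windows along the time axis.
# Result shape: (n_features, n_windows, window_len).
windows = sliding_window_view(X_long[0], window_len, axis=-1)
windows = windows.transpose(1, 0, 2)  # -> (n_windows, n_features, window_len)

# Keep only windows that still have a target `horizon` steps after them.
n_samples = n_steps - window_len - horizon + 1
X = windows[:n_samples]
y = y_long[window_len + horizon - 1:]

# Chronological split: no shuffling, so the test windows start later in time.
n_train = int(0.8 * n_samples)
X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y[:n_train], y[n_train:]

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```

Adjacent windows overlap, so the last training windows share time steps with the first test windows. Dropping `window_len` windows at the boundary is one way to avoid that if it matters for your evaluation.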

papaloveray, Jul 24 '22 08:07

I'm with you...

I'm no ML specialist or engineer and just wanted to experiment with timeseriesAI....

The notebooks on data preparation are extremely cumbersome and confusing. One giant notebook, for instance, shows multiple ways of doing that preparation, but doesn't really say WHY or HOW the results came out the way they did.

Adding to the confusion, some functions print out 9 graphs of the same preparation and nobody explains why there are 9 or what they really mean. Granted, they show us data, but what are we looking at, and why are there NINE of them?

I just wanted to convert a csv, json, or whatever timeseries to a format ML can read, train on it and see if it can forecast one or two series, but spending hours on complicated notebooks still got me nowhere.

As I said: people more proficient in PyTorch or ML in general will probably laugh at posts like these. However, if TimeseriesAI is meant for the general public and wants to impress all kinds of groups, it's heading the wrong way. It doesn't lack documentation per se; it lacks clear and concise explanations of the documentation provided, especially for beginners in this field.

Just my few cents.

Fusseldieb, Sep 13 '22 18:09

Thanks for your feedback @Fusseldieb and @papaloveray. I understand your frustration. Data preparation is always the most complex area in any project. But we'll try to improve the library's documentation.

Having said that, I've found endless ways in which data may be available. One or multiple files, different formats, missing data, data grouped by user or not, etc. The purpose of the data preparation section of the library is to demonstrate some of the functionality available and help you prepare data. But there's no way tsai will be able to support most use cases.

As to the specific issue you mention @papaloveray on the missing dates, there's functionality to handle those situations, like add_missing_timestamps.
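To illustrate the general idea of filling in missing dates (this is a plain pandas sketch, not the add_missing_timestamps call itself, whose arguments are not shown in this thread), each sample can be reindexed onto the full date range so that every sample ends up with the same number of rows. The toy data and column names below are hypothetical and only mirror the layout described in the question.

```python
import pandas as pd

# Hypothetical long-format frame: sample 2 is missing 2022-01-02.
df = pd.DataFrame({
    "sample": [1, 1, 1, 2, 2],
    "date": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-03",
                            "2022-01-01", "2022-01-03"]),
    "feature1": [0.1, 0.2, 0.3, 1.1, 1.3],
    "target": [0, 1, 0, 1, 1],
})

# Build the full (sample, date) grid over the observed date range.
all_dates = pd.date_range(df["date"].min(), df["date"].max(), freq="D")
full_index = pd.MultiIndex.from_product(
    [df["sample"].unique(), all_dates], names=["sample", "date"]
)

# Reindex onto the grid: dates a sample never recorded appear as NaN rows.
filled = (
    df.set_index(["sample", "date"])
      .reindex(full_index)
      .reset_index()
)

# Every sample now has the same number of rows.
print(filled.groupby("sample").size())
```

How the resulting NaN rows are then filled (forward fill, interpolation, a constant) depends on the data and is left open here.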

oguiza, Nov 21 '22 12:11