sktime
sktime copied to clipboard
[ENH] Add `@targetlabel` identifier for `.ts` files.
Reference Issues/PRs
Fixes #3412.
What does this implement/fix? Explain your changes.
Add an elif
block to read for @targetlabel
in two places - _read_header()
and load_from_tsfile_to_dataframe()
. The implementation is very similar to @classlabel
but it doesn't accept additional values (since regression data has no class labels).
Does your contribution introduce a new dependency? If yes, which one?
No.
What should a reviewer concentrate their feedback on?
@TonyBagnall, I found a minor typo in one of the keys of meta_data
of _read_header()
. It probably should be has_class_label
instead of class_label
. Since it was written by you, please check if I misunderstood.
- [x] Remove a possible silent bug due to a typo.
- [x] Add
@targetlabel
as a valid identifier. - [ ] Add a test for regression data. For this, I want to add Covid19 Death Rate Dataset from tseregression. I need confirmation if the licence associated with it doesn't clash with our usage. This dataset would go inside
datasets/data/
. If this is not possible, I could also create a dummy regression dataset for testing purposes, similar toUnitTest
.
Any other comments?
There is a major issue with our current implementation. When timestamps are present, code requires that we have class labels present(a list of possible labels like [0, 1, 2]
). This is contrary to regression data. Since this might require major changes throughout, we could add it into the workflow as mentioned here. This PR only solves those cases where there is regression data without timestamps.
PR checklist
For all contributions
- [ ] I've added unit tests and made sure they pass locally.
- [x] The PR title starts with either [ENH], [MNT], [DOC], or [BUG] indicating whether the PR topic is related to enhancement, maintenance, documentation, or bug.
this looks good and is funnily enough exactly what I need for another project. I was looking to put a TSR dataset in, perhaps this one? https://zenodo.org/record/3902690#.Yyxln3bMIQ8 I think its smallest in the Monash archive http://tseregression.org/ there is a guide to adding a new dataset in the developers guide https://github.com/alan-turing-institute/sktime/blob/main/docs/source/developer_guide/add_dataset.rst
Yes, we both are referring to the same dataset. I wanted to keep the size small as well, so I chose the covid dataset.
This PR only solves those cases where there is regression data without timestamps.
I could extend this PR to correct this via an additional variable. Let me explain the issue:
- Issue: Current loader does not work with regression data that have timestamps.
- Reason: When timestamps are present, the code does an additional check to see if a label is present in the list of labels (they are provided with
@classlabel
for classification data). There are no such labels for regression data, resulting in a bug. - Solution: The easiest way to resolve this is to add another boolean variable to skip this step if the data is regressional. A more complete solution might include reworking the entire control flow.
I could add another variable to resolve it since there doesn't seem to be any concrete need to choose the latter solution. What do you think?
thanks, I think the best way forward is to get this PR in, then raise an issue about loading regression problems with time stamps. You can add the step you mention on this PR as well if you want.
will just wait until the new release before putting this in, there is a feature freeze in place
dataset license is fine, is creative commons