Auto ML assets
I have created links and updated system tests for Auto ML operators.
Co-authored-by: Wojciech Januszek [email protected]
Co-authored-by: Lukasz Wyszomirski [email protected]
Co-authored-by: Maksim Yermakou [email protected]
Errors :(
Rebased to account for the Flask 2.2 errors fixed yesterday.
There is a CSV file, tests/system/providers/google/cloud/automl/resources/bank-marketing.csv, with 45K lines. Is that normal?
Yeah, 45K lines of .csv file is NOT something we want. A few options:
- What happens when you zip the file? How big is it going to get?
- Do we REALLY need such a big file?
- We could easily place it in our Amazon S3 bucket, make it publicly available, and download it for the test when needed.
This .csv is needed for training an AutoML model. To start the training, the .csv must contain more than 1000 rows. For our test I can reduce the file to 2100 rows. @potiuk, what do you think about reducing the file size?
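The following is a minimal sketch of the proposed reduction. It assumes the goal is simply to keep the header row plus the first N data rows. `truncate_csv` is a hypothetical helper and is not part of the PR. Taking the first rows, rather than a random or stratified sample, is also an assumption, and it may skew the label distribution that AutoML trains on.

```python
import csv
import io


def truncate_csv(src: io.TextIOBase, dst: io.TextIOBase, max_rows: int) -> int:
    """Copy the header plus at most ``max_rows`` data rows from ``src`` to ``dst``.

    Hypothetical helper: keeps the *first* rows, not a random sample.
    Returns the number of data rows written (the header is not counted).
    """
    reader = csv.reader(src)
    writer = csv.writer(dst)
    header = next(reader, None)
    if header is None:  # empty input: nothing to copy
        return 0
    writer.writerow(header)
    written = 0
    for row in reader:
        if written >= max_rows:
            break
        writer.writerow(row)
        written += 1
    return written


# Example usage (file names are illustrative; open CSV files with newline=""
# as the csv module documentation recommends):
# with open("bank-marketing.csv", newline="") as src, \
#         open("bank-marketing-small.csv", "w", newline="") as dst:
#     truncate_csv(src, dst, 2100)
```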
@potiuk Bringing this to your attention :) I think 2100 rows is okay-ish (not the best, but certainly better than 50K). Please comment if you still think it should be stored in external storage.
Can we compress it (and dynamically decompress it during the test)? Just zipping it gives 20K instead of 160K. This file is unlikely to ever change, and it is completely uninteresting to see what's in it when you review the code, so there is no particular reason to keep a text file in Git.
It's not only the size that matters in this case. Keeping it as plain text has a really nasty side effect: when you search for something in the source code in your IDE, you will likely find matching words in this file. Keeping the file uncompressed therefore makes it very prone to falling victim to search & replace.
@potiuk I have done it
Sorry for delay - been a bit busy.
No, it's not compressed; it's just bundled in a .tar now, not zipped (tar-ing a single file kind of makes no sense). It still takes 170K instead of 20K (and this PR needs a rebase anyway).
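You can check the tar-versus-zip point with Python's standard `tarfile` module. Mode `"w"` only bundles the member, adding a header and padding. Mode `"w:gz"` also gzip-compresses it. `archive_size` is a hypothetical helper. The sample data is synthetic and highly repetitive, so it compresses far better than the real CSV would. The 170K and 20K figures above come from the thread, not from this sketch.

```python
import io
import tarfile


def archive_size(data: bytes, mode: str) -> int:
    """Return the size in bytes of an in-memory tar archive holding ``data``.

    ``mode="w"`` writes a plain (uncompressed) tar; ``mode="w:gz"`` writes a
    gzip-compressed tar. The member name is illustrative only.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        info = tarfile.TarInfo(name="bank-marketing.csv")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    # tarfile does not close a caller-supplied fileobj, so buf is still readable.
    return len(buf.getvalue())


if __name__ == "__main__":
    sample = b"age,job,marital\n" + b"58,management,married\n" * 2000
    print("raw bytes:   ", len(sample))
    print("plain .tar:  ", archive_size(sample, "w"))
    print("tar.gz:      ", archive_size(sample, "w:gz"))
```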
Conflicts need to be resolved after the string normalisation.
Rebased to rebuild.
Tests failing.
static check failures.
Rebased. The static checks are fixed in main (the MySQL Python connector release broke mypy).
@potiuk I think this PR can be merged. I can't merge it myself because I am not the author of the PR and I don't have write access.