modern-data-warehouse-dataops
feat: introduce nyc weather features to feature-engineering on fabric single-tech sample
Type of PR
- Documentation changes
- Code changes
Purpose
Introduce the NYC weather dataset in order to:
- Bring our model training scenario closer to a practical use case.
- Switch to the LightGBMRegressor model to achieve better model performance metrics.
- Create two feature sets, nyctaxi and nycweather. The latter can be reused for other model training requirements, which is a good showcase of why we need a feature store.
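As a rough illustration of how the two feature sets could be combined for training, here is a minimal sketch in plain Python. It assumes both sets are keyed by calendar date; the column names (`pickup_date`, `avg_temp_c`, `precip_mm`) and the `join_features` helper are hypothetical and are not taken from the sample's notebooks.

```python
from datetime import date

# Hypothetical nycweather feature set, keyed by date (assumed join key).
weather_features = {
    date(2022, 1, 1): {"avg_temp_c": 8.5, "precip_mm": 12.0},
    date(2022, 1, 2): {"avg_temp_c": 3.0, "precip_mm": 0.0},
}

# Hypothetical nyctaxi feature rows; values are made up for illustration.
taxi_features = [
    {"pickup_date": date(2022, 1, 1), "trip_distance": 2.4, "passenger_count": 1},
    {"pickup_date": date(2022, 1, 2), "trip_distance": 7.1, "passenger_count": 2},
    {"pickup_date": date(2022, 1, 3), "trip_distance": 1.2, "passenger_count": 1},
]


def join_features(taxi_rows, weather_by_date):
    """Left-join taxi rows with daily weather features on the pickup date."""
    joined = []
    for row in taxi_rows:
        # Days without a weather record keep the taxi row and get None values.
        weather = weather_by_date.get(row["pickup_date"], {})
        joined.append(
            {
                **row,
                "avg_temp_c": weather.get("avg_temp_c"),
                "precip_mm": weather.get("precip_mm"),
            }
        )
    return joined


training_rows = join_features(taxi_features, weather_features)
```

Because the weather lookup is independent of the taxi rows, the same `weather_features` mapping could be joined to any other dataset that carries a date, which is the reuse benefit described above.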
In addition to the NYC weather data for 2022, we also add January 2023 taxi trip and weather data for batch inferencing. Hi @promisinganuj, if you agree with this PR, we'll share the new data files with you via Teams, thanks!
Does this introduce a breaking change? If yes, details on what can break
NO
Author pre-publish checklist
- [ ] Added test to prove my fix is effective or new feature works
- [ ] No PII in logs
- [x] Made corresponding changes to the documentation
Validation steps
- Upload the new NYC weather files to the public storage account.
- Run the Fabric data pipeline again.
- Run the model training and inferencing notebooks. The inferencing notebook now also requires attaching the same lakehouse used by the other notebooks.
Issues Closed or Referenced
N/A