modern-data-warehouse-dataops

feat: introduce nyc weather features to feature-engineering on fabric single-tech sample

Open thurstonchen opened this issue 1 year ago • 1 comments

Type of PR

  • Documentation changes
  • Code changes

Purpose

Introduce the NYC weather dataset. This change:

  • brings our model training scenario a bit closer to a practical use case :- )
  • switches to the LightGBMRegressor model to get better model performance metrics.
  • creates two feature sets, nyctaxi and nycweather. The latter can be reused by other model training requirements, which is a good showcase of why we need a feature store.
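
As a rough illustration of why a separate nycweather feature set is reusable, the sketch below joins taxi trip features to weather features on the pickup hour. It is not the code in this PR: the column names (`pickup_time`, `trip_distance`, `temperature_c`, `precipitation_mm`), the hourly join key, and the sample values are all assumptions made for the example.

```python
from datetime import datetime

# Hypothetical hourly weather rows, standing in for the "nycweather" feature set.
weather_features = {
    datetime(2022, 1, 1, 8): {"temperature_c": 3.5, "precipitation_mm": 0.0},
    datetime(2022, 1, 1, 9): {"temperature_c": 4.1, "precipitation_mm": 1.2},
}

# Hypothetical taxi trip rows, standing in for the "nyctaxi" feature set.
taxi_features = [
    {"pickup_time": datetime(2022, 1, 1, 8, 17), "trip_distance": 2.4},
    {"pickup_time": datetime(2022, 1, 1, 9, 45), "trip_distance": 7.9},
    {"pickup_time": datetime(2022, 1, 1, 11, 5), "trip_distance": 1.1},
]


def join_weather(trips, weather):
    """Left-join each trip to the weather row for its pickup hour."""
    joined = []
    for trip in trips:
        # Truncate the pickup time to the hour to build the join key.
        hour = trip["pickup_time"].replace(minute=0, second=0, microsecond=0)
        # Trips with no matching weather row keep None in the weather columns.
        match = weather.get(hour, {"temperature_c": None, "precipitation_mm": None})
        joined.append({**trip, **match})
    return joined


training_rows = join_weather(taxi_features, weather_features)
```

Because the weather rows are keyed only by time, the same lookup could serve any other model that needs weather context, without depending on the taxi data.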

Besides introducing NYC weather data for 2022, we also add January 2023 taxi trip and weather data for batch inferencing. Hi @promisinganuj, if you agree with this PR, we'll share the new data files with you via Teams, thanks! :- )

Does this introduce a breaking change? If yes, details on what can break

NO

Author pre-publish checklist

  • [ ] Added test to prove my fix is effective or new feature works
  • [ ] No PII in logs
  • [x] Made corresponding changes to the documentation

Validation steps

  • Upload the new NYC weather files to the public storage account.
  • Run the Fabric data pipeline again.
  • Run the model training and inferencing notebooks. The inferencing notebook now also requires attaching the same lakehouse used by the other notebooks.

Issues Closed or Referenced

N/A

thurstonchen avatar Dec 08 '23 08:12 thurstonchen