CoP: Data Science: Active and Inactive Businesses of LA County

Open akhaleghi opened this issue 1 year ago • 23 comments

Prerequisite(s)

If you would like to work on this issue, please add a comment below and include the following information:

  • Your name
  • How many hours you can commit to working on this in the next week (minimum of 2)
  • Commit to providing an update with a comment before the next community of practice meeting

For example:

  • John Doe
  • I can commit to working on this issue 3 hours in the following week.
  • Yes, I will provide an update on my progress with a comment below.

Once you have done this, please add yourself to the “Assignees” section on the right and update the issue weekly to document your progress.

Overview

We want to create a usable dataset of active and inactive businesses to perform various time series analyses (e.g., visualizing business closures during the COVID-19 pandemic).

Action Items

Phase 1

  • [ ] Find available data sources and add to Resources section
  • [ ] Create data dictionary (EDA task)
  • [ ] Create issues required to fulfill project requirements, including exploratory data analysis, required tasks, and deliverables
    • [ ] Perform data cleaning (EDA task)
    • [ ] Understand and outline data context
  • [ ] Write one-sheet (see Resources below)
    • [ ] Define stakeholder
    • [ ] Summarize project, including value add
    • [ ] Define project 6 month roadmap
    • [ ] Detail history (if any)

Resources/Instructions

Data source for business listings in LA County.

akhaleghi avatar Aug 30 '23 18:08 akhaleghi

  • Prisha Puri
  • I can commit to working on this issue for 5 hours this week.
  • Yes, I will provide an update on my progress with a comment below.

prishapuri avatar Oct 25 '23 04:10 prishapuri

  • Ting Ai
  • I can commit to working on this issue 4 hours in the following week.
  • Yes, I will provide an update on my progress with a comment below.

xingstar97 avatar Oct 27 '23 01:10 xingstar97

My Progress Updates

  • Utilized the dataset from the Office of Finance (link above)
  • Worked on data cleaning
  • Used Google Colab for code development

prishapuri avatar Oct 29 '23 06:10 prishapuri

My Progress Updates for This Week:

  • Acquired information regarding time series analysis
  • Changed the way the dataset was retrieved in Google Colab
  • Worked on creating another data frame for time series analysis

Note: I will resume working on this issue in December.

prishapuri avatar Nov 03 '23 20:11 prishapuri

My Progress Updates for Last Week:

  • Cleaned data
  • Did EDA
  • Visualized the number of business openings and closures over time

xingstar97 avatar Nov 13 '23 05:11 xingstar97

My progress updates for this week:

  • learned time series analysis
  • prepared data for time series analysis

xingstar97 avatar Nov 13 '23 05:11 xingstar97

My Progress Updates for this Week:

  • Replaced null values in one of the columns
  • Created a dictionary that will contain the number of active businesses in each year
  • Working on data cleaning and obtaining the number of active businesses in each year
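
A dictionary of active businesses per year, as described in this update, could be built along these lines. This is a minimal sketch in plain Python, not the code from the Colab notebook: the `(start_date, end_date)` record layout and the sample rows are assumptions made for illustration, and a business is counted as active in a year if it was open at any point during that year.

```python
from datetime import date

# Hypothetical records: (start_date, end_date); end_date is None while still open.
businesses = [
    (date(2015, 3, 1), None),
    (date(2018, 6, 15), date(2020, 9, 30)),
    (date(2019, 1, 10), date(2019, 12, 31)),
    (date(2021, 2, 1), None),
]

def active_per_year(records, first_year, last_year):
    """Map each year to the number of businesses open at any point in that year."""
    counts = {}
    for year in range(first_year, last_year + 1):
        counts[year] = sum(
            1
            for start, end in records
            if start.year <= year and (end is None or end.year >= year)
        )
    return counts

counts = active_per_year(businesses, 2018, 2021)
print(counts)
```

Treating a missing end date as "still active" avoids dropping those rows during cleaning, which matters if most open businesses have no end date recorded.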

prishapuri avatar Dec 17 '23 07:12 prishapuri

My progress updates for this week: building a SARIMA model (ran the Augmented Dickey-Fuller test, removed the trend to achieve stationarity, and examined the ACF and PACF).
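
One common way to remove a trend before fitting a SARIMA model is differencing. The sketch below shows only that step in plain Python with made-up monthly counts; it is an illustration, not the code used for this update. The Augmented Dickey-Fuller test and ACF/PACF are not shown here; a library such as statsmodels provides them.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical monthly counts of business openings with a steady upward trend.
monthly_openings = [100, 104, 108, 112, 116, 120]

# A first difference (lag=1) removes a linear trend, leaving a constant series.
detrended = difference(monthly_openings)
print(detrended)
```

Passing `lag=12` to the same helper would give a seasonal difference for monthly data, which corresponds to the seasonal differencing term in SARIMA.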

xingstar97 avatar Jan 08 '24 19:01 xingstar97

My Progress Updates for this Week:

  • Dropped rows with one or more null values
  • Noticed that the number of rows in my pandas DataFrame dropped significantly after data cleaning
  • Created another pandas DataFrame that displays the number of active businesses per year between the years 2000 and 2021

prishapuri avatar Jan 15 '24 07:01 prishapuri

My Progress Updates for this Week:

  • Worked on data visualization using the Plotly library
  • Performed data analysis and data cleaning

prishapuri avatar Feb 06 '24 06:02 prishapuri

My Progress Updates for this Week:

  • Looked at another individual’s work for this project
  • Learned about the Python library, GeoPandas

prishapuri avatar Apr 21 '24 06:04 prishapuri

@prishapuri @xingstar97 @SachinChodavarapu Is this issue still being worked on? There have been no updates since April.

akhaleghi avatar Jun 11 '24 02:06 akhaleghi

  • Sachin Chodavarapu
  • I can commit to working on this issue for 5 hours this week.
  • Yes, I will provide an update on my progress via comment.

SachinChodavarapu avatar Jun 11 '24 20:06 SachinChodavarapu

Hey Abe, I've been working on a different issue that should be completed this week. I'll be able to start working on this project on Thursday.

SachinChodavarapu avatar Jun 11 '24 20:06 SachinChodavarapu

  • Max Kasbar
  • I can commit at least 2 hours per week to this issue.
  • I'm also willing to provide updates via the comment section.

max1million101 avatar Jun 16 '24 17:06 max1million101

@akhaleghi Hi Abe! I am currently working on this issue. I will send a message to you on Slack with more information.

prishapuri avatar Jun 17 '24 05:06 prishapuri

As of this comment, I'm working on a data dictionary. I'll post it here or elsewhere, or at least share a link to it on Slack.

max1million101 avatar Jun 17 '24 19:06 max1million101

Here is the link to the data dictionary. If anything is wrong with it, please let me know. https://docs.google.com/spreadsheets/d/1tL11Ce6x_jYo3aitxbalo_NcoHaZ6o1CxQUHeHTUIDM/edit?usp=sharing

max1million101 avatar Jun 17 '24 21:06 max1million101

  • Used the old dataset (from department of finance) for data cleaning and dropped the NAICS column for accuracy.
  • Performed EDA and concluded that a business survival analysis would yield better insights from this dataset.
  • Created a new 'duration' column, which helps analyze patterns in business closures and identify key factors that contribute to business success or failure.
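
A 'duration' column of this kind could be derived from start and end dates roughly as follows. This is a hedged sketch in plain Python rather than the actual notebook code: the field names (`start`, `end`, `closed`), the sample rows, and the snapshot date `as_of` are all assumptions for illustration. Businesses with no end date are measured up to the snapshot date and flagged as not closed, which is how survival analysis usually treats still-open (censored) cases.

```python
from datetime import date

def duration_days(start, end, as_of):
    """Days a business was open; still-open businesses are measured up to as_of."""
    return ((end or as_of) - start).days

as_of = date(2024, 6, 1)  # hypothetical snapshot date for still-open businesses

# Hypothetical rows; end is None when the business has not closed.
records = [
    {"name": "A", "start": date(2019, 1, 1), "end": date(2020, 1, 1)},
    {"name": "B", "start": date(2023, 6, 1), "end": None},
]

for r in records:
    r["duration"] = duration_days(r["start"], r["end"], as_of)
    r["closed"] = r["end"] is not None  # False marks a censored observation

print([(r["name"], r["duration"], r["closed"]) for r in records])
```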

SachinChodavarapu avatar Jun 18 '24 01:06 SachinChodavarapu

A copy of the data dictionary for those unable to access Google Spreadsheets:

BusinessDataDictionary.xlsx

max1million101 avatar Jun 18 '24 02:06 max1million101

An additional resource for the Resources section: a monthly listing of businesses that registered with the Office of Finance during that month: https://finance.lacity.gov/new-monthly-business-listings

max1million101 avatar Jun 24 '24 14:06 max1million101

  • Performed EDA by filtering the data on multiple levels: city, active, inactive, and inactive during COVID (2020-2022)
  • Created a heatmap on a real map to see whether actual locations have any relevance
  • Identified inactive businesses across the different categories based on the NAICS column. Notebook for reference: https://colab.research.google.com/drive/1KqMzxy96pF4Jh8IqcD9zk5f28BkBZzLa?usp=sharing
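
The filtering and category breakdown described above can be sketched in plain Python as follows. This is an illustration rather than the linked notebook's code: the field names (`naics`, `end`) and the sample rows are hypothetical, and the 2020-2022 window follows the range given in the first bullet. Categories are grouped by the first two digits of the NAICS code, which identify the sector.

```python
from collections import Counter
from datetime import date

COVID_START, COVID_END = date(2020, 1, 1), date(2022, 12, 31)

# Hypothetical rows; end is None for businesses that are still active.
records = [
    {"naics": "722511", "end": date(2020, 5, 1)},
    {"naics": "722511", "end": date(2021, 3, 15)},
    {"naics": "812112", "end": date(2019, 8, 1)},
    {"naics": "812112", "end": None},
    {"naics": "448120", "end": date(2022, 11, 30)},
]

def closed_during(rows, start, end):
    """Keep only rows whose end date falls inside [start, end]."""
    return [r for r in rows if r["end"] is not None and start <= r["end"] <= end]

covid_closures = closed_during(records, COVID_START, COVID_END)

# Count closures per two-digit NAICS sector.
by_sector = Counter(r["naics"][:2] for r in covid_closures)
print(by_sector.most_common())
```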

rahul897 avatar Jun 24 '24 23:06 rahul897

@rahul897 @prishapuri @max1million101 @xingstar97 @SachinChodavarapu @SathvikLingabathula There have been no updates on this issue since June. Are any of you still actively working on this?

akhaleghi avatar Sep 23 '24 23:09 akhaleghi

@akhaleghi Hi Abe! I sent a message to you on Slack.

prishapuri avatar Oct 06 '24 05:10 prishapuri