data-science
CoP: Data Science: Active and Inactive Businesses of LA County
Prerequisite(s)
If you would like to work on this issue, please add a comment below and include the following information:
- Your name
- How many hours you can commit to working on this in the next week (minimum of 2)
- Commit to providing an update with a comment before the next community of practice meeting
For example:
- John Doe
- I can commit to working on this issue 3 hours in the following week.
- Yes, I will provide an update on my progress with a comment below.
Once you have done this, please add yourself to the “Assignees” section on the right and update the issue weekly to document your progress.
Overview
We want to create a usable dataset of active and inactive businesses to support various time series analyses (e.g., visualizing business closures during the COVID-19 pandemic).
Action Items
Phase 1
- [ ] Find available data sources and add to Resources section
- [ ] Create data dictionary (EDA task)
- [ ] Create issues required to fulfill project requirements, including exploratory data analysis, required tasks, and deliverables
- [ ] Perform data cleaning (EDA task)
- [ ] Understand and outline data context
- [ ] Write one-sheet (see Resources below)
- [ ] Define stakeholder
- [ ] Summarize project, including value add
- [ ] Define project 6 month roadmap
- [ ] Detail history (if any)
Resources/Instructions
Data source for business listings in LA County.
- Prisha Puri
- I can commit to working on this issue for 5 hours this week.
- Yes, I will provide an update on my progress with a comment below.
- Ting Ai
- I can commit to working on this issue 4 hours in the following week.
- Yes, I will provide an update on my progress with a comment below.
My Progress Updates
- Utilized the dataset from the Office of Finance (link above)
- Worked on data cleaning
- Used Google Colab for code development
My Progress Updates for This Week:
- Acquired information regarding time series analysis
- Changed the way the dataset was retrieved in Google Colab
- Worked on creating another data frame for time series analysis
Note: I will resume working on this issue in December.
My Progress Updates for last Week:
- Cleaned data
- Did EDA
- Visualized the number of business starts and closures over time
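For anyone picking this up, counting starts and closures per month could look roughly like the sketch below. The sample rows and the column names `start_date` and `end_date` are assumptions for illustration, not the actual schema of the Office of Finance dataset.

```python
import pandas as pd

# Hypothetical sample rows; the column names are assumptions, not the real schema.
df = pd.DataFrame({
    "start_date": ["2019-01-15", "2019-01-20", "2020-03-02", "2020-04-10"],
    "end_date": ["2020-04-01", None, "2020-04-20", None],
})
df["start_date"] = pd.to_datetime(df["start_date"])
df["end_date"] = pd.to_datetime(df["end_date"])


def monthly_counts(dates: pd.Series) -> pd.Series:
    """Count non-null dates per calendar month."""
    return dates.dropna().dt.to_period("M").value_counts().sort_index()


starts = monthly_counts(df["start_date"])
closures = monthly_counts(df["end_date"])

# Align both series on one monthly index; months with no events become 0.
by_month = (
    pd.DataFrame({"starts": starts, "closures": closures})
    .fillna(0)
    .astype(int)
)
print(by_month)
```

The resulting `by_month` table can be handed to any plotting library to draw the two series on one time axis.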
My progress updates for this week:
- learned time series analysis
- prepared data for time series analysis
My Progress Updates for this Week:
- Replaced null values in one of the columns
- Created a dictionary that will contain the number of active businesses in each year
- Working on data cleaning and obtaining the number of active businesses in each year
My progress updates for this week: building a SARIMA model (ran the Augmented Dickey-Fuller test, removed the trend to achieve stationarity, and examined the ACF and PACF).
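As a rough illustration of the detrending and autocorrelation steps mentioned above, here is a minimal sketch using only pandas on a synthetic series. It does not run the Augmented Dickey-Fuller test or fit a SARIMA model (for the test itself, statsmodels provides `adfuller` in `statsmodels.tsa.stattools`), and the numbers are made up rather than taken from the business dataset.

```python
import pandas as pd

# Synthetic monthly series: linear trend plus a repeating 12-month pattern.
# These values are invented for illustration only.
seasonal = [0, 2, 4, 6, 4, 2, 0, -2, -4, -6, -4, -2]
values = [100 + 3 * t + seasonal[t % 12] for t in range(48)]
series = pd.Series(values, dtype="float64")

# First differencing removes the linear trend; the first value becomes NaN.
detrended = series.diff().dropna()

# Sample autocorrelation of the differenced series at a few lags.
acf = {lag: detrended.autocorr(lag=lag) for lag in (1, 6, 12)}
for lag, value in acf.items():
    print(f"lag {lag}: {value:.3f}")
```

A strong autocorrelation at lag 12 after differencing is the kind of pattern that would suggest a seasonal term with period 12 in a SARIMA model.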
My Progress Updates for this Week:
- Dropped rows with one or more null values
- Noticed that the number of rows in my pandas DataFrame significantly reduced after data cleaning
- Created another pandas DataFrame that displays the number of active businesses per year between the years 2000 and 2021
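One possible reason for the large drop in rows: if the end-date column is empty for businesses that are still open (an assumption, not verified against the dataset), then dropping every row with any null value also drops all active businesses. The sketch below drops rows only on the columns the analysis requires and then counts active businesses per year. The column names and sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample; column names are assumptions, not the real schema.
df = pd.DataFrame({
    "business_name": ["A", "B", "C", "D"],
    "start_date": ["2000-05-01", "2005-02-10", "2019-07-04", None],
    "end_date": ["2004-03-15", None, "2020-06-30", None],
})
for col in ("start_date", "end_date"):
    df[col] = pd.to_datetime(df[col])

# Inspect nulls per column before dropping anything.
print(df.isna().sum())

# Drop only rows missing a start date. A missing end date is kept because
# it is assumed here to mean the business is still active.
clean = df.dropna(subset=["start_date"])


def active_in_year(frame: pd.DataFrame, year: int) -> int:
    """Count businesses that were open at any point during `year`."""
    started = frame["start_date"].dt.year <= year
    not_closed = frame["end_date"].isna() | (frame["end_date"].dt.year >= year)
    return int((started & not_closed).sum())


active_per_year = pd.DataFrame({"year": range(2000, 2022)})
active_per_year["active"] = [
    active_in_year(clean, year) for year in active_per_year["year"]
]
print(active_per_year)
```

Comparing `len(df.dropna())` with `len(clean)` shows how many rows the stricter rule would have removed.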
My Progress Updates for this Week:
- Worked on data visualization using the Plotly library
- Performed data analysis and data cleaning
My Progress Updates for this Week:
- Looked at another individual’s work for this project
- Learned about the Python library, GeoPandas
@prishapuri @xingstar97 @SachinChodavarapu Is this issue still being worked on? There have been no updates since April.
- Sachin Chodavarapu
- I can commit to working on this issue for 5 hours this week.
- Yes, I will provide an update on my progress via comment.
Hey Abe, I've been working on a different issue that should be completed this week. I'll be able to start working on this project on Thursday.
- Max Kasbar
- I can commit at least 2 hours per week to this issue.
- Yes, I'm willing to provide updates via the comment section.
@akhaleghi Hi Abe! I am currently working on this issue. I will send a message to you on Slack with more information.
As of this comment, I'm working on a data dictionary. I'll post it here or elsewhere, or at least share a link to it on Slack.
Here is the link to the data dictionary. If anything is wrong with it, please let me know. https://docs.google.com/spreadsheets/d/1tL11Ce6x_jYo3aitxbalo_NcoHaZ6o1CxQUHeHTUIDM/edit?usp=sharing
- Used the old dataset (from department of finance) for data cleaning and dropped the NAICS column for accuracy.
- Performed EDA and concluded that a business survival analysis would give better insights from this dataset.
- Created a new column, 'duration', which helps in analyzing patterns in business closures and identifying key factors that contribute to business success or failure.
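A 'duration' column of this kind might be derived as in the sketch below. The column names, the sample rows, and the `as_of` cutoff date are all assumptions for illustration. Businesses with no end date are treated as still open and measured up to the cutoff, which is what survival analysis calls right-censoring.

```python
import pandas as pd

# Hypothetical sample; column names are assumptions, not the real schema.
df = pd.DataFrame({
    "start_date": ["2015-01-01", "2019-06-01"],
    "end_date": ["2020-01-01", None],
})
for col in ("start_date", "end_date"):
    df[col] = pd.to_datetime(df[col])

# Assumed observation cutoff for businesses that have not closed.
as_of = pd.Timestamp("2023-01-01")

# Event indicator: True if a closure was observed, False if still open.
df["closed"] = df["end_date"].notna()

# Days in business, measured to the closure date or to the cutoff.
df["duration_days"] = (df["end_date"].fillna(as_of) - df["start_date"]).dt.days
print(df)
```

Keeping the `closed` flag next to the duration matters because a survival model needs to know which durations ended in a closure and which were simply cut off at the observation date.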
A copy of the data dictionary for those unable to access Google Spreadsheets:
An additional resource to add to Resources. The following page lists the businesses that registered with the Office of Finance each month: https://finance.lacity.gov/new-monthly-business-listings
- Performed EDA by filtering the data on multiple levels: city, active, inactive, and inactive during COVID (2020-2022)
- Created a heatmap on a real map to see whether business locations have any relevance
- Broke down inactive businesses by category using the NAICS column. Notebook for reference: https://colab.research.google.com/drive/1KqMzxy96pF4Jh8IqcD9zk5f28BkBZzLa?usp=sharing
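The filtering and grouping described above could be sketched as follows. This is not the code from the linked notebook. The column names (`city`, `naics`, `end_date`) and the sample rows are assumptions, and the COVID window is taken as 2020-01-01 through 2022-12-31.

```python
import pandas as pd

# Hypothetical sample; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "city": ["LOS ANGELES", "LOS ANGELES", "VAN NUYS", "VAN NUYS", "LOS ANGELES"],
    "naics": ["722511", "722511", "812112", "541110", "812112"],
    "end_date": ["2020-05-01", None, "2021-11-15", "2018-02-01", "2022-12-31"],
})
df["end_date"] = pd.to_datetime(df["end_date"])

# A business with no end date is treated as active.
df["active"] = df["end_date"].isna()

# Inactive during the assumed COVID window (both bounds inclusive).
in_window = df["end_date"].between(
    pd.Timestamp("2020-01-01"), pd.Timestamp("2022-12-31")
)
closed_during_covid = df[in_window]

# Closures per NAICS code and per city within the window.
by_naics = closed_during_covid.groupby("naics").size().sort_values(ascending=False)
by_city = closed_during_covid.groupby("city").size()
print(by_naics)
print(by_city)
```

Rows with a missing end date compare as false in `between`, so active businesses are excluded from the window without a separate filter.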
@rahul897 @prishapuri @max1million101 @xingstar97 @SachinChodavarapu @SathvikLingabathula There have been no updates on this issue since June. Are any of you still actively working on this?
@akhaleghi Hi Abe! I sent a message to you on Slack.