arcgis-python-api
housing_valuation_using_automl
Checklist
Please go through each entry in the checklist below and mark an 'X' if that condition has been met. Every entry should be marked with an 'X' to get the Pull Request approved.
- [x] All `import`s are in the first cell?
  - [ ] First block of imports are standard libraries
  - [ ] Second block are 3rd party libraries
  - [ ] Third block are all `arcgis` imports? Note that in some cases, for samples, it is a good idea to keep the imports next to where they are used, particularly for uncommonly used features that we want to highlight.
- [x] All `GIS` object instantiations are one of the following?
  - `gis = GIS()`
  - `gis = GIS('home')` or `gis = GIS('pro')`
  - `gis = GIS(profile="your_online_portal")`
  - `gis = GIS('https://pythonapi.playground.esri.com/portal')`
  - `gis = GIS(profile="your_enterprise_portal")`
- [ ] If this notebook requires setup or teardown, did you add the appropriate code to `./misc/setup.py` and/or `./misc/teardown.py`?
- [ ] If this notebook references any portal items that need to be staged on AGOL/Python API playground, did you coordinate with a Python API team member to stage the item the correct way with the `api_data_owner` user?
- [ ] If the notebook requires working with local data (such as CSV, FGDB, SHP, Raster files), upload the files as items to the Geosaurus Online Org using `api_data_owner` account and change the notebook to first download and unpack the files.
- [ ] Code simplified & split out across multiple cells, useful comments?
- [ ] Consistent voice/tense/narrative style? Thoroughly checked for typos?
- [x] All images used like `<img src="base64str_here">` instead of `<img src="https://some.url">`? All map widgets contain a static image preview? (Call `mapview_inst.take_screenshot()` to do so)
- [ ] All file paths are constructed in an OS-agnostic fashion with `os.path.join()`? (Instead of `r"\foo\bar"`, `os.path.join(os.path.sep, "foo", "bar")`, etc.)
- [ ] IF YOU WANT THIS SAMPLE TO BE DISPLAYED ON THE DEVELOPERS.ARCGIS.COM WEBSITE, ping @mohi9282 so he can add it to the list for the next deploy
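For the OS-agnostic path entry in the checklist, a minimal sketch of what the item asks for might look like the following. The folder and file names are hypothetical and only illustrate the pattern; they are not taken from the notebook.

```python
import os

# Hypothetical folder and file names, used only for illustration.
data_dir = os.path.join("data", "housing")
csv_path = os.path.join(data_dir, "kc_house_data.csv")

# A path anchored at the filesystem root, built without hard-coded separators
# (the alternative to writing r"\foo\bar" by hand).
abs_path = os.path.join(os.path.sep, "foo", "bar")

print(csv_path)
print(abs_path)
```

Because the separator comes from `os.path.sep`, the same cell should behave the same way on Windows, macOS, and Linux.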
Check out this pull request on ReviewNB
See visual diffs & provide feedback on Jupyter Notebooks.
Powered by ReviewNB
View / edit / reply to this conversation on ReviewNB
moonlanderr commented on 2022-08-04T10:40:21Z ----------------------------------------------------------------
remove all underscores from the heading, and remove caps other than titles
moonlanderr commented on 2022-08-04T10:40:22Z ----------------------------------------------------------------
" this is too long for a title, you can explain this in the cell, here you can mention "data geoenrichment"
moonlanderr commented on 2022-08-04T10:40:22Z ----------------------------------------------------------------
title should match with the content
moonlanderr commented on 2022-08-04T10:40:23Z ----------------------------------------------------------------
this title is not clear, think of a more meaningful & self-explanatory title
moonlanderr commented on 2022-08-04T10:40:24Z ----------------------------------------------------------------
can you change the visualization? It's all jumbled up; try to resize the symbols or use circles to clean it up
moonlanderr commented on 2022-08-04T10:40:25Z ----------------------------------------------------------------
we do not need to show so many rows, only 2 to 5 rows are sufficient; fewer rows are easier to comprehend
moonlanderr commented on 2022-08-04T10:40:26Z ----------------------------------------------------------------
"prices.Apart " space missing. Also you can add the data definition title here instead, so it will be more clear
moonlanderr commented on 2022-08-04T10:40:27Z ----------------------------------------------------------------
same as mentioned above, only 2 to 5 rows is ok, and this part is redundant "with 5 heads & 5 tails value after dropping the columns"
moonlanderr commented on 2022-08-04T10:40:27Z ----------------------------------------------------------------
mention some more variables that you are using in the final model apart from sqft_living, which has a strong relationship
moonlanderr commented on 2022-08-04T10:40:28Z ----------------------------------------------------------------
correct for missing spaces in the paragraph.
moonlanderr commented on 2022-08-04T10:40:29Z ----------------------------------------------------------------
"King County housing dataset" part in the title is redundant, it's already mentioned
moonlanderr commented on 2022-08-04T10:40:30Z ----------------------------------------------------------------
remove "Explain"(Default)" from the above line, you are already using it. Check grammar, and reconstruct the line
moonlanderr commented on 2022-08-04T10:40:31Z ----------------------------------------------------------------
add a reminder/warning that each added geoenrichment variable will cost the user extra credits, and that they have to pay the necessary charges
moonlanderr commented on 2022-08-04T10:40:32Z ----------------------------------------------------------------
this cell fails when run a second time because the output_name already exists; adjust the output_name so that it changes automatically every time the cell is run, instead of changing it manually
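One way to address the comment above is to derive the name from a timestamp plus a short random suffix, so each run produces a new value. This is only a sketch: `unique_output_name` and the `housing_enriched` prefix are hypothetical names, not code from the notebook.

```python
import uuid
from datetime import datetime, timezone

def unique_output_name(prefix: str) -> str:
    """Return a name that differs on every call, for use as an output_name."""
    # UTC timestamp down to the second, e.g. 20220804_104032.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    # Short random suffix so two runs within the same second still differ.
    suffix = uuid.uuid4().hex[:6]
    return f"{prefix}_{stamp}_{suffix}"

# Hypothetical prefix; pass the result wherever the cell sets output_name.
output_name = unique_output_name("housing_enriched")
print(output_name)
```

The name uses only letters, digits, and underscores, which keeps it safe for most item and layer naming rules.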
moonlanderr commented on 2022-08-04T10:40:32Z ----------------------------------------------------------------
visualize 2 to 5 rows
moonlanderr commented on 2022-08-04T10:40:33Z ----------------------------------------------------------------
visualize 2 to 5 rows
moonlanderr commented on 2022-08-04T10:40:34Z ----------------------------------------------------------------
The conclusion needs more explanation and an actual comparison of the metrics before and after enrichment, and it needs to be summarized to support the findings.
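A before/after comparison like the one requested above could be computed with two standard regression metrics, RMSE and R². The sketch below uses made-up placeholder numbers purely to show the calculation; they are not results from the notebook, and the function names are illustrative.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: lower is better."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2_score(y_true, y_pred):
    """Coefficient of determination: closer to 1 is better."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Placeholder values for illustration only (NOT notebook results).
actual = [300_000, 450_000, 520_000, 610_000]
pred_before = [340_000, 420_000, 560_000, 570_000]  # model without enrichment
pred_after = [310_000, 445_000, 530_000, 600_000]   # model with enrichment

for label, pred in [("before", pred_before), ("after", pred_after)]:
    print(f"{label}: RMSE={rmse(actual, pred):,.0f}  R2={r2_score(actual, pred):.3f}")
```

Reporting both metrics side by side for the two models would give the conclusion concrete numbers to summarize.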
@sumanttyagi changes requested by Supratim
@sumanttyagi, checked the notebook, mostly it's ok, just remove the preprocessors in both the prepare data places, plus some minor changes. The result should be the same, I have checked. In the geoenrichment cell, clean it up a bit, add some spaces in between the comment lines, and add the missing space in the word outputfilename.
@sumanttyagi
- please change the file name to use all lowercase letters and replace the hyphen with an underscore
BP-Ent commented on 2022-12-27T22:35:52Z ----------------------------------------------------------------
Housing is considered a basic need for all human beings. Predicting housing prices can help an individual get the most out of their building, which can further help in affordability. Traditionally, appraising the value of a property entails visiting the site, valuing the features of the building, and quoting a price on the basis of prior sales, future development of the town, market knowledge, utilities, and various other things that contribute to an asset's value.
In machine learning, housing prediction is considered a regression task. Many machine learning approaches have been used to get better results. However, as more methods continue to be developed, it has become difficult for an individual to keep track of which method to use. Fortunately, AutoML helps by providing the best-fit model among all available methods. In this notebook, we will use AutoML and enrich datasets to get better and quicker results, without needing to go through a series of ML algorithms.
The primary data we will be using is the King County housing prices dataset. Other data includes 2019 and 2017 census data obtained through Esri's enrichment services.
**Note: I am not able to access the King County link. Does it still work?**
BP-Ent commented on 2022-12-27T22:35:52Z ----------------------------------------------------------------
First, we will create a feature layer hosted on ArcGIS Online. The dataset can either be downloaded here or accessed using the item number `b9e59c0473514abe8da6f395c628e4af`.
BP-Ent commented on 2022-12-27T22:35:53Z ----------------------------------------------------------------
The dataframe table has 20 fields describing the homes and their sale prices, along with `ObjectID` and `Shape` fields.
BP-Ent commented on 2022-12-27T22:35:54Z ----------------------------------------------------------------
The scatterplot indicates a positive relationship between `price` and `sqft_living`, so the larger `sqft_living` is, the more expensive the property tends to be.
BP-Ent commented on 2022-12-27T22:35:55Z ----------------------------------------------------------------
The graph above indicates that the most common home has 3 bedrooms.
BP-Ent commented on 2022-12-27T22:35:56Z ----------------------------------------------------------------
The graph above has a Y-axis representing the number of homes in each bin and an X-axis representing the ranges of those bins. For example, we can see that 9,332 homes, approximately 43%, have a `sqft_living` value in the range of 1,173 - 2,056.66 sqft and that the histogram is skewed right.
**Am I interpreting this correctly? Is there a way to include some labeling on the histogram?**
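To make the histogram reading concrete, here is a small sketch of how equal-width bins turn values into the counts plotted on the Y-axis. The `sqft` values are toy numbers, not the King County data, and `histogram_counts` is an illustrative helper rather than anything from the notebook.

```python
def histogram_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; bins would have zero width")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value belongs to the last bin, not a bin past the end.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return counts, edges

# Toy living-area values in square feet (illustration only).
sqft = [800, 1200, 1500, 1800, 2000, 2400, 3100, 5200]
counts, edges = histogram_counts(sqft, 4)

for i, count in enumerate(counts):
    share = 100 * count / len(sqft)
    print(f"{edges[i]:,.0f} - {edges[i + 1]:,.0f} sqft: {count} homes ({share:.0f}%)")
```

Each bar's height is the count for its bin, and the share of homes is that count divided by the total. A long tail of sparsely filled bins on the right is what "skewed right" refers to.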
BP-Ent commented on 2022-12-27T22:35:56Z ----------------------------------------------------------------
The Pearson correlation coefficient (Pearson's r) quantifies the strength and direction of the linear relationship between two variables. It ranges from -1 to 1: an absolute value close to one indicates a strong linear relationship (positive or negative, depending on the sign), whereas values close to zero indicate a weak linear relationship. For example, from the graph above, we can see the Pearson's r value for `price` and `sqft_living` is 0.70. This value indicates that the two variables have a strong positive relationship.
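For reference, Pearson's r is the covariance of the two variables divided by the product of their standard deviations. A minimal sketch with toy data (not the housing dataset) shows the two extremes:

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Toy data for illustration only.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]    # rises exactly in step with x
y_down = [10, 8, 6, 4, 2]  # falls exactly in step with x

print(pearson_r(x, y_up))    # perfect positive linear relationship
print(pearson_r(x, y_down))  # perfect negative linear relationship
```

Both toy series have an absolute r of one; only the sign differs, which is why the sign and the magnitude are read separately.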
BP-Ent commented on 2022-12-27T22:35:57Z ----------------------------------------------------------------
In the boxplot above, we can see that there is an increasing trend between price and the number of bathrooms. The median price for properties with 0 - 3.5 bathrooms lies below 1,000,000, and in most cases, the median price is below 2,000,000.
When looking at properties with 5.5 - 8 bathrooms, we can see that there are larger interquartile ranges, indicating that there is a larger variation in value for these properties. For most properties, the value per number of bathrooms is balanced or has a positive skew.
Finally, the large number of outliers that can be seen indicates that there is some variability in price that can likely be attributed to the other features and amenities of a property.
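The interquartile range and the outlier points in a boxplot can be sketched as follows, assuming the common 1.5 × IQR whisker rule. The prices are toy values in thousands, and plotting libraries may interpolate quartiles slightly differently from `statistics.quantiles`.

```python
import statistics

def iqr_outliers(values):
    """Return (q1, median, q3, outliers) using the 1.5 * IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Points beyond the fences are the ones a boxplot draws individually.
    outliers = [v for v in values if v < lower or v > upper]
    return q1, median, q3, outliers

# Toy sale prices in thousands for one bathroom group (illustration only).
prices_k = [300, 350, 400, 420, 450, 480, 500, 2000]
q1, median, q3, outliers = iqr_outliers(prices_k)

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={q3 - q1}")
print("outliers:", outliers)
```

A wider gap between Q1 and Q3 is what the text calls a larger interquartile range, and a median sitting closer to Q1 than to Q3 corresponds to a positive skew.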
BP-Ent commented on 2022-12-27T22:35:58Z ----------------------------------------------------------------
We will use the following features in predicting home prices. Note that `Zipcode` is considered a categorical variable.
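As a sketch of how the categorical flag might be expressed in code: the `arcgis.learn` documentation describes `explanatory_variables` for `prepare_tabulardata` as a list in which continuous fields are plain names and categorical fields are `(field_name, True)` tuples. Treat that convention as an assumption to verify against the installed version. The helper and the field names below are illustrative, not the notebook's actual feature list.

```python
def build_explanatory_variables(continuous, categorical):
    """Keep continuous fields as plain names; mark categorical ones as (name, True)."""
    return list(continuous) + [(name, True) for name in categorical]

# Assumed field names for illustration; the notebook's real list may differ.
explanatory_variables = build_explanatory_variables(
    continuous=["sqft_living", "bedrooms", "bathrooms"],
    categorical=["zipcode"],
)

print(explanatory_variables)
```

Flagging the zip code as categorical matters because its numeric value has no ordering or magnitude; it only labels an area.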