arcgis-python-api

housing_valuation_using_automl

Open sumanttyagi opened this issue 2 years ago • 19 comments


Checklist

Please go through each entry in the checklist below and mark an 'X' if that condition has been met. Every entry should be marked with an 'X' to get the Pull Request approved.

  • [x] All imports are in the first cell?
    • [ ] First block of imports are standard libraries
    • [ ] Second block are 3rd party libraries
    • [ ] Third block are all arcgis imports? Note that in some cases, for samples, it is a good idea to keep the imports next to where they are used, particularly for uncommonly used features that we want to highlight.
  • [x] All GIS object instantiations are one of the following?
    • gis = GIS()
    • gis = GIS('home') or gis = GIS('pro')
    • gis = GIS(profile="your_online_portal")
    • gis = GIS('https://pythonapi.playground.esri.com/portal')
    • gis = GIS(profile="your_enterprise_portal")
  • [ ] If this notebook requires setup or teardown, did you add the appropriate code to ./misc/setup.py and/or ./misc/teardown.py?
  • [ ] If this notebook references any portal items that need to be staged on AGOL/Python API playground, did you coordinate with a Python API team member to stage the item the correct way with the api_data_owner user?
  • [ ] If the notebook requires working with local data (such as CSV, FGDB, SHP, Raster files), upload the files as items to the Geosaurus Online Org using api_data_owner account and change the notebook to first download and unpack the files.
  • [ ] Code simplified & split out across multiple cells, useful comments?
  • [ ] Consistent voice/tense/narrative style? Thoroughly checked for typos?
  • [x] All images used like <img src="base64str_here"> instead of <img src="https://some.url">? All map widgets contain a static image preview? (Call mapview_inst.take_screenshot() to do so)
  • [ ] All file paths are constructed in an OS-agnostic fashion with os.path.join()? (Instead of r"\foo\bar", os.path.join(os.path.sep, "foo", "bar"), etc.)
  • [ ] IF YOU WANT THIS SAMPLE TO BE DISPLAYED ON THE DEVELOPERS.ARCGIS.COM WEBSITE, ping @mohi9282 so he can add it to the list for the next deploy

sumanttyagi, Aug 03 '22 12:08

View / edit / reply to this conversation on ReviewNB

moonlanderr commented on 2022-08-04T10:40:21Z ----------------------------------------------------------------

remove all underscores from the heading, and remove caps other than titles


View / edit / reply to this conversation on ReviewNB

moonlanderr commented on 2022-08-04T10:40:22Z ----------------------------------------------------------------

"GeoEnriching the dataset using data enrich tool/Enriching the point feature with demographic data using geoenrichment service from Esri

" this is too long for a title, you can explain this in the cell, here you can mention "data geoenrichment"


View / edit / reply to this conversation on ReviewNB

moonlanderr commented on 2022-08-04T10:40:22Z ----------------------------------------------------------------

title should match with the content


View / edit / reply to this conversation on ReviewNB

moonlanderr commented on 2022-08-04T10:40:23Z ----------------------------------------------------------------

this title is not clear, think of a more meaningful & self explanatory title


View / edit / reply to this conversation on ReviewNB

moonlanderr commented on 2022-08-04T10:40:24Z ----------------------------------------------------------------

can you change the visualization? It's all jumbled up; try resizing the symbols or using circles to clean it up


View / edit / reply to this conversation on ReviewNB

moonlanderr commented on 2022-08-04T10:40:25Z ----------------------------------------------------------------

we do not need to show so many rows; 2 to 5 rows are sufficient, and fewer rows are easier to comprehend


View / edit / reply to this conversation on ReviewNB

moonlanderr commented on 2022-08-04T10:40:26Z ----------------------------------------------------------------

"prices.Apart " space missing. Also you can add the data definition title here instead, so it will be more clear


View / edit / reply to this conversation on ReviewNB

moonlanderr commented on 2022-08-04T10:40:27Z ----------------------------------------------------------------

same as mentioned above, only 2 to 5 rows is ok, and this part is redundant "with 5 heads & 5 tails value after dropping the columns"


View / edit / reply to this conversation on ReviewNB

moonlanderr commented on 2022-08-04T10:40:27Z ----------------------------------------------------------------

mention some more variables that you are using in the final model apart from sqft_living, which has a strong relationship


View / edit / reply to this conversation on ReviewNB

moonlanderr commented on 2022-08-04T10:40:28Z ----------------------------------------------------------------

correct for missing spaces in the paragraph.


View / edit / reply to this conversation on ReviewNB

moonlanderr commented on 2022-08-04T10:40:29Z ----------------------------------------------------------------

"King County housing dataset" part in the title is redundant, it's already mentioned


View / edit / reply to this conversation on ReviewNB

moonlanderr commented on 2022-08-04T10:40:30Z ----------------------------------------------------------------

remove "Explain"(Default)" from the above line, you are already using it. Check grammar, and reconstruct the line


View / edit / reply to this conversation on ReviewNB

moonlanderr commented on 2022-08-04T10:40:31Z ----------------------------------------------------------------

add a reminder/warning that each geoenrichment variable added will cost the user extra credits, and that they have to pay the necessary charges


View / edit / reply to this conversation on ReviewNB

moonlanderr commented on 2022-08-04T10:40:32Z ----------------------------------------------------------------

this cell fails when run a second time because the output_name already exists; adjust the output_name so that it changes automatically every time the cell is run, instead of having to change it manually
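
One possible way to do this, as a rough sketch only: build the name from a timestamp plus a short random suffix. The helper name `unique_output_name` and the `enriched_housing` prefix are made up for illustration and are not taken from the notebook.

```python
import uuid
from datetime import datetime, timezone


def unique_output_name(prefix: str = "enriched_housing") -> str:
    """Return a name that differs on every call (hypothetical helper)."""
    # UTC timestamp keeps names sortable; the random suffix avoids
    # collisions when the cell is run twice within the same second.
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    suffix = uuid.uuid4().hex[:6]
    return f"{prefix}_{timestamp}_{suffix}"


# Pass this wherever the cell currently hard-codes its output_name.
output_name = unique_output_name()
```

The name only uses letters, digits, and underscores, so it should be safe as an item name, but that is an assumption to check against the tool being called.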


View / edit / reply to this conversation on ReviewNB

moonlanderr commented on 2022-08-04T10:40:32Z ----------------------------------------------------------------

visualize 2 to 5 rows


View / edit / reply to this conversation on ReviewNB

moonlanderr commented on 2022-08-04T10:40:33Z ----------------------------------------------------------------

visualize 2 to 5 rows


View / edit / reply to this conversation on ReviewNB

moonlanderr commented on 2022-08-04T10:40:34Z ----------------------------------------------------------------

The conclusion needs more explanation and an actual comparison of the metrics before and after enrichment, summarized to support the findings.


@sumanttyagi changes requested by Supratim

priyankatuteja, Aug 11 '22 06:08

@sumanttyagi, I checked the notebook and it's mostly OK. Just remove the preprocessors in both places where the data is prepared and make some minor changes; the result should be the same, I have checked. In the geoenrichment cell, clean it up a bit, add some spacing between the comment lines, and add the missing space in the word "outputfilename".

moonlanderr, Oct 12 '22 06:10

@sumanttyagi

  • please change the file name to use all lowercase letters and replace the hyphen with an underscore

jyaistMap, Dec 15 '22 00:12

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2022-12-27T22:35:52Z ----------------------------------------------------------------

Housing is considered a basic need for all human beings. Predicting housing prices can help an individual get the most out of their building, which can further help in affordability. Traditionally, appraising the value of a property entails visiting the site, valuing the features of the building, and quoting a price on the basis of prior sales, future development of the town, market knowledge, utilities, and various other things that contribute to an asset's value.

In machine learning, housing prediction is considered a regression task. Many machine learning approaches have been used to get better results. However, as more methods continue to be developed, it has become difficult for an individual to keep track of which method to use. Fortunately, AutoML helps by providing the best fit model among all available methods. In this notebook, we will use AutoML and enrich datasets to get better and quicker results, without needing to go through a series of ML algorithms.

The primary data we will be using is the King County housing prices dataset. Other data includes 2019 and 2017 census data using Esri's enrichment services.

**Note: I am not able to access the King County link. Does it still work?**


View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2022-12-27T22:35:52Z ----------------------------------------------------------------

First, we will create a feature layer hosted on ArcGIS Online. The dataset can either be downloaded here or can be accessed using the item number b9e59c0473514abe8da6f395c628e4af.


View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2022-12-27T22:35:53Z ----------------------------------------------------------------

The dataframe table has 20 fields describing the homes and their sale prices, along with ObjectID and Shape fields.


View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2022-12-27T22:35:54Z ----------------------------------------------------------------

The scatterplot indicates a positive relationship between price and sqft_living, so the larger sqft_living is, the more expensive the property will be.


View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2022-12-27T22:35:55Z ----------------------------------------------------------------

The graph above indicates that the most common home has 3 bedrooms.
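
A quick way to double-check a statement like this without reading it off the chart is to count the values directly. The sketch below uses a small made-up list, not the real bedrooms column.

```python
from collections import Counter

# Toy bedroom counts, invented for illustration (not the King County data).
bedrooms = [3, 4, 3, 2, 3, 4, 5, 3]

counts = Counter(bedrooms)
# most_common(1) returns a list with the single (value, frequency) pair
# that occurs most often.
most_common_value, n_homes = counts.most_common(1)[0]
print(f"Most common: {most_common_value} bedrooms ({n_homes} homes)")
```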


View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2022-12-27T22:35:56Z ----------------------------------------------------------------

The graph above has a Y-axis representing the number of homes in each bin and an X-axis representing the ranges of those bins. For example, we can see that 9,332 homes, approximately 43%, have a sqft_living value in the range of 1,173 - 2,056.66 sqft and that the histogram is skewed right.

** Am I interpreting this correctly? Is there a way to include some labeling on the histogram?**
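
To make the reading of the axes concrete, here is a minimal sketch of equal-width binning: the bin edges are what the X-axis shows and the per-bin counts are what the Y-axis shows. The values are invented, and `histogram_counts` is an illustrative helper, not the function the notebook uses to draw the plot.

```python
def histogram_counts(values, n_bins):
    """Split values into n_bins equal-width bins; return (edges, counts)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; cannot form bins")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value falls on the last edge, so clamp it into
        # the final bin instead of creating an extra one.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


# Toy sqft_living values, invented for illustration.
sqft_living = [800, 1200, 1300, 1900, 2000, 2100, 3500, 5200]
edges, counts = histogram_counts(sqft_living, n_bins=4)

for left, right, count in zip(edges, edges[1:], counts):
    share = 100 * count / len(sqft_living)
    print(f"{left:.0f} - {right:.0f} sqft: {count} homes ({share:.1f}%)")
```

Printing the count and share per bin like this would also be one way to produce the labels asked about above.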


View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2022-12-27T22:35:56Z ----------------------------------------------------------------

The Pearson correlation coefficient (Pearson's r) quantifies the strength and direction of the linear relationship between two variables. An absolute value of Pearson's r close to one indicates a strong linear relationship (positive or negative, depending on the sign of r), whereas values close to zero indicate a weak linear relationship. For example, from the graph above, we can see the Pearson's r value for price and sqft_living is 0.70. This value indicates that the two variables have a strong positive relationship.
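
For reference, Pearson's r is the covariance of the two variables divided by the product of their standard deviations. The sketch below computes it by hand on toy numbers (not the housing data) to show how the sign carries the direction; `pearson_r` is an illustrative helper, not the call the notebook uses for its correlation plot.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (the covariance, up to a constant).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if either sequence is constant.
    return cov / math.sqrt(var_x * var_y)


# Toy values, invented for illustration.
sqft = [1000, 1500, 2000, 2500]
price_rising = [200, 300, 400, 500]   # grows with sqft: r close to +1
price_falling = [500, 400, 300, 200]  # shrinks with sqft: r close to -1

r_up = pearson_r(sqft, price_rising)
r_down = pearson_r(sqft, price_falling)
```

Both toy cases have an absolute value of about one, yet only the first is a positive relationship.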


View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2022-12-27T22:35:57Z ----------------------------------------------------------------

In the boxplot above, we can see that there is an increasing trend between price and the number of bathrooms. The median price for properties with 0 - 3.5 bathrooms lies below 1,000,000, and in most cases, the median price is below 2,000,000.

When looking at properties with 5.5 - 8 bathrooms, we can see that there are larger interquartile ranges, indicating that there is a larger variation in value for these properties. For most properties, the value per number of bathrooms is balanced or has a positive skew.

Finally, the large number of outliers indicates that there is some variability in price that can likely be attributed to the other features and amenities of a property.
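
The quantities a boxplot draws can also be computed directly, which may help when checking statements like the ones above. This is a minimal sketch on invented prices for two bathroom counts; `price_spread` is an illustrative helper, and the quartile method used by the plotting library may differ slightly from `statistics.quantiles`.

```python
import statistics


def price_spread(prices):
    """Median and interquartile range (Q3 - Q1) of a list of prices."""
    q1, _, q3 = statistics.quantiles(prices, n=4)
    return {"median": statistics.median(prices), "iqr": q3 - q1}


# Toy prices in thousands, keyed by number of bathrooms (invented values).
prices_by_bathrooms = {
    2.0: [300, 320, 350, 380, 400],
    6.0: [900, 1500, 2200, 3100, 4000],
}

summary = {
    baths: price_spread(prices)
    for baths, prices in prices_by_bathrooms.items()
}
for baths, stats in summary.items():
    print(f"{baths} bathrooms: median={stats['median']}, IQR={stats['iqr']}")
```

A wider IQR corresponds to a taller box, which is the larger variation described for the high-bathroom groups.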


View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2022-12-27T22:35:58Z ----------------------------------------------------------------

We will use the following features in predicting home prices. Note that Zipcode is considered a categorical variable.
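
To illustrate why Zipcode is treated as categorical: its digits are labels, so their numeric order or distance carries no meaning. One common way to represent such a field is one-hot encoding, sketched below with toy values. This is only an illustration of the idea and is not a claim about how AutoML encodes the field internally; `one_hot` is a hypothetical helper.

```python
def one_hot(values):
    """One-hot encode a list of labels; return (categories, rows)."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        # One column per category, with a single 1 marking the label.
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows


# Zip codes kept as strings so they are never treated as quantities.
zipcodes = ["98178", "98125", "98028", "98125"]
categories, encoded = one_hot(zipcodes)
```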