sds2019
Ex. 6.1.5
Hi,
I tried to create the code for 6.1.5. I tested it line by line, and it doesn't seem to work with the loop. I am trying to get the country codes and insert them into the appropriate column, "Country_Codes". What should I change?
Note: This is not the full code I intend to write, but it's what I have so far:
```
def weather(year):
    import pandas as pd
    import re
    url="https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/by_year/"+year+".csv.gz"
    data=pd.read_csv(url,header=-1)
    data=data.drop(data.columns[4:],axis=1)
    COLS=["Station_Identifier","Observation_Time","Observation_Type","Observation_Value"]
    data.columns=COLS
    data["Observation_Value"]=data["Observation_Value"]/10
    data.round(decimals=2)
    data2=data.loc[(data["Observation_Type"]=="TMAX")]
    data2["TMAX_F"]=data2["Observation_Value"]*1.8+32
    data2["Observation_Time"]=data2["Observation_Time"].astype(str)
    data2.Observation_Time=pd.to_datetime(data2["Observation_Time"]) #.loc[row_indexer,col_indexer] = value instead
    data2["Month"]=data2["Observation_Time"].dt.month
    data2["Country_Code"]=""
    for i,row in data2.iterrows():
        data2.loc[i,"Country_Code"]=" ".join(re.findall("[a-zA-Z]+", data2.loc[i,"Station_Identifier"]))
    data2.set_index("Observation_Time")
    print(data2)
weather("1905")
```
Thanks,
Andreas
Line number 11 should be:
data2 = data.loc[(data["Observation_Type"]=="TMAX")].copy()
Using just `data2=data.loc[(data["Observation_Type"]=="TMAX")]` can result in a view of `data` being assigned to `data2`. You don't want that, because it can cause problems later when you modify values in `data2` (this is what pandas warns about with SettingWithCopyWarning). What you want is a copy of the dataset assigned to `data2`.
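To illustrate the point about `.copy()`, here is a minimal sketch on a small hand-made DataFrame. The sample rows are made up for illustration; only the column names come from the code in the question.

```python
import pandas as pd

# Hypothetical sample rows in the same four-column layout as the question.
data = pd.DataFrame({
    "Station_Identifier": ["USC00011084", "USC00011084", "CA001011500"],
    "Observation_Time": ["19050101", "19050101", "19050102"],
    "Observation_Type": ["TMAX", "TMIN", "TMAX"],
    "Observation_Value": [12.2, 1.1, 5.0],
})

# .copy() makes data2 an independent DataFrame, so adding columns to it
# does not raise SettingWithCopyWarning and does not touch `data`.
data2 = data.loc[data["Observation_Type"] == "TMAX"].copy()
data2["TMAX_F"] = data2["Observation_Value"] * 1.8 + 32

print(data2)
print("TMAX_F" in data.columns)  # False
```

The original `data` keeps its four columns, and all later assignments go only to `data2`.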
I think Sebastian found your mistake. However, here are some general comments:
These two lines don't belong in a function body (imports go at the top of the file):
import pandas as pd
import re
You could replace
url="https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/by_year/"+year+".csv.gz"
with
url="https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/by_year/{}.csv.gz".format(year)
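As a small sketch of why the `format` version is a bit more forgiving: it also accepts an integer year, whereas `+` concatenation would raise a `TypeError` for an `int`. The f-string line is an equivalent alternative, assuming Python 3.6 or later; the variable names `base`, `url_format` and `url_fstring` are only for illustration.

```python
year = 1905  # an int works here; with "+" it would have to be a str

base = "https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/by_year/"

# str.format converts the argument to text for us.
url_format = (base + "{}.csv.gz").format(year)

# Equivalent f-string (assumes Python 3.6+).
url_fstring = f"{base}{year}.csv.gz"

print(url_format)
print(url_format == url_fstring)  # True
```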
This
data2["Country_Code"]=""
for i,row in data2.iterrows():
    data2.loc[i,"Country_Code"]=" ".join(re.findall("[a-zA-Z]+", data2.loc[i,"Station_Identifier"]))
is overly complicated. It could be done by
data2['Country_Code'] = data2['Station_Identifier'].str.extract("([a-zA-Z]+)")
or
data2['Country_Code'] = data2['Station_Identifier'].apply(lambda x: re.findall('[a-zA-Z]+', x)[0] )
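Here is a quick sketch comparing the two one-liners with the original loop logic on a few made-up station identifiers (the sample values are assumptions, not taken from the actual file). I pass `expand=False` so that `str.extract` returns a Series rather than a one-column DataFrame. Note that both one-liners keep only the first run of letters, while the `" ".join(...)` in the loop keeps every run, so the results differ when letters appear again after the digits.

```python
import re
import pandas as pd

# Hypothetical station identifiers for illustration only.
data2 = pd.DataFrame(
    {"Station_Identifier": ["USC00011084", "CA001011500", "US1MOWR0001"]}
)

# Vectorized: first run of letters; expand=False gives back a Series.
data2["Country_Code"] = data2["Station_Identifier"].str.extract(
    r"([a-zA-Z]+)", expand=False
)

# Same result with apply and re.findall, taking the first match.
via_apply = data2["Station_Identifier"].apply(
    lambda x: re.findall(r"[a-zA-Z]+", x)[0]
)

# What the original loop computes: all letter runs joined by a space.
via_join = data2["Station_Identifier"].apply(
    lambda x: " ".join(re.findall(r"[a-zA-Z]+", x))
)

print(data2["Country_Code"].tolist())  # ['USC', 'CA', 'US']
print(via_join.tolist())               # ['USC', 'CA', 'US MOWR']
```

If the goal is strictly the leading letters, the vectorized version is the simpler choice; if every letter run is wanted, the join logic would have to be kept.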