starlib
dataframe headers don't make it into the qri dataset
Given the following code, we expect the resulting Qri dataset body to have a column named firstname. Instead, the value from the first row shows up as the name of the first column.
# CSV Download Code Sample
# This really works! Click 'Dry Run' to try it ↗
# import dependencies
load("http.star", "http") # `http` lets us talk to the internets
load("dataframe.star", "dataframe") # `dataframe` gives us powerful dataset manipulation capabilities
# with dependencies loaded, download a CSV
# this fetches a "popular baby names" dataset from the NYC Open Data Portal
csvDownloadUrl = "https://data.cityofnewyork.us/api/views/25th-nujf/rows.csv?accessType=DOWNLOAD"
rawCSV = http.get(csvDownloadUrl).body()
# parse the CSV (string) into a qri DataFrame
theData = dataframe.parse_csv(rawCSV)
# we can do filtering of the DataFrame and assign it back to its original variable
# filter for first names that start with 'V'
theData = theData[[x.startswith('V') for x in theData["Child's First Name"]]]
# each column in the DataFrame is a Series
# make a new `Series` with only the unique values
uniqueSeries = theData["Child's First Name"].unique()
# iterate over the Series and convert each string to lowercase
for idx, val in enumerate(uniqueSeries):
    uniqueSeries[idx] = val.lower()
# sort the Series alphabetically
uniqueSeries = sorted(uniqueSeries)
# make an empty DataFrame, assign our Series to be a column named 'firstname'
# this will become the next version of our dataset's body
newBody = dataframe.DataFrame()
newBody['firstname'] = uniqueSeries
# get the previous version of this dataset
workingDataset = dataset.latest()
# set the body of the dataset to be our new body
workingDataset.body = newBody
# finally, commit the changes
# the last step of every transform is always `dataset.commit(Dataset)`
dataset.commit(workingDataset)
Figured out the root cause of this bug: the line workingDataset.body = newBody does not correctly copy the columns from newBody to the workingDataset object. The fix should be fairly straightforward.
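To make the root cause concrete, here is a minimal sketch in plain Python of how a body setter could lose the column names. It is not Qri's actual implementation: the classes DataFrame, BuggyDataset and FixedDataset and the sample names are all hypothetical, made up only to illustrate "rows are copied, columns are not".

```python
# Hypothetical model of the bug. None of these classes are Qri's real code.

class DataFrame:
    """Toy stand-in for a dataframe: column names plus rows of values."""

    def __init__(self, columns=None, rows=None):
        self.columns = list(columns or [])
        self.rows = [list(r) for r in (rows or [])]


class BuggyDataset:
    """Assigning to body copies the rows but drops the column names."""

    def __init__(self):
        self._body = DataFrame()

    @property
    def body(self):
        return self._body

    @body.setter
    def body(self, new_body):
        # Bug: new_body.columns is never carried over.
        self._body = DataFrame(rows=new_body.rows)


class FixedDataset:
    """Assigning to body copies both the column names and the rows."""

    def __init__(self):
        self._body = DataFrame()

    @property
    def body(self):
        return self._body

    @body.setter
    def body(self, new_body):
        # Fix: copy the column names along with the rows.
        self._body = DataFrame(columns=new_body.columns, rows=new_body.rows)


# Made-up sample data shaped like the transform's newBody.
new_body = DataFrame(columns=["firstname"], rows=[["valentina"], ["victoria"]])

buggy = BuggyDataset()
buggy.body = new_body
print(buggy.body.columns)  # the column names are gone

fixed = FixedDataset()
fixed.body = new_body
print(fixed.body.columns)  # the column names survive the assignment
```

In this sketch the rows survive in both cases and only the header is lost, which matches the symptom in the report. A regression test for the real fix could follow the same shape: assign a body with a named column, then check the column names on the dataset.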