pandas_exercises
02: Filtering and Sorting/Chipotle, Step-4 & 5
In 02-Filtering_and_Sorting/Chipotle, steps 4 and 5:
- The solution doesn't account for rows where 'quantity' is not 1, so 'item_price' on those rows is the total for several items rather than the price of one.
- Unit prices can be extracted with:
chipo['item_price'] = chipo['item_price']/chipo['quantity']
chipo['quantity'] = 1 #Dividing item_price by quantity, therefore let quantity be 1
chipo.drop_duplicates(['item_name'], keep='first', inplace=True)
chipo.sort_values(by='item_price', ascending=False, inplace=True)
display(chipo[['item_name', 'item_price']])
I'm also a beginner at Pandas, please let me know about any stupid thing that I missed. Thanks.
I agree with you. I think the answer provided is simply wrong.
I agree too. However, to get the number of products costing more than $10.00, I believe you could use a simpler command:
chipo.loc[chipo['item_price']/chipo['quantity'] > 10, 'item_name'].drop_duplicates().shape[0]
Hope this helps other newcomers.
I ran into the same issue. The question needs to be reconsidered, or the proposed solution changed. I believe the number of unique products with a price higher than $10 is 31. I used this code, after converting the item_price column to float:
chipo[chipo['item_price']>10]['item_name'].nunique()
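For newcomers wondering about the "changing the column to float" step, here is a minimal sketch. It assumes the raw prices are strings such as "$2.18 " (dollar sign, possibly trailing whitespace); `chipo_demo` is a hypothetical toy frame, not the real dataset.

```python
import pandas as pd

# Toy rows; the "$2.18 " string format is an assumption about the raw data.
chipo_demo = pd.DataFrame({
    "item_name": ["Canned Soda", "Chicken Burrito", "Chicken Burrito"],
    "item_price": ["$2.18 ", "$10.98 ", "$8.49 "],
})

# Remove the literal dollar sign, strip whitespace, then cast to float.
chipo_demo["item_price"] = (
    chipo_demo["item_price"]
    .str.replace("$", "", regex=False)
    .str.strip()
    .astype(float)
)

# Same expression as in the comment above, now on numeric prices.
n_over_10 = chipo_demo[chipo_demo["item_price"] > 10]["item_name"].nunique()
print(n_over_10)
```

Passing `regex=False` matters here because `$` is a special character in regular expressions.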
@bromero26 I get the same answer as you using a slightly more verbose method:
min_max_price_per_item = chipo.groupby('item_name').agg({'item_price': [np.max, np.min]})
min_max_price_per_item[min_max_price_per_item.item_price.amax > 10].shape[0]
But all of those high prices are caused by extras or specific configurations. If you stick to the basics, any item can be had for less than $10:
min_max_price_per_item[min_max_price_per_item.item_price.amin > 10].shape[0]
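A side note on the snippet above: the `amax`/`amin` column labels come from passing NumPy functions to `agg`, and they may differ depending on the pandas version. A sketch of the same idea with named aggregation, which fixes the column names explicitly, could look like this. The toy frame `demo` is hypothetical and only reuses a few prices quoted in this thread.

```python
import pandas as pd

# Hypothetical toy data, not the full Chipotle dataset.
demo = pd.DataFrame({
    "item_name": ["Chicken Burrito", "Chicken Burrito", "Canned Soda", "Canned Soda"],
    "item_price": [10.98, 8.49, 2.18, 1.09],
})

# Named aggregation: output columns are called exactly min_price and max_price.
price_range = demo.groupby("item_name")["item_price"].agg(
    min_price="min", max_price="max"
)

# Items ordered at least once for more than $10.
sometimes_over_10 = int((price_range["max_price"] > 10).sum())
# Items whose cheapest recorded price is still above $10.
always_over_10 = int((price_range["min_price"] > 10).sum())
print(sometimes_over_10, always_over_10)
```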
The question is not well-formed. It could be asking:
- Which products did at least one person order for more than $10? (A: 31)
- Which products always cost at least $10, regardless of choice_description? (A: 0)
- Which product combinations (combination of item_name and choice_description) cost at least $10? (A: 777)
@rahimnathwani I agree with you, the question is not well-formed. Still, I believe that @matiascalderini's suggestion is correct, since price vs. quantity seems quite linear. Check using water bottles as an example:
(
chipo.query('item_name == "Bottled Water"')[['quantity', 'item_price']]
.groupby('quantity')
.agg(['mean', 'std'])
.item_price
.reset_index()
.plot(x='quantity', y='mean', yerr='std', kind='scatter')
)
Normalizing the cost I get these values:
chipo['price_per_item'] = chipo.item_price/chipo.quantity
A1 = chipo.query('price_per_item > 10').item_name.nunique()
A2 = (chipo.groupby('item_name').price_per_item.min()>10).sum()
chipo['name_with_variants'] = chipo.item_name+chipo.choice_description
A3 = (chipo.groupby('name_with_variants').price_per_item.min()>10).sum()
print(f'A1:{A1}, A2:{A2}, A3:{A3}')
A1:25, A2:0, A3:707
Hi everyone, thank you for the comments and feedback. I agree that this question is not very clear.
Some clarifications:
- There is a clear distinction between order_id, quantity and product. For example, within the same order_id, you can order a product in a quantity greater than 1, which will influence the price.
Example:
order_id | quantity | item_name | choice_description | item_price
9 | 2 | Canned Soda | [Sprite] | $2.18
14 | 1 | Canned Soda | [Dr. Pepper] | $1.09
Canned Soda costs $1.09. If I buy 10 sodas, the line will show $10.90, which is greater than $10, but that doesn't mean that the product Canned Soda costs more than $10.
That is the reason that quantity needs to be considered for this exercise.
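As a minimal sketch of that point, the two Canned Soda rows from the example can be normalized to a per-unit price. The column name `unit_price` is an assumption made for illustration, and the prices are assumed to be already converted to float.

```python
import pandas as pd

# The two example rows, with item_price already converted to float.
rows = pd.DataFrame({
    "order_id": [9, 14],
    "quantity": [2, 1],
    "item_name": ["Canned Soda", "Canned Soda"],
    "choice_description": ["[Sprite]", "[Dr. Pepper]"],
    "item_price": [2.18, 1.09],
})

# Divide the line total by the quantity to get the price of a single item.
rows["unit_price"] = rows["item_price"] / rows["quantity"]

# Count distinct items whose unit price exceeds $10.
over_10 = rows.loc[rows["unit_price"] > 10, "item_name"].nunique()
print(rows[["item_name", "quantity", "unit_price"]])
print(over_10)
```

Both rows end up with the same unit price, so a larger quantity no longer pushes Canned Soda over the $10 threshold.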
- In order to simplify the exercise, take the combination of item_name + choice_description as "one product".
Example:
order_id | quantity | item_name | choice_description | item_price
12 | 1 | Chicken Burrito | [[Tomatillo-Green Chili Salsa (Medium), Tomati... | $10.98
8 | 1 | Chicken Burrito | [Tomatillo-Green Chili Salsa (Medium), [Pinto ... | $8.49
"Chicken Burrito" is the "main" product, but depending on the additional items it will cost more or less than $10, so to simplify, take the combination item_name + choice_description as "one product".
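One possible way to treat item_name + choice_description as a single product is to group on both columns, sketched below. The toy rows are hypothetical: the choice descriptions are placeholders (the real ones are truncated above), and the "Side of Chips" row is made up to show a missing choice_description.

```python
import pandas as pd

# Hypothetical toy data; "[salsa A]" and "[salsa B]" are placeholder descriptions.
demo = pd.DataFrame({
    "quantity": [1, 1, 2, 1],
    "item_name": ["Chicken Burrito", "Chicken Burrito", "Canned Soda", "Side of Chips"],
    "choice_description": ["[salsa A]", "[salsa B]", "[Sprite]", None],
    "item_price": [10.98, 8.49, 2.18, 1.69],
})

# Normalize to a per-unit price first.
demo["unit_price"] = demo["item_price"] / demo["quantity"]

# Fill missing descriptions so those rows are not dropped as NaN group keys.
demo["choice_description"] = demo["choice_description"].fillna("")

# One "product" = one (item_name, choice_description) pair.
per_product = demo.groupby(["item_name", "choice_description"])["unit_price"].min()

n_products = len(per_product)
n_over_10 = int((per_product > 10).sum())
print(n_products, n_over_10)
```

Grouping on the two columns avoids building a concatenated key. Concatenating a string with a missing value gives NaN, and groupby drops NaN keys by default, so rows without a choice_description would silently vanish from the count.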
Considering that, what is your suggestion? Send me a PR! 😉
In the Getting and Knowing your Data part, the URL does not work. What should I do?