pandas_exercises

02: Filtering and Sorting/Chipotle, Step-4 & 5

Open realyashnag opened this issue 6 years ago • 7 comments

In 02-Filtering_and_Sorting/Chipotle, step 4 and 5,

  • The solution doesn't consider rows in the data where 'quantity' is not 1.
  • The per-unit prices can be extracted with:

    chipo['item_price'] = chipo['item_price']/chipo['quantity']
    chipo['quantity'] = 1  # item_price is now per unit, so set quantity to 1
    chipo.drop_duplicates(['item_name'], keep='first', inplace=True)
    chipo.sort_values(by='item_price', ascending=False, inplace=True)
    display(chipo[['item_name', 'item_price']])
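A non-mutating variant of the same idea, as a minimal sketch. The three rows are invented for illustration (the "Chips" price in particular is hypothetical), and item_price is assumed to already be numeric:

```python
import pandas as pd

# Toy stand-in for the Chipotle data; rows are invented for illustration.
chipo = pd.DataFrame({
    "item_name": ["Canned Soda", "Canned Soda", "Chips"],
    "quantity": [2, 1, 1],
    "item_price": [2.18, 1.09, 2.15],
})

unit_prices = (
    # Work on a copy: replace the line total with the per-unit price.
    chipo.assign(item_price=chipo["item_price"] / chipo["quantity"])
    # Keep the first row seen for each item, as in the snippet above.
    .drop_duplicates("item_name", keep="first")
    .sort_values("item_price", ascending=False)[["item_name", "item_price"]]
)
print(unit_prices)
```

Because assign returns a new frame, the original chipo keeps its quantity column, which is handy if later steps still need it.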

I'm also a beginner at Pandas, please let me know about any stupid thing that I missed. Thanks.

realyashnag avatar Oct 19 '18 16:10 realyashnag

I agree with you. I think the answer provided is simply wrong.

bobfang1992 avatar Dec 05 '18 20:12 bobfang1992

I agree too. However, to get the number of products costing more than $10.00, I believe you could use a simpler command:

chipo.loc[chipo['item_price']/chipo['quantity'] > 10, 'item_name'].drop_duplicates().shape[0]

Hope this helps other newcomers.

maticalderini avatar Jan 22 '19 14:01 maticalderini

I ran into the same issue. Either the question needs to be reconsidered or the proposed solution needs to change. I believe the number of unique products with a price higher than 10 is 31. I used this code, after converting the column to float:

chipo[chipo['item_price']>10]['item_name'].nunique()
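The float conversion mentioned above is not shown in the comment. One possible way to do it, as a sketch that assumes item_price is stored as strings such as "$2.18" (the format seen in the rows quoted later in this thread), possibly with stray whitespace; the toy frame is only for illustration:

```python
import pandas as pd

# Toy stand-in for the Chipotle data, with prices as strings.
chipo = pd.DataFrame({
    "item_name": ["Canned Soda", "Chicken Burrito", "Chicken Burrito"],
    "quantity": [2, 1, 1],
    "item_price": ["$2.18 ", "$10.98 ", "$8.49 "],
})

# Strip the dollar sign and surrounding whitespace, then convert to float.
chipo["item_price"] = (
    chipo["item_price"]
    .str.replace("$", "", regex=False)  # literal "$", not a regex anchor
    .str.strip()
    .astype(float)
)

n_items_over_10 = chipo[chipo["item_price"] > 10]["item_name"].nunique()
print(n_items_over_10)  # 1
```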

bromero26 avatar Feb 10 '19 15:02 bromero26

@bromero26 I get the same answer as you using a slightly more verbose method:

# String aggregation names give stable column labels ('max', 'min') across pandas/NumPy versions
min_max_price_per_item = chipo.groupby('item_name').agg({'item_price': ['max', 'min']})
min_max_price_per_item[min_max_price_per_item.item_price['max'] > 10].shape[0]

But all of those high prices are caused by extras or specific configurations. If you stick to the basics, any item can be had for less than $10:

min_max_price_per_item[min_max_price_per_item.item_price['min'] > 10].shape[0]

The question is not well-formed. It could be asking:

  1. Which products did at least one person order for more than $10? (A: 31)
  2. Which products always cost at least $10, regardless of choice_description? (A: 0)
  3. Which product combinations (combination of item_name and choice_description) cost at least $10? (A: 777)
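To make the three readings concrete, here is a small sketch on invented rows (not the real data, so the counts differ from the answers above). It assumes prices are already floats and every quantity is 1, and it uses a strict `> 10` comparison as the code earlier in this thread does:

```python
import pandas as pd

# Invented rows for illustration only.
chipo = pd.DataFrame({
    "item_name": ["Burrito", "Burrito", "Bowl", "Soda"],
    "choice_description": ["[Salsa]", "[Salsa, Guacamole]", "[Salsa, Guacamole]", "[Sprite]"],
    "item_price": [8.49, 10.98, 11.25, 1.09],
})

by_item = chipo.groupby("item_name")["item_price"]

# 1. Items that were ordered at least once for more than $10.
q1 = int((by_item.max() > 10).sum())

# 2. Items whose cheapest recorded price is still more than $10.
q2 = int((by_item.min() > 10).sum())

# 3. Distinct (item_name, choice_description) pairs priced above $10.
over_10 = chipo.loc[chipo["item_price"] > 10, ["item_name", "choice_description"]]
q3 = len(over_10.drop_duplicates())

print(q1, q2, q3)  # 2 1 2
```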

rahimnathwani avatar Mar 05 '19 06:03 rahimnathwani

@rahimnathwani I agree with you, the question is not well-formed. Still, I believe that @maticalderini's suggestion is correct, since price vs. quantity seems quite linear. Check using bottled water as an example:

(
    chipo.query('item_name == "Bottled Water"')[['quantity', 'item_price']]
    .groupby('quantity')
    .agg(['mean', 'std'])
    .item_price
    .reset_index()
    .plot(x='quantity', y='mean', yerr='std', kind='scatter')
)

Normalizing the cost I get these values:

chipo['price_per_item'] = chipo.item_price/chipo.quantity
A1 = chipo.query('price_per_item > 10').item_name.nunique()
A2 = (chipo.groupby('item_name').price_per_item.min()>10).sum()
chipo['name_with_variants'] = chipo.item_name+chipo.choice_description
A3 = (chipo.groupby('name_with_variants').price_per_item.min()>10).sum()

print(f'A1:{A1}, A2:{A2}, A3:{A3}')

A1:25, A2:0, A3:707

AndreaAmico avatar May 24 '19 08:05 AndreaAmico

Hi everyone, thank you for the comments and feedback. I agree too that this question is not very clear.

Some clarifications:

  1. There is a clear distinction between order_id, quantity and product. For example, within the same order_id you can order a product in a quantity greater than 1, which will influence the price.

Example:

order_id | quantity | item_name | choice_description | item_price
9 | 2 | Canned Soda | [Sprite] | $2.18
14 | 1 | Canned Soda | [Dr. Pepper] | $1.09

Canned Soda costs $1.09. If I buy 10 sodas, the line will show $10.90, which is greater than $10, but that doesn't mean that the product Canned Soda costs more than $10.

That is the reason that quantity needs to be considered for this exercise.

  2. To simplify the exercise, take the combination of item_name + choice_description as "one product".

Example:

order_id | quantity | item_name | choice_description | item_price
12 | 1 | Chicken Burrito | [[Tomatillo-Green Chili Salsa (Medium), Tomati... | $10.98
8 | 1 | Chicken Burrito | [Tomatillo-Green Chili Salsa (Medium), [Pinto ... | $8.49

"Chicken Burrito" is the "main" product, but depending on the additional items it will cost more or less than $10, so to simplify, take the combination item_name + choice_description as "one product".
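Both clarifications can be combined in a short sketch: divide by quantity first, then group on the (item_name, choice_description) pair. The rows below are invented, the choice descriptions are shortened, and the sketch assumes some rows have no choice_description at all. Those are filled with an empty string here, because groupby drops missing keys by default and string concatenation with a missing value yields a missing value.

```python
import pandas as pd

# Invented rows for illustration; item_price is assumed to be numeric already.
chipo = pd.DataFrame({
    "quantity": [1, 1, 2, 10],
    "item_name": ["Chicken Burrito", "Chicken Burrito", "Canned Soda", "Bottled Water"],
    "choice_description": ["[Salsa, Guacamole]", "[Salsa]", "[Sprite]", None],
    "item_price": [10.98, 8.49, 2.18, 10.90],
})

# Clarification 1: compare per-unit prices, not line totals.
chipo["price_per_item"] = chipo["item_price"] / chipo["quantity"]

# Clarification 2: one product = item_name + choice_description.
# Fill missing descriptions so those rows are not silently dropped.
chipo["choice_description"] = chipo["choice_description"].fillna("")
products = chipo.groupby(["item_name", "choice_description"])["price_per_item"].min()

n_over_10 = int((products > 10).sum())
print(n_over_10)  # 1
```

The ten-bottle line totals $10.90 but is only about $1.09 per unit, so it is not counted; only the burrito variant priced at $10.98 is.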

Considering that, what is your suggestion? Send me a PR! 😉

guipsamora avatar Oct 13 '19 19:10 guipsamora

In the "Getting and Knowing your Data" part, the URL does not work. What should I do?

newera-001 avatar Jun 15 '20 08:06 newera-001