Analyzing Avocado Prices and Consumption in the U.S.

Nhi Ngo

In the past decade, avocados have soared in popularity. From avocado salad to avocado toast and even avocado face masks, people seem to be buying more and more avocados and using them in many different ways. This rise in popularity has been linked to the wellness movement, as more people focus on their health. If you want to learn more about the cultural history of avocados, check out this article by the BBC: https://www.bbc.co.uk/bbcthree/article/87a56e5c-6d41-4495-9e22-523efb6b4cb0#:~:text=1920s%3A%20A%20PR%20campaign%20starts&text=By%20the%20late%2019th%20century,but%20they%20weren't%20selling.

In this tutorial, we will use data science to analyze the trends in avocado prices and the number of avocados purchased.

Getting the Data

Let's look at this data from Timofei Kornev on Kaggle (https://www.kaggle.com/timmate/avocado-prices-2020). This data was collected from the Hass Avocado Board's website (https://hassavocadoboard.com/). It contains weekly scan data of Hass avocados from the years 2015 to 2020.

First, we'll read in the data and convert it to a dataframe.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# creating dataframe from data
df = pd.read_csv("avocado-updated-2020.csv", sep=',')
df.head()
Out[1]:
         date  average_price  total_volume       4046       4225      4770  total_bags  small_bags  large_bags  xlarge_bags          type  year             geography
0  2015-01-04           1.22      40873.28    2819.50   28287.42     49.90     9716.46     9186.93      529.53          0.0  conventional  2015                Albany
1  2015-01-04           1.79       1373.95      57.42     153.88      0.00     1162.65     1162.65        0.00          0.0       organic  2015                Albany
2  2015-01-04           1.00     435021.49  364302.39   23821.16     82.15    46815.79    16707.15    30108.64          0.0  conventional  2015               Atlanta
3  2015-01-04           1.76       3846.69    1500.15     938.35      0.00     1408.19     1071.35      336.84          0.0       organic  2015               Atlanta
4  2015-01-04           1.08     788025.06   53987.31  552906.04  39995.03   141136.68   137146.07     3990.61          0.0  conventional  2015  Baltimore/Washington

Our dataframe contains many columns, but the columns we will be focusing on are:

  • date: date of the observation
  • average_price: the average price of a single avocado
  • total_volume: total number of avocados sold
  • type: whether the avocado was grown organically or conventionally
  • geography: the city or region of the observation
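The dataframe also has three columns named 4046, 4225, and 4770. These are Hass Avocado Board PLU codes for small, large, and extra-large Hass avocados respectively, and each column holds the number of avocados sold under that code. If you plan to use these columns, renaming them first makes later code easier to read. Below is a minimal sketch on a two-row excerpt of df.head(). The new column names are our own choice and are not part of the dataset.

```python
import pandas as pd

# Two-row excerpt of the data (values copied from df.head() above)
df = pd.DataFrame({
    "date": ["2015-01-04", "2015-01-04"],
    "average_price": [1.22, 1.79],
    "total_volume": [40873.28, 1373.95],
    "4046": [2819.50, 57.42],
    "4225": [28287.42, 153.88],
    "4770": [49.90, 0.00],
    "type": ["conventional", "organic"],
    "geography": ["Albany", "Albany"],
})

# Map the PLU code columns to readable names (our own naming choice)
plu_names = {"4046": "plu_4046_small", "4225": "plu_4225_large", "4770": "plu_4770_xlarge"}
df = df.rename(columns=plu_names)

# Parse dates once so later grouping and plotting treat them as time values
df["date"] = pd.to_datetime(df["date"])
print(df.dtypes)
```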

Before doing EDA, we should check whether the data has any missing values. Missing data can cause problems during EDA, so if there is any, we need to decide how to handle it. To learn more about missing data and ways to handle it, see this guide: https://www.mastersindatascience.org/learning/how-to-deal-with-missing-data/

In [2]:
# checking if dataframe has any null values
df.isnull().values.any()
Out[2]:
False

The dataframe has no missing data, so we can go ahead and start the EDA.
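`isnull().values.any()` only gives a yes or no answer. When a dataset does have gaps, counting them per column shows where they are. Our real dataframe has no gaps to show, so the sketch below uses a small made-up frame with one deliberately missing price.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one deliberately missing value
toy = pd.DataFrame({
    "average_price": [1.22, np.nan, 1.00],
    "total_volume": [40873.28, 1373.95, 435021.49],
    "geography": ["Albany", "Albany", "Atlanta"],
})

# isnull().values.any() collapses everything into one bool;
# summing per column shows which columns contain the gaps
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# Rows that contain at least one missing value
rows_with_missing = toy[toy.isnull().any(axis=1)]
print(rows_with_missing)
```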

Exploratory Data Analysis (EDA) and Visualization

Our dataframe is ready, so now we will do some data analysis.

First things first: suppose you're an avid avocado consumer. Where in the U.S. should you live if you don't want to break the bank buying avocados? Using this data, we can find out which places in the U.S. have the cheapest avocados and which have the most expensive ones.

In [3]:
# finding the average price of avocados for each geographic location
avg_price_by_location = df.groupby(['geography'])['average_price'].mean().reset_index()
avg_price_by_location = avg_price_by_location.sort_values(by=['average_price'])
avg_price_by_location
Out[3]:
geography average_price
18 Houston 1.081817
11 Dallas/Ft. Worth 1.088201
45 South Central 1.114748
33 Phoenix/Tucson 1.224209
26 Nashville 1.226025
10 Columbus 1.230450
9 Cincinnati/Dayton 1.239191
39 Roanoke 1.243813
27 New Orleans/Mobile 1.247140
38 Richmond/Norfolk 1.258345
13 Detroit 1.262518
53 West Tex/New Mexico 1.266275
19 Indianapolis 1.272320
12 Denver 1.276403
23 Louisville 1.282068
22 Los Angeles 1.304353
1 Atlanta 1.312842
15 Great Lakes 1.318471
50 Tampa 1.321906
51 Total U.S. 1.329946
52 West 1.330090
34 Pittsburgh 1.335054
46 Southeast 1.342644
24 Miami/Ft. Lauderdale 1.355306
35 Plains 1.373633
21 Las Vegas 1.377788
44 South Carolina 1.379748
25 Midsouth 1.386241
31 Orlando 1.389424
16 Harrisburg/Scranton 1.400629
5 Buffalo/Rochester 1.410576
36 Portland 1.414730
20 Jacksonville 1.416906
49 Syracuse 1.430737
3 Boise 1.440072
6 California 1.444784
30 Northern New England 1.454964
41 San Diego 1.455594
14 Grand Rapids 1.456403
48 St. Louis 1.460647
2 Baltimore/Washington 1.481996
0 Albany 1.506187
47 Spokane 1.507590
4 Boston 1.529694
43 Seattle 1.535683
8 Chicago 1.535989
32 Philadelphia 1.543669
29 Northeast 1.549784
7 Charlotte 1.570450
37 Raleigh/Greensboro 1.573759
40 Sacramento 1.596583
28 New York 1.678309
17 Hartford/Springfield 1.770953
42 San Francisco 1.771871

The data has an entry for 'Total U.S.'. We can use its average price as a national benchmark and see how far each location's average price is from it.

In [4]:
# getting the total average price from 'Total U.S.'
average_total = avg_price_by_location.loc[
    avg_price_by_location['geography'] == 'Total U.S.', 'average_price'
].iloc[0]

average_total
Out[4]:
1.3299460431654677
In [5]:
# dropping 'Total U.S.' from dataset
avg_price_by_location = avg_price_by_location[avg_price_by_location.geography != 'Total U.S.']
In [6]:
# plotting average avocado prices for each Geographic Location

plt.figure(figsize=(25, 15))
sns.barplot(x='geography',y='average_price', data=avg_price_by_location)
plt.title('Average Avocado Prices For Each Geographic Area', fontsize=23)

# making x-axis labels vertical to take less space
plt.xticks(rotation=90, size=18)

plt.yticks(size=18)
plt.xlabel('Geography', fontsize=20)
plt.ylabel('Average Price', fontsize=20)

plt.axhline(average_total)

The graph above shows the average avocado price for each geographic location. The blue horizontal line marks the 'Total U.S.' average. Houston, Dallas/Ft. Worth, and South Central have the cheapest avocados, with average prices between $1.08 and $1.11. Note that South Central is a multi-state region rather than a single metro area. At the other end, New York, Hartford/Springfield, and San Francisco have the most expensive avocados, with average prices between $1.68 and $1.77.
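To put numbers on how far each location sits from the national benchmark, we can subtract average_total from each location's average price. The minimal sketch below uses six rows copied from Out[3] and the 'Total U.S.' value from Out[4]. The column name diff_from_us is our own.

```python
import pandas as pd

# Six rows copied from the per-location averages in Out[3]
avg = pd.DataFrame({
    "geography": ["Houston", "Dallas/Ft. Worth", "South Central",
                  "New York", "Hartford/Springfield", "San Francisco"],
    "average_price": [1.081817, 1.088201, 1.114748,
                      1.678309, 1.770953, 1.771871],
})
national = 1.3299460431654677  # the 'Total U.S.' value from Out[4]

# Signed gap from the national average, in dollars per avocado
avg["diff_from_us"] = avg["average_price"] - national

cheapest = avg.nsmallest(3, "average_price")
priciest = avg.nlargest(3, "average_price")
print(cheapest[["geography", "diff_from_us"]].round(2))
print(priciest[["geography", "diff_from_us"]].round(2))
```

On the full avg_price_by_location dataframe, the same subtraction gives the gap for every location.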

The graph above doesn't distinguish between organic and conventional avocados. The two types are usually priced differently, so next we will find the average price for each combination of geographic location and avocado type.

In [7]:
# finding the average for each avocado type and each geographic location
newdf = df.groupby(['geography','type'])['average_price'].mean().reset_index()
newdf
Out[7]:
geography type average_price
0 Albany conventional 1.314101
1 Albany organic 1.698273
2 Atlanta conventional 1.052410
3 Atlanta organic 1.573273
4 Baltimore/Washington conventional 1.341906
... ... ... ...
103 Total U.S. organic 1.560000
104 West conventional 1.030324
105 West organic 1.629856
106 West Tex/New Mexico conventional 0.878058
107 West Tex/New Mexico organic 1.658727

108 rows × 3 columns

In [8]:
# making a dataframe with only organic avocado prices
organic_newdf = newdf.loc[newdf['type'] == 'organic']
organic_newdf = organic_newdf.sort_values(by=['average_price'])

# making a dataframe with only conventional avocado prices
conventional_newdf = newdf.loc[newdf['type'] == 'conventional']
conventional_newdf = conventional_newdf.sort_values(by=['average_price'])


print('organic df\n', organic_newdf.head())
print('conventional df\n', conventional_newdf.head())
organic df
            geography     type  average_price
23  Dallas/Ft. Worth  organic       1.335647
37           Houston  organic       1.349964
91     South Central  organic       1.361547
27           Detroit  organic       1.410755
79           Roanoke  organic       1.414496
conventional df
                geography          type  average_price
66        Phoenix/Tucson  conventional       0.776115
36               Houston  conventional       0.813669
22      Dallas/Ft. Worth  conventional       0.840755
90         South Central  conventional       0.867950
106  West Tex/New Mexico  conventional       0.878058
In [9]:
# getting the average organic avocado price of 'Total U.S.'
organic_avg = organic_newdf.loc[
    organic_newdf['geography'] == 'Total U.S.', 'average_price'
].iloc[0]

organic_avg
Out[9]:
1.56
In [10]:
# getting the average conventional avocado price of 'Total U.S.'
conventional_avg = conventional_newdf.loc[
    conventional_newdf['geography'] == 'Total U.S.', 'average_price'
].iloc[0]

conventional_avg
Out[10]:
1.0998920863309354
In [11]:
# dropping 'Total U.S.' from both datasets
organic_newdf = organic_newdf[organic_newdf.geography != 'Total U.S.']
conventional_newdf = conventional_newdf[conventional_newdf.geography != 'Total U.S.']
In [12]:
# plotting average organic avocado prices for each Geographic Location

plt.figure(figsize=(25, 15))
sns.barplot(x='geography',y='average_price', data=organic_newdf)
plt.title('Average Organic Avocado Prices For Each Geographic Area', fontsize=23)
# making x-axis labels vertical to take less space
plt.xticks(rotation=90, size=18)
plt.yticks(size=18)
plt.xlabel('Geography', fontsize=20)
plt.ylabel('Average Price', fontsize=20)

plt.axhline(organic_avg)
In [13]:
# plotting average conventional avocado prices for each Geographic location

plt.figure(figsize=(25, 15))
sns.barplot(x='geography',y='average_price', data=conventional_newdf)
plt.title('Average Conventional Avocado Prices For Each Geographic Area', fontsize=23)
# making x-axis labels vertical to take less space
plt.xticks(rotation=90, size=18)
plt.yticks(size=18)
plt.xlabel('Geography', fontsize=20)
plt.ylabel('Average Price', fontsize=20)

plt.axhline(conventional_avg)

The graphs above clearly show that organic avocados tend to be more expensive than conventional avocados. The highest regional average for organic avocados is over $2, while the highest for conventional avocados is around $1.40.
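One way to quantify the organic price premium in each location is to reshape newdf so that the two types sit side by side, then subtract. The minimal sketch below applies DataFrame.pivot to three locations copied from Out[7]. The organic_premium column name is our own.

```python
import pandas as pd

# Long-format excerpt of newdf (Out[7]): one row per (geography, type)
newdf = pd.DataFrame({
    "geography": ["Albany", "Albany", "Atlanta", "Atlanta",
                  "West Tex/New Mexico", "West Tex/New Mexico"],
    "type": ["conventional", "organic"] * 3,
    "average_price": [1.314101, 1.698273, 1.052410, 1.573273,
                      0.878058, 1.658727],
})

# Wide format: one row per geography, one column per avocado type
wide = newdf.pivot(index="geography", columns="type", values="average_price")

# Extra cost of an organic avocado over a conventional one, per location
wide["organic_premium"] = wide["organic"] - wide["conventional"]
print(wide.sort_values("organic_premium", ascending=False).round(2))
```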

Most Expensive Avocados

Judging from the graphs above, New York, Hartford/Springfield, and San Francisco are still the three places with the most expensive avocados, for both organic and conventional types. For organic avocados, San Francisco and Hartford/Springfield are well above the national average at roughly $2.50, while New York is at roughly $2. For conventional avocados, all three have similar average prices of around $1.40.

Cheapest Avocados

For organic avocados, Dallas/Ft. Worth, Houston, and South Central are still the three cheapest locations. Their average prices are about $1.34 to $1.36 (see the "organic df" table printed above).

For conventional avocados, Phoenix/Tucson, Houston, and Dallas/Ft. Worth are the three cheapest, with average prices of about $0.78 to $0.84. South Central has the third cheapest organic avocados but only the fourth cheapest conventional avocados, at about $0.87.

One interesting detail is that Phoenix/Tucson is on the expensive end for organic avocados, at around $1.70 on average, even though it has the cheapest conventional avocados.

Insights

Avid avocado consumers should consider living in Dallas/Ft. Worth, Houston, or the South Central region, because these locations consistently have cheaper avocados than other locations. People who prefer conventional avocados over organic ones could also consider Phoenix/Tucson.
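We can check the claim that some locations are cheap for both types by intersecting the cheapest locations of each type. The sketch below uses the five cheapest rows per type printed in Out[8]. The helper cheapest() and the cutoff k = 4 are illustrative choices and not part of the original analysis.

```python
import pandas as pd

# The five cheapest locations per type, copied from Out[8]
organic_newdf = pd.DataFrame({
    "geography": ["Dallas/Ft. Worth", "Houston", "South Central", "Detroit", "Roanoke"],
    "average_price": [1.335647, 1.349964, 1.361547, 1.410755, 1.414496],
})
conventional_newdf = pd.DataFrame({
    "geography": ["Phoenix/Tucson", "Houston", "Dallas/Ft. Worth",
                  "South Central", "West Tex/New Mexico"],
    "average_price": [0.776115, 0.813669, 0.840755, 0.867950, 0.878058],
})

def cheapest(frame, k):
    """Hypothetical helper: set of the k locations with the lowest average_price."""
    return set(frame.nsmallest(k, "average_price")["geography"])

# Locations that rank among the k cheapest for both avocado types
k = 4
cheap_for_both = cheapest(organic_newdf, k) & cheapest(conventional_newdf, k)
print(sorted(cheap_for_both))
```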

New York, Hartford/Springfield, and San Francisco have much higher avocado prices than the average, so you may not be able to afford many avocados there.

We now know which places have the most and least expensive avocados.

Next, let's look at how avocado prices and the number of avocados bought change over time across all the geographic locations in the data. The data is recorded weekly, so we can plot both price and total volume over time. We'll do this first for all avocados regardless of type, then for organic avocados only, and finally for conventional avocados only.
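Instead of building separate organic and conventional dataframes, we could group by date and type together and unstack the result, which gives one column per type. The sketch below runs on a few hypothetical rows shaped like df. The first week's prices come from df.head(), and the second week's prices are made up.

```python
import pandas as pd

# Hypothetical rows shaped like df: two weeks, two locations, two types
df = pd.DataFrame({
    "date": ["2015-01-04"] * 4 + ["2015-01-11"] * 4,
    "type": ["conventional", "organic"] * 4,
    "geography": ["Albany", "Albany", "Atlanta", "Atlanta"] * 2,
    "average_price": [1.22, 1.79, 1.00, 1.76, 1.24, 1.77, 1.08, 1.70],
})
df["date"] = pd.to_datetime(df["date"])

# Mean price per week for each type, reshaped so each type is its own column
price_by_type = (
    df.groupby(["date", "type"])["average_price"]
      .mean()
      .unstack("type")
)
print(price_by_type)
# price_by_type.plot() would then draw both lines on one set of axes
```

On the full data, the equivalent would be df.groupby(['date', 'type'])['average_price'].mean().unstack('type').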

In [14]:
# changing date column of dataframe to datetime type
df['date']= pd.to_datetime(df['date'])

# getting the average price of all avocados depending on the date
new_df = df.groupby(df.date)['average_price'].mean().reset_index()
new_df
Out[14]:
date average_price
0 2015-01-04 1.301296
1 2015-01-11 1.370648
2 2015-01-18 1.391111
3 2015-01-25 1.397130
4 2015-02-01 1.247037
... ... ...
273 2020-04-19 1.386204
274 2020-04-26 1.385556
275 2020-05-03 1.304815
276 2020-05-10 1.329537
277 2020-05-17 1.371111

278 rows × 2 columns

In [15]:
# plotting average avocado prices over time

plt.figure(figsize=(25, 15))
sns.lineplot(x='date',y='average_price', data=new_df)
plt.title('Average Avocado Prices Over Time', fontsize=23)

plt.xticks(size=18)
plt.yticks(size=18)
plt.xlabel('Date', fontsize=20)
plt.ylabel('Average Price', fontsize=20)
In [16]:
# getting the average number of avocados bought in all geographic locations over time
volume_over_time = df.groupby(df.date)['total_volume'].mean().reset_index()

# plotting number of avocados bought over time
plt.figure(figsize=(25, 15))
ax = sns.lineplot(x='date',y='total_volume', data=volume_over_time)
plt.title('Number of Avocados Sold Over Time', fontsize=23)

plt.xticks(size=18)
plt.yticks(size=18)
plt.xlabel('Date', fontsize=20)
plt.ylabel('Number of Avocados Sold', fontsize=20)

#getting rid of scientific notation
plt.ticklabel_format(style='plain', axis='y')
In [17]:
# splitting the dataframe into two dataframes based on type (organic or conventional)
organic_df = df[(df['type'] == 'organic')]
conventional_df = df[(df['type'] == 'conventional')]
In [18]:
# getting average price over time for organic avocados
organic_price_over_time = organic_df.groupby(organic_df.date)['average_price'].mean()\
.reset_index()
In [19]:
# getting average price over time for conventional avocados
conventional_price_over_time = conventional_df.groupby(conventional_df.date)['average_price']\
.mean().reset_index()
In [20]:
# plotting average organic prices over time

plt.figure(figsize=(25, 15))
sns.lineplot(x='date',y='average_price', data=organic_price_over_time)
plt.title('Average Organic Avocado Prices Over Time', fontsize=23)

plt.xticks(size=18)
plt.yticks(size=18)
plt.xlabel('Date', fontsize=20)
plt.ylabel('Average Price', fontsize=20)
In [21]:
# getting the average number of organic avocados bought over time
organic_volume_over_time = organic_df.groupby(organic_df.date)['total_volume'].mean().reset_index()

#plotting number of organic avocados sold over time
plt.figure(figsize=(25, 15))
sns.lineplot(x='date',y='total_volume', data=organic_volume_over_time)
plt.title('Number of Organic Avocados Sold Over Time', fontsize=23)

plt.xticks(size=18)
plt.yticks(size=18)
plt.xlabel('Date', fontsize=20)
plt.ylabel('Number of Avocados Sold', fontsize=20)

#getting rid of scientific notation
plt.ticklabel_format(style='plain', axis='y')
In [22]:
# plotting average conventional avocado price over time

plt.figure(figsize=(25, 15))
sns.lineplot(x='date',y='average_price', data=conventional_price_over_time)
plt.title('Average Conventional Avocado Prices Over Time', fontsize=23)

plt.xticks(size=18)
plt.yticks(size=18)
plt.xlabel('Date', fontsize=20)
plt.ylabel('Average Price', fontsize=20)
In [23]:
# getting the average number of conventional avocados bought over time
conventional_volume_over_time = conventional_df.groupby(conventional_df.date)['total_volume']\
.mean().reset_index()

#plotting number of conventional avocados sold over time
plt.figure(figsize=(25, 15))
sns.lineplot(x='date',y='total_volume', data=conventional_volume_over_time)
plt.title('Number of Conventional Avocados Sold Over Time', fontsize=23)

plt.xticks(size=18)
plt.yticks(size=18)
plt.xlabel('Date', fontsize=20)
plt.ylabel('Number of Avocados Sold', fontsize=20)

#getting rid of scientific notation
plt.ticklabel_format(style='plain', axis='y')

Insights

Average avocado prices vary over time. From 2015 to the beginning of 2016, average prices were relatively low, ranging from about $1.10 to $1.50. Prices then spiked from the middle to the end of 2016, and again from 2017 into 2018 and from 2019 into 2020, dipping back down between spikes. Both conventional and organic prices seem to follow this pattern.
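Weekly prices are noisy, so a rolling mean can make the multi-month spikes and dips easier to see. The sketch below applies a 4-week trailing rolling mean to made-up weekly prices. The window length of 4 is an arbitrary choice.

```python
import pandas as pd

# Hypothetical weekly average prices (made-up numbers, not from the dataset)
weeks = pd.date_range("2016-01-03", periods=8, freq="W")
price = pd.Series([1.10, 1.30, 1.15, 1.35, 1.20, 1.40, 1.25, 1.45], index=weeks)

# 4-week trailing rolling mean: each value averages the current week and the
# three before it, so the first three weeks have no full window (NaN)
smoothed = price.rolling(window=4).mean()
print(smoothed)
```

In the notebook, new_df['average_price'].rolling(4).mean() could be plotted on top of the raw weekly line.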

The number of avocados bought appears to increase over time, and the increase is clearer and steadier for organic avocados than for conventional ones. One possible explanation is that more people in the U.S. are trying to live healthier lifestyles and see organic avocados as healthier than conventional ones, so they buy more of them than before.
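Organic volumes are much smaller than conventional ones, so comparing year-over-year percentage change puts both types on the same footing. The sketch below uses made-up annual averages. On the real data, the input could come from df.groupby([df.date.dt.year, 'type'])['total_volume'].mean().unstack('type'). Keep in mind that the data ends in May 2020, so 2020 is a partial year and its average may be skewed by seasonality.

```python
import pandas as pd

# Hypothetical mean weekly volume per year and type (made-up numbers)
annual = pd.DataFrame(
    {"conventional": [1_000_000, 1_050_000, 1_090_000],
     "organic": [30_000, 36_000, 43_200]},
    index=pd.Index([2015, 2016, 2017], name="year"),
)

# Year-over-year relative change: this year / last year - 1.
# The first year has no previous year, so it is NaN.
growth = annual / annual.shift(1) - 1
print(growth.round(3))
```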

Predicting Number of Avocados Bought

More and more avocados are being bought in the U.S., but is there a relationship between avocado prices and the number of avocados bought? We can use machine learning to try to predict the number of avocados bought from the price.

First let's prep the data.

In [24]:
# getting average price and average total volume
avg_price_over_time = df.groupby(df.date)['average_price'].mean().reset_index()
volume_over_time = df.groupby(df.date)['total_volume'].mean().reset_index()
In [25]:
# merging average price and average total volume into one dataframe
merged_df = pd.merge(avg_price_over_time, volume_over_time, on='date', how='outer')

merged_df
Out[25]:
date average_price total_volume
0 2015-01-04 1.301296 7.840216e+05
1 2015-01-11 1.370648 7.273686e+05
2 2015-01-18 1.391111 7.258221e+05
3 2015-01-25 1.397130 7.080211e+05
4 2015-02-01 1.247037 1.106048e+06
... ... ... ...
273 2020-04-19 1.386204 1.279173e+06
274 2020-04-26 1.385556 1.326299e+06
275 2020-05-03 1.304815 1.572185e+06
276 2020-05-10 1.329537 1.489704e+06
277 2020-05-17 1.371111 1.318729e+06

278 rows × 3 columns

In [26]:
# making a scatterplot of average price vs average total volume
sns.scatterplot(data=merged_df,x='average_price',y='total_volume')
plt.xlabel('Price')
plt.ylabel('Volume')
plt.title('Avocado Prices vs Volume/Number Bought')

The scatterplot suggests that the number of avocados bought tends to decrease as the price increases.
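The Pearson correlation between price and volume gives a quick numeric summary of the scatterplot. Values near -1 indicate a strong negative linear relationship, and values near 0 indicate little linear relationship. For a straight-line fit with an intercept, evaluated on the same data it was fit to, $r^2$ is exactly the square of this correlation. The sketch below uses simulated data, not the avocado data.

```python
import numpy as np
import pandas as pd

# Simulated weekly data (hypothetical): volume tends to fall as price rises
rng = np.random.default_rng(0)
price = rng.uniform(1.0, 1.8, size=200)
volume = 2_500_000 - 1_000_000 * price + rng.normal(0, 150_000, size=200)
merged_df = pd.DataFrame({"average_price": price, "total_volume": volume})

# Pearson correlation between the two columns
r = merged_df["average_price"].corr(merged_df["total_volume"])

# In-sample r^2 of a least-squares straight line fit to the same data
slope, intercept = np.polyfit(price, volume, deg=1)
fitted = slope * price + intercept
r2 = 1 - np.sum((volume - fitted) ** 2) / np.sum((volume - volume.mean()) ** 2)

print(round(r, 3), round(r2, 3), round(r ** 2, 3))
```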

We will use linear regression to see whether there is a predictive relationship between avocado price and the number of avocados bought. To learn how to do linear regression with scikit-learn, see this example: https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html

In [27]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
In [28]:
# splitting data into training and testing datasets
X_train, X_test, y_train, y_test = train_test_split(
    merged_df[['average_price']], merged_df.total_volume, test_size=0.2
)
In [29]:
# doing linear regression
lg = LinearRegression()
lg.fit(X_train, y_train)
print('score: ', lg.score(X_test, y_test))
score:  0.28142537037501625

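LinearRegression.score returns the coefficient of determination ($r^2$) on the held-out test set. A score of about 0.28 means that price alone explains only around 28% of the variation in weekly volume for those test weeks, so the model isn't very accurate. train_test_split shuffles the rows randomly and no random_state is set, so this score will differ slightly each time the cell is run. Let's see what the regression looks like.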

In [30]:
# graphing the linear regression

# predicting with a one-column DataFrame, matching how the model was fit
prediction = lg.predict(merged_df[['average_price']])

plt.figure(figsize=(10,8))
plt.plot(merged_df.average_price, prediction, label='Linear Regression')
plt.scatter(merged_df.average_price, merged_df.total_volume, color='black')
plt.title('Average Avocado Price vs Total Volume of Avocados Bought')
plt.xlabel('Average Price')
plt.ylabel('Total Volume/Number of Avocados Bought')
plt.legend()
In [31]:
# coefficient of determination (r^2) over all weeks; 1 is a perfect prediction
print('coefficient of determination: ', r2_score(merged_df.total_volume, prediction))
coefficient of determination:  0.2266479665513883

The regression line doesn't fit the data well either, and the coefficient of determination ($r^2 \approx 0.23$) is low, which confirms that a straight line doesn't describe our data well. Note that this $r^2$ is computed over all 278 weeks, including the weeks the model was trained on.
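For reference, the coefficient of determination reported by r2_score (and by LinearRegression.score) compares the model's squared errors with those of a baseline that always predicts the mean volume $\bar{y}$:

$$
r^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
$$

Here $y_i$ is the observed volume in week $i$ and $\hat{y}_i$ is the model's prediction for that week. A value of 1 means perfect predictions, and 0 means no better than always predicting the mean. The value can even be negative on data the model was not fit to.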

We can try polynomial regression to see if it fits our data better. Read this to learn more about polynomial regression: https://towardsdatascience.com/polynomial-regression-bbe8b9d97491.
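You can call PolynomialFeatures and LinearRegression separately, but then every new input has to be transformed before predicting. scikit-learn's make_pipeline chains the two steps instead. The sketch below compares a straight line with a degree-2 fit on simulated data that has a U-shaped relationship. The data and coefficients are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Simulated data (hypothetical) with a U-shaped relationship plus noise
rng = np.random.default_rng(1)
X = rng.uniform(1.0, 1.8, size=(150, 1))  # scikit-learn expects shape (n_samples, n_features)
y = 10 - 12 * X[:, 0] + 4 * X[:, 0] ** 2 + rng.normal(0, 0.05, size=150)

# Straight-line fit for comparison
linear = LinearRegression().fit(X, y)

# Pipeline: expand x into [x, x^2], then fit ordinary least squares.
# predict() and score() apply the same expansion automatically.
quadratic = make_pipeline(PolynomialFeatures(degree=2, include_bias=False), LinearRegression())
quadratic.fit(X, y)

print("linear r^2:   ", round(linear.score(X, y), 3))
print("quadratic r^2:", round(quadratic.score(X, y), 3))
```

With the pipeline, predicting on new prices is just quadratic.predict(new_X), with no separate transform step.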

In [32]:
from sklearn.preprocessing import PolynomialFeatures

# polynomial regression using a degree of 2
poly = PolynomialFeatures(degree=2)
x_poly = poly.fit_transform(X_train)

lg.fit(x_poly,y_train)
In [33]:
# plotting the regression

# expanding price into polynomial features with the transformer fitted above
prediction = lg.predict(poly.transform(merged_df[['average_price']]))

# sorting by price so the curve is drawn left to right instead of in date order
order = np.argsort(merged_df.average_price.values)

plt.figure(figsize=(10,8))
plt.plot(merged_df.average_price.values[order], prediction[order], label='Degree 2 Polynomial Regression')
plt.scatter(merged_df.average_price, merged_df.total_volume, color='black')
plt.title('Average Avocado Price vs Total Volume of Avocados Bought')
plt.xlabel('Average Price')
plt.ylabel('Total Volume/Number of Avocados Bought')
plt.legend()
In [34]:
# coefficient of determination (r^2) over all weeks; 1 is a perfect prediction
print('coefficient of determination: ', r2_score(merged_df.total_volume, prediction))
coefficient of determination:  0.24957235731778293

This curve fits a little better and our $r^2$ increased slightly, but it's still not great. Let's try a degree of 3 to see whether it fits our data even better.

In [35]:
# polynomial regression using degree = 3
poly = PolynomialFeatures(degree=3)
x_poly = poly.fit_transform(X_train)

lg.fit(x_poly,y_train)

prediction = lg.predict(poly.transform(merged_df[['average_price']]))

# plotting the regression, sorted by price so the curve is drawn left to right
order = np.argsort(merged_df.average_price.values)
plt.figure(figsize=(10,8))
plt.plot(merged_df.average_price.values[order], prediction[order], label='Degree 3 Polynomial Regression')
plt.scatter(merged_df.average_price, merged_df.total_volume, color='black')
plt.title('Average Avocado Price vs Total Volume of Avocados Bought')
plt.xlabel('Average Price')
plt.ylabel('Total Volume/Number of Avocados Bought')
plt.legend()
In [36]:
# coefficient of determination (r^2) over all weeks; 1 is a perfect prediction
print('coefficient of determination: ', r2_score(merged_df.total_volume, prediction))
coefficient of determination:  0.28631737222139697

Our $r^2$ improved a bit more. Now let's try a degree of 4.

In [37]:
# polynomial regression with degree = 4
poly = PolynomialFeatures(degree=4)
x_poly = poly.fit_transform(X_train)

lg.fit(x_poly,y_train)

prediction = lg.predict(poly.transform(merged_df[['average_price']]))

# plotting the regression, sorted by price so the curve is drawn left to right
order = np.argsort(merged_df.average_price.values)
plt.figure(figsize=(10,8))
plt.plot(merged_df.average_price.values[order], prediction[order], label='Degree 4 Polynomial Regression')
plt.scatter(merged_df.average_price, merged_df.total_volume, color='black')
plt.title('Average Avocado Price vs Total Volume of Avocados Bought')
plt.xlabel('Average Price')
plt.ylabel('Total Volume/Number of Avocados Bought')
plt.legend()
In [38]:
# coefficient of determination (r^2) over all weeks; 1 is a perfect prediction
print('coefficient of determination: ', r2_score(merged_df.total_volume, prediction))
coefficient of determination:  0.2870786886861463

Our $r^2$ improved again, but only by a very small amount. If we keep increasing the degree, the curve will follow the training data more and more closely. However, this leads to overfitting: the model ends up fitting the noise in our particular dataset and fails to predict data it hasn't seen before. Keep in mind that the $r^2$ values above are computed over all 278 weeks, about 80% of which were used for training. They therefore mostly reflect how well each curve fits data it has already seen.
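One way to see overfitting is to score each degree on both the training rows and the held-out test rows. Training $r^2$ can only stay the same or rise as the degree grows, because each higher-degree model can reproduce the lower-degree one. The test $r^2$ has no such guarantee. The sketch below runs on simulated data with a fixed random_state so the split is reproducible. The data and the degree range are our own choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Simulated noisy data with a weak linear trend (hypothetical, not the avocado data)
rng = np.random.default_rng(42)
X = rng.uniform(1.0, 1.8, size=(60, 1))
y = 2.0 - 0.8 * X[:, 0] + rng.normal(0, 0.3, size=60)

# Fixed random_state so the split, and therefore the scores, are reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

train_r2, test_r2 = {}, {}
for degree in range(1, 6):
    model = make_pipeline(PolynomialFeatures(degree=degree, include_bias=False), LinearRegression())
    model.fit(X_train, y_train)
    # r^2 on the rows the model saw vs. rows it did not see
    train_r2[degree] = r2_score(y_train, model.predict(X_train))
    test_r2[degree] = r2_score(y_test, model.predict(X_test))
    print(degree, round(train_r2[degree], 3), round(test_r2[degree], 3))
```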

None of the models above has a high $r^2$, so we most likely cannot accurately predict how many avocados are bought based on price alone.
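One possible next step is to give the model more inputs than price. For example, merged_df averages organic and conventional avocados together, even though their prices and volumes differ a lot. The sketch below runs on simulated data. It one-hot encodes a type column with pd.get_dummies and compares a price-only model with a price-plus-type model. The data and the choice of features are assumptions for illustration, and both scores are in-sample.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Simulated rows (hypothetical): volume depends on both price and type
rng = np.random.default_rng(7)
n = 300
data = pd.DataFrame({
    "average_price": rng.uniform(0.8, 2.0, size=n),
    "type": rng.choice(["conventional", "organic"], size=n),
})
is_organic = (data["type"] == "organic").astype(int)
data["total_volume"] = (
    1_600_000
    - 300_000 * data["average_price"]
    - 800_000 * is_organic
    + rng.normal(0, 40_000, size=n)
)
y = data["total_volume"]

# Model 1: price only
X_price = data[["average_price"]]
r2_price_only = LinearRegression().fit(X_price, y).score(X_price, y)

# Model 2: price plus a 0/1 type_organic column (drop_first drops type_conventional)
X_more = pd.get_dummies(data[["average_price", "type"]], columns=["type"],
                        drop_first=True, dtype=int)
r2_more = LinearRegression().fit(X_more, y).score(X_more, y)

print("price only:    ", round(r2_price_only, 3))
print("price and type:", round(r2_more, 3))
```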

Conclusion

After analyzing and visualizing the data, we came up with some insights:

  • Dallas/Ft. Worth, Houston, and South Central have cheaper avocados than other locations in the U.S., while New York, Hartford/Springfield, and San Francisco have more expensive avocados. Phoenix/Tucson also has low prices for conventional avocados, but its organic avocados are fairly expensive. Based on price alone, Dallas/Ft. Worth, Houston, and South Central look like the best places to live if you don't want to go bankrupt buying avocados.

  • Avocado prices vary over time, spiking at some points and dipping at others.

  • The number of avocados bought is increasing over time. Both organic and conventional consumption are growing, but organic avocados show a steadier and proportionally larger increase.

  • When plotting avocado prices against the number of avocados bought, fewer avocados appear to be bought when prices are high. However, we cannot accurately predict how many avocados are bought based solely on price.

Overall, it seems like avocados are not getting any less popular. However, if we want to predict how many avocados will be bought, we need to take into consideration more parameters than just the price of the avocados.