This document demonstrates my approach to analyzing the dataset in the Uber Analytics Exercise.
To begin the analysis, we load the .csv file into the R workspace and check its structure and summary.
## 'data.frame': 336 obs. of 7 variables:
## $ Date : Date, format: "2012-09-10" "2012-09-10" ...
## $ Time..Local. : int 7 8 9 10 11 12 13 14 15 16 ...
## $ Eyeballs : int 5 6 8 9 11 12 9 12 11 11 ...
## $ Zeroes : int 0 0 3 2 1 0 1 1 2 2 ...
## $ Completed.Trips: int 2 2 0 0 4 2 0 0 1 3 ...
## $ Requests : int 2 2 0 1 4 2 0 0 2 4 ...
## $ Unique.Drivers : int 9 14 14 14 11 11 9 9 7 6 ...
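The loading step can be sketched as follows. This is a minimal illustration, not the original code: it rebuilds the ten rows visible in the `str()` output as a toy file, and it assumes that all ten rows fall on 2012-09-10 and that dates are stored as `YYYY-MM-DD`. The temporary file path is a stand-in for the real exercise file.

```r
# Toy stand-in for the exercise file: the first ten rows shown by str().
# Assumption: every row is dated 2012-09-10 (str() shows only two dates).
toy <- data.frame(
  Date            = "2012-09-10",
  Time..Local.    = 7:16,
  Eyeballs        = c(5, 6, 8, 9, 11, 12, 9, 12, 11, 11),
  Zeroes          = c(0, 0, 3, 2, 1, 0, 1, 1, 2, 2),
  Completed.Trips = c(2, 2, 0, 0, 4, 2, 0, 0, 1, 3),
  Requests        = c(2, 2, 0, 1, 4, 2, 0, 0, 2, 4),
  Unique.Drivers  = c(9, 14, 14, 14, 11, 11, 9, 9, 7, 6)
)

# Hypothetical path; point this at the real .csv file instead.
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path, row.names = FALSE)

# Load the file and convert the Date column from text to Date.
uber <- read.csv(csv_path, stringsAsFactors = FALSE)
uber$Date <- as.Date(uber$Date, format = "%Y-%m-%d")  # assumed date format

str(uber)      # column types and first values
summary(uber)  # per-column summary statistics
```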
The data can be viewed as a funnel from Eyeballs (where demand emerges) to Completed Trips, while Zeroes may lead to drop-outs (users turning off the app). Visualization makes the patterns in these metrics easier to spot than a contingency table does.
The graph supports the funnel assumption: across Eyeballs, Requests and Completed Trips, the distribution of each stage sits slightly higher than that of the stage after it.
By drawing a scatter plot matrix, we can check whether any of the numeric variables are correlated.
Based on the plot, there appears to be a positive correlation between demand (Eyeballs) and supply (Unique Drivers), so we calculate the correlation coefficient. A coefficient of 0.79 suggests a strong positive relationship between supply and demand.
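One way to put numbers on the funnel is to total each stage and divide by the stage before it. The sketch below uses only the ten rows visible in the `str()` output, so its figures describe that sample rather than the full 336 observations; the names `totals` and `step_rate` are ours.

```r
# Toy data: the first ten rows from the str() output.
uber <- data.frame(
  Eyeballs        = c(5, 6, 8, 9, 11, 12, 9, 12, 11, 11),
  Requests        = c(2, 2, 0, 1, 4, 2, 0, 0, 2, 4),
  Completed.Trips = c(2, 2, 0, 0, 4, 2, 0, 0, 1, 3)
)

# Total volume at each funnel stage, in funnel order.
totals <- colSums(uber[c("Eyeballs", "Requests", "Completed.Trips")])

# Share of each stage that survives from the previous stage.
step_rate <- totals[-1] / totals[-length(totals)]

print(totals)
print(round(step_rate, 2))
```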
cor(uberNum$Eyeballs, uberNum$Unique.Drivers)
## [1] 0.7895826
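The same check can be extended to every pair of numeric variables at once, which is the numeric counterpart of the scatter plot matrix. In this sketch we assume `uberNum` holds just the numeric columns of the data; the toy rows come from the `str()` output, so the coefficients will differ from the 0.79 computed on the full data.

```r
# Toy data: the first ten rows from the str() output.
uber <- data.frame(
  Eyeballs        = c(5, 6, 8, 9, 11, 12, 9, 12, 11, 11),
  Zeroes          = c(0, 0, 3, 2, 1, 0, 1, 1, 2, 2),
  Completed.Trips = c(2, 2, 0, 0, 4, 2, 0, 0, 1, 3),
  Requests        = c(2, 2, 0, 1, 4, 2, 0, 0, 2, 4),
  Unique.Drivers  = c(9, 14, 14, 14, 11, 11, 9, 9, 7, 6)
)

# Assumption: uberNum is the subset of numeric columns.
uberNum <- uber[sapply(uber, is.numeric)]

# Pearson correlation for every pair of columns.
cor_mat <- cor(uberNum)
print(round(cor_mat, 2))

# pairs(uberNum) draws the matching scatter plot matrix.
```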
After examining the numeric variables on their own, we analyze them by the categorical groups, which in this case are Date and Time.
According to the graph, completed trips are clearly concentrated on weekends (Saturday and Sunday) and in peak hours (5pm - 3am).
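The grouping behind such a graph can be sketched with `aggregate()`. The rows below are made up purely for illustration (they are not taken from the exercise data), and the weekday is derived from `as.POSIXlt()` so the result does not depend on the system locale.

```r
# Made-up rows for illustration only; not values from the exercise data.
uber <- data.frame(
  Date = as.Date(c("2012-09-14", "2012-09-14", "2012-09-15",
                   "2012-09-15", "2012-09-16", "2012-09-17")),
  Time..Local.    = c(12L, 20L, 1L, 22L, 2L, 9L),
  Completed.Trips = c(1L, 6L, 8L, 9L, 7L, 2L)
)

# POSIXlt stores the weekday as 0 (Sunday) through 6 (Saturday).
day_names <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
uber$Weekday <- factor(day_names[as.POSIXlt(uber$Date)$wday + 1],
                       levels = day_names)

# Total completed trips per weekday and per hour of day.
trips_by_day  <- aggregate(Completed.Trips ~ Weekday, data = uber, FUN = sum)
trips_by_hour <- aggregate(Completed.Trips ~ Time..Local., data = uber, FUN = sum)

print(trips_by_day)
print(trips_by_hour)
```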
The core concept of the business is balancing demand (Eyeballs) against supply (Unique Drivers), so we explore the pattern of the gap (uber$Eyeballs - uber$Unique.Drivers) by date and time. A positive bar means that Eyeballs exceed Unique Drivers in that hour. There are clearly still many positive bars, especially on Friday night and Saturday.
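The gap itself is a one-line computation. The sketch below applies it to the ten rows visible in the `str()` output and lists the hours in which demand exceeds supply; the column name `Gap` and the object `shortfall` are our own labels.

```r
# Toy data: the first ten rows from the str() output.
uber <- data.frame(
  Time..Local.   = 7:16,
  Eyeballs       = c(5, 6, 8, 9, 11, 12, 9, 12, 11, 11),
  Unique.Drivers = c(9, 14, 14, 14, 11, 11, 9, 9, 7, 6)
)

# Positive gap: more Eyeballs than Unique Drivers in that hour.
uber$Gap <- uber$Eyeballs - uber$Unique.Drivers

# Keep only the hours where demand exceeds supply.
shortfall <- uber[uber$Gap > 0, c("Time..Local.", "Gap")]
print(shortfall)
```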
Zeroes might directly result in drop-outs (users turning off the app). Besides watching the patterns of Eyeballs and online drivers (Unique Drivers), we also have to pay attention to Zeroes, especially in Popular Hours. According to the exercise, Popular Hours run from 5pm on Friday to 3am on Sunday. The graph shows that there are still plenty of Zeroes during Popular Hours.
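The Popular Hours window can be turned into a flag and used to split the Zeroes. In this sketch the helper `is_popular()` is hypothetical, the rows are made up for illustration, and we assume that `Time..Local.` marks the start of each hour, so the window covers Friday from hour 17, all of Saturday, and Sunday hours 0 to 2.

```r
# Hypothetical helper: TRUE for hours between Friday 5pm and Sunday 3am.
# Assumption: `hour` is the hour in which the observation starts (0-23).
is_popular <- function(date, hour) {
  wday <- as.POSIXlt(date)$wday  # 0 = Sunday, 5 = Friday, 6 = Saturday
  (wday == 5 & hour >= 17) | wday == 6 | (wday == 0 & hour < 3)
}

# Made-up rows for illustration only; not values from the exercise data.
uber <- data.frame(
  Date = as.Date(c("2012-09-14", "2012-09-14", "2012-09-15",
                   "2012-09-16", "2012-09-16", "2012-09-17")),
  Time..Local. = c(16L, 17L, 12L, 2L, 3L, 20L),
  Zeroes       = c(1L, 4L, 6L, 5L, 2L, 0L)
)

uber$Popular <- is_popular(uber$Date, uber$Time..Local.)

# Total Zeroes inside and outside Popular Hours.
zeroes_by_period <- tapply(uber$Zeroes, uber$Popular, sum)
print(zeroes_by_period)
```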
For a more detailed walkthrough of the Uber analytics exercise, see https://www.deskbright.com/uber/uber-analytics-test/