Appendix - Bike Trips Analysis
Prepare
The data was collected by Motivate International Inc., a bike rental company.
The data is located in the cloud. Link to data
The data is publicly available and collected in Chicago.
The data is considered 1st party and is considered trustworthy.
The data is stored in zip-files, one file per file. The unzipped folder contains a MS Excel CSV and a CVS file modified for Mac OS.
The dataset is comprised of large files, with the largest one being approximately 150GB in size.
The data analyzed in this study covers the period from March 2022 to February 2023.
The tables consist of 13 columns: ride_id, ridablee_type, started_at, ended_at, start_station_name, start_station_id end_station_name, end_station_id, start_lat, start_lng, end_lat, end_lng, member_casual
Personal information such as credit card numbers has been removed to ensure privacy.
One limitation of the data is that it's not possible to determine how many of the casual rides were taken by the same user, as the data does not include unique identifiers for individual riders.
Process
Started off working with the data in spreadsheets, but found that it was too much for the table and was loading slowly.
Calculated ride length for each trip.
Observed that there are many rides that are only a few seconds long.
Found that there are rides longer than 24 hours, even though the maximum ride length is supposed to be 24 hours.
Discovered that there are rides with missing start and/or end stations.
Added a weekday column where 1 represents Monday and 7 represents Sunday.
Cleaning in spreadsheets
2022-03
284042 rows
Two rows had the start and end time mixed up. Ride id 2D97E3C98E165D80 and 7407049C5D89A13D. Changed the placement of the entries.
263 bike rides longer then 24h, removed them.
3108 bike rides less then 1 minute, removed them.
280673 rows left
2022-04
371249 rows
303 bike rides longer then 24h, removed them.
4600 bike rides less then 1 minute, removed them.
366346 rows left
2022-05
634858 rows
One row had the start and end time mixed up. Ride id 0793C9208A64302A. Changed the placement of the entries.
662 bike rides longer then 24h, removed them.
7474 bike rides less then 1 minute, removed them.
626722 rows left
2022-06
769204 rows
13 rides had the start and end time mixed up. Ride id 38B9F148CE80499B, B897BE02B21FA75E, BF114472ABA0289C, 072E947E156D142D, BBD84670E05463A9, F017ABAD58857045, 9FE1805A2557C74A, EF3CCF2B05635B96, B3BE8FE661F79405, 94B4428CC44303EF, 4293B33E61D0C951, 6A871510E302E8CD and B85E89FC9DD2558D. Changed the placement of the entries.
977 bike rides longer then 24h, removed them.
9099 bike rides less then 1 minute, removed them.
759128 rows left
2022-07
823488 rows
16 rides with the start and end time mixed up. Ride id D3E7C0B68EFEC32B, 4AD54A7AD0C5BF62, C1D6D749139CB6C0, 029D853B5C38426E, 89A7C16F29CD71D5, 3FC63EA6E83B609D, 48EA91B86A4220BD, A9E0DA69675D6A16, E3650AA02C01F218, B2B4386C03C39C6B, 61B92B5FF938674F, 461CC55C9B00468B, C3B9304C7EEE461A, 700D05389993E337, 035C91D5B31A0E17 and CE7E7DE0E3F22E46. Changed the placement of the entries.
859 bike rides longer then 24h, removed them.
10268 bike rides less then 1 minute, removed them.
812361 rows left
2022-08
785932 rows
15 rides with the start and end time mixed up. Ride id EC54018617CC3AE7, AD5D938B278C2D58, A2991D490436A806, A13F8B7BD207AF09, 93A74DBF339CC818, E2F6294CE68E07AA, C0C8DB0E2EA5B9E4, 3EE589C3B904CD2C, 953BF4B707AEE983, 0DB781397E2287B7, 3D99C9D3AB75E23F, F33D11F3AF0F522E, BD0B84E552536029, EF40941641F68C81 and 179F7874064251C8. Changed the placement of the entries.
734 bike rides longer then 24h, removed them.
10170 bike rides less then 1 minute, removed them.
775029 rows left
2022-09
701340 rows
9 rides with the start and end time mixed up. Ride id E137518FFE807752, 57D8340BB2C0E7B0, 47E70E6F8939C541, 4890C455C466940F, 912DE8D91E0AE083, 871B757480E7208F, D1FF403E09EC995E, 5387FB2D8D6AC0F3 and BB630AE8E6017B58. Changed the placement of the entries.
628 bike rides longer then 24h, removed them.
8704 bike rides less then 1 minute, removed them.
692008 rows left
2022-10
558685 rows
4 rides with start and end time mixed up. Ride id F9A1F8F99C1EFBF5, 5ACA942162006249, 918F745F62CAC29E and D948CEECBD356EAB. Changed the placement of the entries.
422 bike rides longer then 24h, removed them.
7593 bike rides less then 1 minute, removed them.
550670 rows left
2022-11
337735 rows
41 rides with the start and end time mixed up. Ride id 1BA46F9F216F5E17, B5602D5BB3D517F6, 75780B25C9DF4DB1, 2A533E57397B5D4E, DFC82699B14E4206, EA0AC18A7210CFD8, D0777AB1FF817CCE, B9EE6C73F013FEA5, 42AA6484DDE16CAA, 30BA9E7F8225D0B1, E94F1FD1306C0F9D, B1E7F86FF2CAF18E, 5FBFC1FD44AD3467, A486387C46E26BE4, 709F4CC28BC5B9D6, 1BA8DEBE4E13DD1D, BC580FF9E5188FE2, 4139B11634039661, 2D98008FFB28C1B8, 417746CBEB92A34E, 35D7B0CC880D577B, 112ED5B9200BFD2A, FB90B13B85686836, 309BA8741F370C2B, F591E5F1CEB97D1D, 5B15C685C741E678, AF62B881CEBBC69F, C055408DBE44D1C1, 74231213FDA5EA03, 0C87ABEF3511C8C7, 77417E7D945B4D32, 2F7E2E2160BD54FF, C7141D479EB80507, 8E63B36510B38694, 4261E8D76BBC7807, 8BC7D5602F31D869, 7598EB707107648D, 9709ACD5F0E8222D, 72C7B4E6882492B1, 71940A5D4B8C92D9 and 7A119D7D10AABFB9. Changed the placement of the entries.
216 bike rides longer then 24h, removed them.
4517 bike rides less then 1 minute, removed them.
333002 rows left
2022-12
181806 rows
122 bike rides longer then 24h, removed them.
2890 bike rides less then 1 minute, removed them.
178795 rows left
2023-01
190301 rows
118 bike rides longer then 24h, removed them.
3811 bike rides less then 1 minute, removed them.
186372 rows left
2023-02
190445 rows
One ride with the start and end time mixed up. Ride id 4EFC95304E050AA1. Changed the placement of the entries.
109 bike rides longer then 24h, removed them.
3781 bike rides less then 1 minute, removed them.
186555 rows left
Analyze
Analyzing the data using spreadsheets
Calculated the total number of rides per month.
Determined the proportion of annual members vs. casual riders, both in terms of the total number of rides and on a monthly basis.
Calculated the average trip duration for both casual riders and annual members.
Identified the maximum and minimum ride lengths (note: there was no difference between maximum and minimum ride length for the dataset as a whole).
Calculated the percentage increase in the number of rides from one month to the next.
Analyzed at what times of day rides tended to start.
Analyzing the data using BigQuery
Calculated most frequent day of the week for casual riders and annual members.
Identified the most frequent stations for casual riders and annual members.
Determined the types of bikes used by casual riders and annual members (note: there was no difference between casual and annual members in terms of the types of bikes used).
Analyzed the duration of the trips taken by casual riders and annual members.
Queries
## What kind of bikes are casual and members using?
SELECT member_casual,
rideable_type,
COUNT(*) AS number_of_bikes
FROM `sonic-diorama-378511.Cyclistic_trip_data.202302`
GROUP BY rideable_type, member_casual
ORDER BY member_casual
## Rides per member/casual and weekday
SELECT member_casual,
weekday,
COUNT(*) AS bikes_per_weekday
FROM `sonic-diorama-378511.Cyclistic_trip_data.202302`
GROUP BY member_casual, weekday
## Most popular start stations
SELECT COUNT(*) AS number_of_rides,
start_station_name
FROM `sonic-diorama-378511.Cyclistic_trip_data.202302`
WHERE member_casual = "casual"
GROUP BY start_station_name
ORDER BY number_of_rides DESC
## Most popular end stations
SELECT COUNT(*) AS number_of_rides,
end_station_name
FROM `sonic-diorama-378511.Cyclistic_trip_data.202302`
WHERE member_casual = "casual"
GROUP BY end_station_name
ORDER BY number_of_rides DESC
## The duration of the rides
## WHERE ride_length < "00:15:00"
## WHERE "00:15:00" <= ride_length AND ride_length < "00:30:00"
## WHERE "00:30:00" <= ride_length AND ride_length < "01:00:00"
## WHERE "01:00:00" <= ride_length
SELECT COUNT(*),
member_casual,
FROM `sonic-diorama-378511.Cyclistic_trip_data.202302`
WHERE "01:00:00" <= ride_length
GROUP BY member_casual