The URL can be used to link to this page

MATH 456 Student Project Reports for ValleyBike Operations OptimiUniversity of Massachusetts Amherst University of Massachusetts Amherst ScholarWorks@UMass Amherst ScholarWorks@UMass Amherst Student Showcase Sustainable UMass 2019 MATH 456 Student Project Repor ts for ValleyBike Operations MATH 456 Student Project Repor ts for ValleyBike Operations Optimization Optimization Ezra Small University of Massachusetts - Amherst, esmall@facil.umass.edu Follow this and additional works at: https://scholarworks.umass.edu/ sustainableumass_studentshowcase Part of the Analysis Commons, and the Sustainability Commons This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License. Small, Ezra, "MATH 456 Student Project Reports for ValleyBike Operations Optimization" (2019). Student Showcase. 27. Retrieved from https://scholarworks.umass.edu/sustainableumass_studentshowcase/27 This Article is brought to you for free and open access by the Sustainable UMass at ScholarWorks@UMass Amherst. It has been accepted for inclusion in Student Showcase by an authorized administrator of ScholarWorks@UMass Amherst. For more information, please contact scholarworks@library.umass.edu. Financial Analysis of an Incentive Program for Valley Bike Taylor Stetson, Neal Bachmann, Matthew Geiger, Minh Nguyen December 16, 2019 1 Introduction 1.1 Goal Our goal was to optimize the nancial aspect of Valley Bike. As Valley Bike is a non-prot, we knew that their prot would eventually be zero, as any extra money they make will be reinvested. For the convenience of our analysis, we de- cided to ignore the reinvestment and focus on how to optimize their operational prots. 1.2 Why we chose bike angels In order to optimize Valley Bike’s prots, we would have to increase their rev- enue or decrease their cost. There were a few areas of potential improvement within these, but we identied rebalancing costs as an outlier that could be greatly improved. Based on the work of other bike sharing systems, the most eective way to minimize rebalancing costs is to implement a \Bike Angels" program. Since this program seemed to be so eective at other companies, we decided to create a model which would estimate how eective this program would be for Valley Bike. Bike Angels is a potential incentive program for Valley Bike riders. Cur- rently, the company has to constantly move bikes from station to station in order to prevent \out-of-stock" events, which means that the stations are full/empty when riders want to return/rent a bike. The Bike Angels program gives out incentives for riders to take bikes to stations other than their desired station and walk an extra distance back in order to help rebalance the bikes. The goal is to reduce the rebalancing cost and distribute that back to riders. 1.3 Key Terms Before we can begin discussing our model, it is important to understand a few key terms: 1 Angel Rates: The probability that riders will participate in the Bike An- gels program on any given ride Eectiveness: The probability that a given bike angel redistribution will be more cost eective than a traditional redistribution Savings: Redistribution cost angel rates number of rides. Static Model: An implementation of bike angels in which incentives pro- vided to angels are static and unable to be adapted during the course of a day. This will be the model that Valley Bike has to implement at rst because they do not have a bike angels system currently in place to draw data from. 2 Current Rebalancing Costs We had two methods for estimating the cost. The rst method is top-down, meaning that we use Valley Bike’s projected operational cost to nd rebalanc- ing cost, based on their own nancial projections from [2]. We believe that their current cost ts the 50 stations/450 bikes size, which translates to about $826,200 of operational costs per year. Given that Valley Bike has roughly 6 categories of cost and 9 months of operation, we calculated the rebalancing cost to be $826,200 15%1/9 = $13,770/month. We believe that this is a rather reliable number, but we also veried it by conducting a bottom-up calculation. For our bottom-up approach, we calculated the redistribution cost based on the number of rebalancing trucks in operation. We knew they had three trucks and three full-time or full-time equivalent drivers. 3 drivers $15 per hour 40 2 hours per week 4 weeks per month = $7,200 monthly. For the trucks, we have 3 trucks * $2.50 per gallon * 1 gallon per hour * 40 hours per week * 4 weeks per month = 1,200 monthly. The maintenance and depreciation costs are at least $1500 per month. Overall, we have a lower bound for monthly cost at roughly 10,000. Since our estimate is a lower bound, the top-down cost we calculated should be accurate. Our conclusion is that the top-down approach based on Valley Bike’s own projections is reasonable enough. 3 Clustering One of the major problems for Bike Angels is nding stations within a distance that an angel would be willing to walk. Since people are generally not willing to walk very far, we decided that it would be a waste of resources to calculate the odds that someone would walk between stations that are very far apart. Rather than manually calculate the distance between every single station, we chose to use K-Means Clustering to divide the stations into clusters. Essentially the way that this works is that a program places k number of centroids on a map. It then creates clusters by identifying the nearest locations to these centroids. It repeats this process with various centroids until the clusters begin to stabilize. For our purposes we started with only 5 centroids but we found that the clusters that were being produced still had a lot of stations that would very rarely be walked to. We kept raising this number in order to eliminate extraneous stations. Eventually we settled on 20 clusters, of which we focused on the ve which contained the most closely grouped stations. 4 Probabilities One of the rst steps in nding the angel rate is to calculate the probability of a person walking from one station to another in a given cluster. In order to do this, we nd the distance between each pair of stations by simply plugging the addresses of each station into google maps. Using the walking setting, google maps will show the shortest walking path between the stations in miles which can then be converted to feet. The next step is to gure out the average distance a person is willing to walk. From gure 3(b) in \Incentivizing Users for Balancing Bike Sharing Systems" we see that the average distance, where the cumulative distribution percentage is 50%, is 500 meters(1640 ft) [3]. Now that we have the distances between stations and the average distance a person will walk, we can calculate the standard deviation. Since the standard deviation formula is s = v u u t 1 N 1 N X i=1 (xi x)2 ; where xi is the distances between the stations, x is the average walking distance and N is the amount of stations in the cluster, it can be easily calculated. Lastly, 3 assuming that this follows a normal distribution, we can use the distances, average walking distances, and standard deviation to nd the probability that someone would walk between any two stations. We calculated the probability that users would walk between every station in our 5 target clusters and used this information in order to nd our angel rate. 5 Angel Rates 5.1 Calculations The angel rate is dened as the average probability that a rider will walk from the station they planned to start or nish their ride at to a dierent station for an incentive. To nd the angel rate for each of our clusters, we calculated the mean of the walking probabilities between every unique pair of stations in each cluster as was mentioned in the previous section. Since our clusters had varying station densities, we decided that calculating the angel rate for each one separately was more sensible than lumping all of the probabilities together. 5.2 Applications The angel rate directly inuenced the amount of savings an incentive program would generate for Valley Bike. Our savings function, calculated for each cluster, is as follows: Si = (Ai )(Ri )(C )i 2 Rf1;5g Si ,Ai ,Ri , and C represent savings, angel rate, number of rides, and redistri- bution cost per ride respectively.i represents each cluster in our set. The value produced from this function represents the amount of money saved by redis- tributing our projected number of bikes with the incentive program rather than by truck. It does not account for the cost of providing incentives to the riders. After calculating savings for each cluster, we found the net savings across all clusters. N = 5 X i=1 (Si )(E )(I ) 5 X i=1 Ai 5 X i=1 Ri i 2 Rf1;5g E represents the eectiveness of the static model and N represents net savings [1]. Our function for net savings accounts for the cost of each incentive, and applies that cost to each angel ride that is projected to occur. The eectiveness of the policy (in this case static) is the fraction of bikes that were redistributed to stations that needed rebalancing. If a bike goes to a station that is projected to need rebalancing but does not need rebalancing, the trip that rider took would not reduce the amount of work done by Valley Bike’s drivers. Therefore, the policy’s eectiveness takes away from Valley Bike’s savings. 4 6 Incentives In order to calculate exactly how much money the bike angels program could save, we rst had to calculate how much of an incentive should be provided to angels. There were a number of factors to consider here but we will rst discuss the impact of incentives on our angel rate. Since Valley Bike does not have a bike angels program in place, we had to look elsewhere to nd what percentage of riders we could expect to be angels at a given incentive point. In \Incentivizing Users for Balancing Bike Sharing Systems" we found a chart that approximated the angel rate at dierent incen- tive points. By modeling this chart ourselves, we were able to nd a logarithmic trend line which showed that increasing incentives improved our angel rate at a diminishing rate. This information is based o of a bike sharing system in Zurich, so it may not be entirely accurate for Valley Bike but it should provide a good approximation. From there, we had to gure out our eectiveness rate. As we mentioned earlier, eectiveness is the probability that a given bike angel redistribution will be more cost eective than a traditional redistribution. Again, since Valley Bike does not have a bike angels system in place at this time, we had to look elsewhere for an approximation of this value. In Daniel Freund’s paper \Bike Angels: An Analysis of Citi Bike’s Incentive Program" he discusses the eectiveness rate for various policies at Citi Bike. Since Valley Bike would be starting the Bike Angels program from scratch, we have to analyze only the static model from this paper. During the day, the static model holds up fairly well and remains moderately eective even as we increase our incentive. 5 Unfortunately, the static model does not hold up nearly as well at night. Due to higher variability in optimal placement of bikes, our nighttime eectiveness quickly falls below 50%. Before we move on to calculate our optimal incentive values, it is important to recognize that the day and night time eectiveness values will not necessarily be so dierent for Valley Bike. The paper that we draw from focuses on Citi Bike, which is located in New York City, a much larger and more unpredictable system. This unpredictability makes the static model much less eective than other models because it does not have any way to adapt to unique situations. Since Valley Bike is a much smaller scale system, these unique situations will be more rare and this means that a static model would likely be more eective here. Even still, other models would further improve this eectiveness so they would be worth looking into in the future. Now we simply have to graphically model the trend lines for the graphs that we created above. Doing this will give us an optimal incentive value during the day and night, as well as the expected eectiveness that comes along with it. 6 From here we can clearly see that at night our optimal incentive is about 36 cents, and during the day our optimal incentive is about 57 cents. We then took this information and the average eectiveness rate that corresponds to it in order to calculate an approximation for how much money the bike angels system would save Valley Bike. 7 Results In the end, Valley Bike’s approximate net monthly savings were only $67. While this result is not large, it was roughly what we expected. Since the static policy correctly redistributes bikes 45.95% of the time, over half of the incentives given to riders increased the cost of the incentive program but did not aect savings. Given the approximate cost of redistributing one bike of $1.15, our incentive cost of $0.52 was signicantly lower. However, the eectiveness of the static policy paired with our angel rate of 0.297 closed the gap between incentive cost and redistribution cost. Since Valley Bike’s distribution of stations is signicantly more sparse than the companies our reference papers were written about, breaking even while being dragged down by the eectiveness of the static policy is a promising result. Some variation in our gure for nal savings is likely, as we only used rider data from August 2019 in our analysis. 8 Recommendations From our results you can see that the implementation of a bike angels system will save Valley Bike money. From here, we have a few recommendations on how to get started. First, Valley Bike should implement a static model of bike angels. They must start with a static model because Valley Bike currently has no Bike Angels system in place and the static model serves as a baseline for all 7 other models. Once enough data about Bike Angel habits have been collected, Valley Bike should upgrade to a static hindsight or a dynamic model. A static hindsight model is an oine policy that looks back at previous weeks to choose the optimal continuous incentive period for each station. A dynamic model is an online policy that decides in real time whether or not to incentivize a trip [1]. Valley Bike should upgrade to one of these programs because they are both more eective and ecient at all times of the day than the static model which would end up cutting down on redistribution costs and saving more money. As can be seen in our results, with a static model Valley Bike will save about $67 a month, which since Valley Bike is a non prot company, is not a real issue since they are not losing money. Although, when a better policy like static hindsight or dynamic can be implemented, Valley Bike would be saving closer to two thousand dollars a month. With that money saved, Valley Bike could put more money into incentives which could make more people want to be bike angels. References [1] Hangil Chung, Daniel Freund, and David Shmoys. \Bike Angels: An Anal- ysis of Citi Bike’s Incentive Program". In: June 2018, pp. 1{9.doi:10. 1145/3209811.3209866. [2] Pioneer Valley Planning Commission.Pioneer Valley Regional Bike Share System Pilot. 2016. [3] Adish Singla et al. \Incentivizing Users for Balancing Bike Sharing Sys- tems". In:Proceedings of the Twenty-Ninth AAAI Conference on Articial Intelligence (2015). 8 Valley Bike Rider Based Rebalancing Matthew Pearce mpearce Vishwajith Reddy Anagandula vanagandula Jia Di jiadi 1 Introduction Valley Bike is a bike-sharing system located in the Pioneer valley in Western Massachus- sets. The systems currently consists of over 50 docking stations and over 500 bikes spread across Amherst, Holyoke, Northamp- ton, South Hadley, and Springeld. The sys- tem is widely used with the latest statistics indicating in 2019 there were approximately 75,506 rides taken totaling 197,730 miles for an average riding distance of 2.61 miles per ride. The stated mission of the Valley bike pro- gram is to "promote short bike trips within core communities, where clusters of large em- ployers, colleges, shopping, tourist destina- tions and residents can readily be connected." One major challenge in doing this is ensuring there are always bikes available for rental at stations and also ensuring stations have empty docks to return bikes at. Achieving this balanced state is incredibly dicult and is a problem common to bike shar- ing systems around the world. Some cities have adopted systems that do not rely on docks at all, however, this is better suited for dense urban areas, but is also problematic as it often results in greater disorganization (Pan et al.,2018). Additionally, such a dockless sys- tem is impractical as valley-bike uses eletric bikes that must be recharged. Others cities have oered incentive programs to users to mo- tivate users to dock bikes at in need stations and remove them from overcrowded stations. In this paper we will investigate such strate- gies. 2 Goal We aim to reduce stockout events through rider-based rebalancing in a cost eective man- ner. This will both improve the eciency of the network while reducing the time and num- ber of employees needed to manually rebalance bike docking stations with trucks. 3 Initial Approach We started using "Bike Angels: Citi Bike’s In- centive Program" (Chung et al.,2018) models to make our rst model. Bike Angels used online and oine policies, oine policies was completely dependent on previous data on in- centive rides of Citi bike. The data was di- vided into two 6-hour time periods and both types of policies used this data. The oine policies used were static, static hindsight, clus- ter hindsight and uid models. Static model is a plain simple static model which incentivized the two time periods. The static hindsight model chooses the optimal continuous incen- tive period for each 30 minute interval for ev- ery station for the two time periods. The clus- ter hindsight model works the same way as static hindsight but instead of choosing the op- timal continuous internal period for each sta- tion, it chooses the optimal continuous interval period for a cluster. This is where we got the idea to use clustering in our model for Valley Bike. The online policies had a Dynamic and Dynamic CC models. The Dynamic model chooses in real time if a trip is incentivized or not with perfect eciency while the Dynamic CC model decides for every few minutes, which is xed, to incentivize a trip or not in real time. As Valley Bike didn’t have real time data, we didn’t use the online policies. We started our model by using the clus- ter hindsight model in oine policies as we thought clusters make sense for Valley Bike as the stations are located in each town and there is large gap between stations of each town. We ran into problems with our model with pre- dicting and then noticed that the data Bike Angels used was already data with incentive rides of Citi Bike, we completely scraped our model. After that, we went through many papers and got a model that was suitable for Valley Bike, "Incentivizing Users for Balancing Bike Sharing Systems" (Singla et al.,2015). This paper was from Europe which made a model for bike sharing system in Paris. Unlike Angel Bikes, they used a single model which takes into consideration the budget, user’s cost and uctuating demands over time. This time we didn’t start our model but instead started to parse the model. When we parsed the model, we got to know that this model is completely dependent on the real time user’s location as the model takes into consideration if a user at a location will pick up or return a bike at a dierent location. The model used a key in- dicator functions to know if a user will accept the incentive and also takes into consideration the user’s cost which is how long will the user have to walk from the station of bike return to the previous station. This model is not linear and uses pseudo code which is really compli- cated. As this model used real time data and non linear approach, we didn’t use this model. This paper gave us a lot of useful intuitions and ideas which we used in our model like the matrix approach and the cost function. As the model required using real time data on a rider’s current location, we didn’t use it. 4 Model Then we started making our own model for Valley Bike due to the trac being too low or not able to get the data we wanted to base our model on any of the papers. But, the papers we read gave us a lot of intuition on how to build our model. We took the concept of clustering from "Bike Angels" cluster hind- sight model in oine policies, the user cost and the indicator function of user accepting an in- centive from "Bike Sharing Systems" model. 4.1 Data We use the rental and return data through April, May and August in 2019 from Valley Bike. This time period include the school time and the summer break. Also, through these days, it is more likely to have more rental and returns because of the weather. Among the 52 stations,we made our analysis on a total rides of 23755 in 93 days. We assign each station a serial number for easier process the data. The current number of bikes at each station is random, between 0 and the maximum capacity of that station. The rebalancing requests are random numbers between -3 and +3 (+ when more bikes are needed,when remove of bikes from that station is needed). The rebalancing request data is the output of another group. For accurate rebalancing, their result should be the input here. We made a 3-D array of time interval, start station and end station. In this project time interval is set as one hour. For the data and variables we used in this project, we use the following notation: Ws;t : Ideal rebalancing for each station. Ks;t : Number of bikes changed for station over time interval t. Qs;t : Amount of bikes we want to rebalance after expected changes. Qs;t =Ws;t Ks;t Qabss;t =jQs;t j IC : Minimum incentive to have people go somewhere in radius which is $1.55. P : The acceptance of user incentive, we use 0.25 in this project. The ideal rebalancing is considered as all the stations perfect rebalanced. Which should be the output of Project 1. Number of bikes changed is the dierence on number of bikes of one station during one time interval. For the acceptance rate of the incentives, an estimate is made. 0.25 is used in this project. This statistic along with the $1.55 incentive oer were based on the surveys and network-simulations performed in (Singla et al.,2015) 4.2 Variable mvi;j : A matrix of trips to incentivize. IVi;j : A matrix showing the number of incentives to oer based on acceptance prob- ability. b : Dierence of how many bikes to re- balance which dened as 0 when the bikes are fully rebalanced, and P Qabss;t ideal represents 0 % of rebalance. Mrb: max mVi;j cost: The cost for x percentage of bikes to be rebalanced at time t. Ideal: Represents the net change we want to achieve from rebalancing in a cluster: sum(Qs;t ) across all s for any given t Inni : Total incentivised trips into station i. Outni : Total incentivised trips out of station i. Insout: Total incentives at station i. Before start processing the data, we sepa- rate the stations into 4 clusters. Using the data from Valley Bike, a plot was made for all the stations. The stations forms 4-5 clus- ters. This can be seen in the following plots illustrating the stations that would be associ- ated with each other when there are 3,4, and 5 clusters. Station Plot 3 Clusters 4 Clusters 5 Clusters With the route data of the users, the num- ber of trips inside/between the clusters is determined. 4 is the ideal number of clusters since it is the only amount of clusters that form a natural grouping and have over 90% of rides starting in any given cluster ending in that same cluster. The use of clustering method helps simplifying the optimization. The incentives provided after using clustering method is more precisely. The users who get the incentives are more likely to accept the incentive than with all the stations involved. This is because instead of seeing the incentive oers for up to 50 other stations, they will only see the oers for the stations that they are likely to traveling around. 4.3 Program We are running our model for each cluster in- dependently for each time interval and we are minimizing the total number of bikes to move from station i to station j ,mvi;j : X i X j mvi;j For our model to be feasible, we have to make sure that the trips don’t go from and to the same station, that is from station i to station i. This can be achieved by making the number of trips to incentivize or the matrix of trips to incentivize zero where the bikes go from station i to station i. We must also make sure that we update the Cost variable updates to a new cost after a bike is moved from station i to station j . We should also update the matrix of trips mvi;j after a bike has moved from station i to station j by updating the total incen- tivized trips ’in’ to station i,Ini and ’out’ of station i,Outi . This only is updates for station i after removing a bike from station i but we are also adding a bike to station j . For station j repeat the updating for station j . As the number of incentives to oer,IVi;j is still set to the same value, we must update it by multiplying the inverse of xed acceptance probability of incentives p to number of trips to incentivize, which is the matrix of trips to incentivize,mvi;j . Finally, we shouldn’t over incentivize and also make sure that the bikes should stay in the same cluster. For bikes to be in the same cluster we make the number of trips to incentivize zero for stations not in the cluster C and making the total incentives at stations outside the cluster C . To make sure we don’t over incentivize, the total incentivized trips in to Ini , out to Outi and total incentives should be less than the absolute value of Qs , that is Qabss . As we are running our program for each time interval, we ignored the ’t’ for our variables or data. Our program: Minimize X i X j mvi;j Subject to P i P j mvi;j = (( P S Qabs j idealj)=2)b mvi;i = 0 8i 2 S |{z } Trips don’t go to same station Mrb =max(mvi;j )8i;j 2 C Cost =IC X i X j mvi;j |{z } Updating Cost Ini = X j mvj;i 8i 2 S |{z } Updating In Matrix Outi = X j mvi;j 8i 2 S |{z } Updating Out Matrix 1 p mvi;j 8i;j 2 S |{z } Updating number of incentives Ini Qabsi 8i 2 S Outi Qabsi 8i 2 S InsOuti Qabsi 8i 2 S mvi;j = 0 8i;j =2 C InsOuti = 0 8i =2 C Using this program we are nding the num- ber of trips to incentivize mvi;j , which is in the form of a matrix, number of incentives to oer based on acceptance probability IVi;j , which is also in the form of a matrix, the dierence of how many to rebalance with a full rebalance b and how much it costs to rebalance Cost. We improved our model by introduced a sec- ond objective function into our program which ensures a more even incentive oering by min- imizing the maximum values of the matrix of trips to incentivize Mrb: Minimize Mrb We use the same constraints as the primary objective function. This objective function has a second priority that means it only optimizes the program in ways that do not reduce the optimality of the primary objective function. We added this secondary objective function to balance out the number of incentives we are trying to oer at any given station. Since there the network has relatively little trac, we did not want to try to incentive 4 rides at one sta- tion when we could incentive 1 ride at 4 dif- ferent stations. From our analysis of the ride data, we believe the incentives are far more likely to be accepted in the second scenario. 5 Program Use Instructions 1.Clone / Download the code from the GitHub Repository:https://github. com/mpearce25/valley_bike 2.Install conda to setup the same python environment. If you are only trying to run the optimizer and not preprocess any data or do any of the visualizations, it may be enough to have gurobipy and basic python libraries such as numpy and pandas in- stalled. 3.Import the conda python environ- ment.To do this make sure you are in the proper directory and then run the fol- lowing command (all on one line): conda env create -f conda_req.yml python=3.7 This will by default create a conda envi- roment with the name m458. 4.Activate environment Run the com- mand: "conda activate m458". This must be run in every shell before running the actual program. 5.Run the program. The Main.py le contains all of the code necessary to run the program. Lots of the code is cur- rentely commented out, however, if you wish to re-randomize some of the pro- grams parameters you can selectively un- comment the apprioriate lines. The current conguration of the le will simulate the rebalancing program running at noon. The program can also be run for multiple time periods at once. All results get saved in the rebalance results folder. This folder currentely has results after running the optimization for each of the 24 time intervals. If you run this multiple times, it will overwrite the le for the ap- prioriate time interval (zero-indexed) but it is probably a good idea to delete all of the items in the folder (but not the folder itself) before running the program. The program generally runs for all clus- ters in under 30 seconds. 6 Results The following results utilized many random parameters to account for certain unknown factors such as the current state of the network and the ideal rebalancing amounts at each sta- tion. These randomized parameters can eas- ily be tweaked, however, the results below are based on following: Attempting to rebalance between -3 and 3 bikes per station Each station initialized with between 0 and a full station worth of bikes Running at time interval t = 11 =) noon Users accepting an incentive with a prob- ability of p =:25 Each accepted incentive costing $1.5 (Slightly dierent from cost of $1.55 given in model, but was adjusted to show pro- gram’s exibility) When the program is run, for each of the four clusters a matrix will be produced represent- ing the number of rides between stations that will ideally be incentivized, the number of in- centive prompts that should be presented to users to successfully incentives that behavior, what percent of intra-cluster rebalancing was achieved, and the cost to achieve that percent- age of rebalancing. This screenshot from the cost 11.txt le shows an example of a potential output for cluster 0. The ’zoomed ride rebalancing’ is the number of rides we want to incentivze from station Si to station Sj . The values prexed with ’S’ on the far right indicate what station that row / column represents. Since it is a sqaure ma- trix, the I th row and I th column correspond with the same station. Using the rst matrix in the image above as an example, we can see that 1 ride should be incentivized from station s1 to station s6 (as indicated by the rst 1 on the far left column). The station numbers can be correlated with physical locations or names using the "station locations mod.csv" le. The zoomed incentive oers is the same as the zoomed ride rebalancing, but scaled by 1 probability of accepting incentive oer . The phrasing "zoomed..." was used since for each cluster we are still utilizing the full 52 x 52 station incen- tive matrix to simplify the optimization pro- gram, but wen viewing it we only want to view a the stations in that cluster which requires ’zooming’ in on that portion of the larger ma- trix. We can see in the results that to achieve 100% rebalancing within cluster 0 at noon it will cost $7.5, but 60% rebalancing only costs $4.5 Condensing the information from the out- put le we see these results. Cluster 0 Rebalanced (%)Cost ($)) 100 7.5 60 4.5 20 1.5 0 0 Cluster 1 Rebalanced (%)Cost ($)) 100 4.5 66.66 3 33.333 1.5 0 0 Cluster 2 Rebalanced (%)Cost ($)) 100 15 60 9 20 3 0 0 Cluster 3 Rebalanced (%)Cost ($)) 100 12 50 6 0 0 Total Cost:$39.00 These results are very promising as they suggest that 100% intracluster rebalancing can be achieved across all clusters at noon for just $39. The cost to achieve rider based rebal- ancing may also decrease over time as if this program is running continously the amount of rebalancing needed at any given time would likely be minimal. Our testing shows our model has a clear positive correlation between the amount of rebalancing needed and the cost to rebalance the network. It is important to recognize that this pro- gram can be used very exibly. For example if upon running it and examining the results the $15 needed to fully rebalance cluster 2 seemed to high, it is possible to instead spend just $9 and achieve 60% rebalancing in that cluster while still fully rebalancing the other clusters. Additionally, it may be helpful to use the ra- tio of cost rebalance %as a hueristic to decide what percent of rebalancing to do in each station. The lower the ratio, the more cost eective the rebalancing is. This is not a useful measure at the moment since there is a constant incentive cost function, however, future data collection on user incentive respond could make this fea- sible. 7 Future Work We believe that there are more eective meth- ods to rebalance valley bike than our model. As there is not much data and the trac is very low we assumed many things. The ’An- gel Bikes’ models and ’Bike Sharing Systems’ models are much more ecient. These can be implemented if Valley Bike runs our model for six months to an year and record for each day of a week how the trac and bikes are with incentives in the morning and in the night. With this data, the ’Bike An- gels’ oine policies can be used. If valley bike uses real time tracking of a bike and records all the data, then ’Bike Angels’ online poli- cies and ’Bike Sharing Systems’ model can be used. We think having a strategy that follows this implementation would be highly eective. When collecting this data further testing can also be done to determine what kind of incentives user’s best respond to. Previous programs have used social standings (Chung et al.,2018), cash-incentives, and discounts on future rides. 8 Conclusion Although we were unable to utilize any of the models used in larger bike-sharing systems, we were still able to produce a very good baseline model to achieve user-based rebalancing. Our tests on the randomized data indicates user- based rebalancing is both possible and eco- nomical. The largest challenge in using this model would likely be creating a user inter- face oers users incentives based on our models suggestions. This would require a method of providing the networks current bike allocation status to our model along with a redesigned mobile application that is able to notify users of relevant incentives within their cluster. Despite these challenges, we believe imple- menting our model could reduce, but not elim- inate, the amount of labor dedicated to truck- based rebalancing allowing for more focus on maintaining bikes components and cleanliness. It is important to recognize that our solu- tion reduces the need for but does not replace truck-based rebalancing. There are situations where users may not want to accept incentives if there is inclement weather or people are sim- ply in a rush. These situations and rides that escape a single cluster mean our model should be implemented alongside truck-based rebal- ancing. We hope that implementing our model makes the valley-bike share more ecient and improves the rider experience. 9 Code Links Github:https://github.com/ mpearce25/valley_bike Conda Installation:https://docs. conda.io/projects/conda/en/latest/ user-guide/install/ References Chung, H., Freund, D., and Shmoys, D. (2018). Bike angels: An analysis of citi bike’s incentive program. pages 1{9. Pan, L., Cai, Q., Fang, Z., Tang, P., and Huang, L. (2018). Rebalancing dockless bike sharing sys- tems.CoRR, abs/1802.04592. Singla, A., Santoni, M., Bartok, G., Mukerji, P., Meenen, M., and Krause, A. (2015). Incentiviz- ing users for balancing bike sharing systems. In Proceedings of the Twenty-Ninth AAAI Confer- ence on Articial Intelligence , AAAI’15, pages 723{729. AAAI Press. Optimizing Dock Allocation in Valley Bike Share Ben Goslin, Junting Hu, Nick McCarthy, Qianrui Wen December 16, 2019 Abstract The goal of this project is to minimize the amount of out of stock events in a bike sharing system by reallocating docks to dierent sta- tions. This was rst approached as a linear program and then com- pleted with a gradient descent algorithm. 1 Introduction For this project, we were fortunate enough to be able to work with data from a real local bike share service. ValleyBike Share is a bike share company servicing the Pioneer Valley of Massachusetts. This includes the city of Springeld, Holyoke, Northampton, Hadley, and UMass as well as a few others. ValleyBike Share is a system consisting of 50 stations and roughly 450 bikes. The system is smaller than most urban bike share services such as New York’s Citi Bike or Boston’s Bluebikes, yet it still faces some of the same issues. The idea of a bike share system creates a type of asymmetric demand that must be addressed to allow the system to run smoothly. Specically, it must avoid the situation in which a customer arrives to rent and there are no bikes to rent or the situation where a customer comes to return and there are no empty docks. Such an event is called an Out of Stock Event, or, as we refer to them in this project, an OSE. Working with the information from the paper,Minimizing Multimodular Functions and Allocating Dock Capacity in Bike-Sharing Systems by Daniel Freund, Shane G. Henderson, and David B. Shmoys, we attempted to address this issue. 1 Therefore, our goal became minimizing out of stock events. To accomplish this, we attempt to relocate docks between stations. 1.1 Visualizing the Problem In the following graphics, the red dots indicate the maximum number of bikes at one hour, the blue dots indicate the minimum number of bikes at one hour, and the green line is the number of docks at the station. Figure 1: Here we see an example of a station with not enough docks. The maximum number of bikes almost always reaches the capacity of docks cre- ating a number of out of stock events. 2 Figure 2: Here we see an example of a station with too many docks. The maximum number of bikes almost never reaches the capacity of docks. 3 Figure 3: Finally, this graph shows what would happen if some of the docks from the station with too many were moved to the station with too few. As we can see, the maximum never reaches the capacity of the station and the minimum stays above zero. 4 2 Model Here we explain the model that we follow, including the variables and con- stants, objective, and constraints. While we were unable to use the objective function in our nal analysis, the rest of the model was helpful in setting up a successful algorithm for nding a solution to our problem. 2.1 Variables and Constants bi : is a variable indicating the number of occupied docks at station i at the start of the day di : is a variable indicating the number of empty docks at station i at the start of the day di : is a variable indicating the number of empty docks at station i currently bi : is a variable indicating the number of occupied docks at station i currently z ? i : is a linear representation of the previous constraint which included abso- lute value zi : represents the change in dock allocation at station i yi : is a binary variable that represents if zi is positive or negative ui : is a constant value for the upper bound of docks for each station li : is a constant value for the lower bound of docks for each station z : is a constant value for the operational constraint which is how many docks can be moved in one day B : is a constant value for the number of bikes in the system D: is a constant value for the number of empty docks in the system bi : can take any integer value between the upper limit and lower limit of docks at station i di : can take any integer value between the upper limit and lower limit of docks at station i di : can take any integer value between the upper limit and lower limit of docks at station i bi : can take any integer value between the upper limit and lower limit of docks at station i z ? i : can take any integer value between 0 and n where n is the number of docks moved at that station and can be represented as z ? i =j (di +bi )(di + bi )j 5 zi : can take any integer value between -n and n where n is the number of docks moved at that station and can be represented as zi =( + )-(di +bi ) yi : can take the value 0 or 1 (binary variable) to represent whether zi is positive or negative. 0 represents negative and 1 represent positive ui : is set at an integer value li : is set at an integer value z : is set at an integer value B : is set at an integer value D: is set at an integer value 2.2 Objective and Constraints Our objective function, listed below, essentially consists of a way to count the out of stock events experienced by a customer at time . It does this by checking the available number of bikes and docks at time 1, given by the variables and , and if that number is not sucient to fulll the request it will increase the number of out of stock events by one. cX (d;b) =:X = 1;X (1)(d;b) = 0 +:X =1;X (1)(d;b) = 0 Our constraints are explained below: The sum of full and empty docks over every station cannot exceed the total number of full and empty docks in the system. P i bi +di D +B The sum of full docks over every station can not exceed the total number of bikes in the system. P i bi B The number of docks moved between stations can not exceed twice the num- ber of docks that can be moved. P i j (di +bi )(di +bi )j 2z 6 The number of full and empty docks at each station must stay between the maximum and minimum number of docks allotted for each station. li bi +di ui 3 Analysis In order to solve our problem, we were forced to employ a gradient descent algorithm. This is covered in more detail later, however the foundation is covered rst. We began by looking at one station and what possible situations could arise given a number of customers S . It is easy to see that a single customer could either return a bike, notated by a +1, or rent a bike notated by a 1. We then generate dierent scenarios based on these possible situations. For example, it is possible that all S customers come and attempt to rent a bike which we would denote as: 1 1 1:::1 |{z } S customers It also possible that all S customers could attempt to return a bike which we would denote: +1 + 1 + 1:::+ 1 |{z } S customers Including these two cases, there are in fact 2S scenarios for how S cus- tomers could interact with a single station. What we then did, was create an array for each station. Here, our S was the average number of customers per day which we gleaned from the data. 7 2S scenarios 8 > > > > > > > > > > > > > > > > > < > > > > > > > > > > > > > > > > > : +1 + 1 + 1:::+ 1 : : : +1 1 + 1:::+ 1 : : : 1 1 1:::1 |{z } S customers We next looked at each scenario in the array and calculated how many out of stock events would occur if that particular scenario took place. We did this for two initial conditions, having all docks in the station empty and having half of the station’s docks empty. We then calculated an expected value for these out of stock events. To accomplish this we assigned a poisson likelihood to each scenario in the array. This was calculated with the following formula: k e k ! Here,is the either the average number of rentals or the average number of returns to a station per day based on which is used for the computation. Considering the results should be the same regardless, we chose to focus on rentals. This value was found to be roughly half of the average number of customers S . Also in the formula,k is the actual number of rentals or returns that occurred for that scenario, again we are choosing to focus on rentals. We then took this likelihood and multiplied by the number of out of stock events for each scenario and summed the value for each scenario at a station to get a stations expected value of out of stock events. Summing this value for each station gave us the expected value for the entire system. Our algorithm then attempted to minimize this value by moving docks between stations. 4 Gradient Descent In our project, we already found the objective function and set up the constraints. However, after we carefully analyzed the object function, we 8 realized that the object function was actually binary. This gave us a problem, because we could not solve the model as a linear system. Thus, we need to introduce another new algorithm, the Gradient-Descent method. In denition, Gradient-Descent is an iterative optimization algorithm for nding the minimum of a multi-variable function. The function for Gradient- Descent should be dierentiable and continuous, because we want to follow in the direction of the largest negative gradient of the function at a certain point until it reaches a minimum, then we keep doing this to nd an optimal solution. We apply Gradient-Descent in our model in order to minimize the out-of-stock event and to minimize the user dissatisfaction function. 5 Algorithm and Code In Freund et al., they proved their objective function can be solved with gradient descent. For our simulation, we follow the gradient descent algo- rithm by performing bike and dock moves that result in the greatest negative change of the objective function. In Freund et al, they start by removing all bikes from the system and add them in one by one, also known as the greedy algorithm. Once the bikes have been added, they proceed with bike and dock movements. For this project, docks were moved from station i to station j and bikes were moved from station i to station j after an initial adding of bikes to the system. In the code, the expected value is calculated by counting the number of out of stock events for each possible combination of rentals and returns for the current allocation of bikes and docks. For each station, there is 2 to the power of s customers number of combinations. In order to avoid memory issues with the code, if s is greater than 10 for a station, that station has its expected value of out of stock events calculated using 10 for the number of customers and lambda as 5 for the poisson distribution. The number of out of stock events for a particular combination of rentals and returns is then multiplied by the probability of that combination occurring, which is calculated from a Poisson distribution. The expected value of out of stock events for each combination is summed to give the expected value of out of stock events for the station. Starting at the rst station, the expected value of out of stock events is calculated. The value of removing a dock from this station is also calculated. The dierence between these two values is calculated and saved. This dier- 9 ence is the change in the expected value of out of stock events for a station when a dock is removed. This dierence is the initial out of stock expected value minus the expected value with the dock removed. If this value is 0, then there is no dierence in the expected value. If the dierence is negative, then the initial value is less than the value of removing the dock. If the dierence is positive, then the initial value is greater than the value when removing a dock. For the station that is receiving this dock, the initial expected value was calculated and saved. The expected value of adding a dock is also calcu- lated and saved. The dierence between these two values is calculated and has the same implications as the dock removal dierence. Ideally, a move that results in both dierences being positive is taken. That is the removal of a dock results in that station having a lower expected value of out of stock events while simultaneously having the station where the dock is added also have a lower expected value of out of stock events. In the event either dif- ference is positive, if the dierence that positive is less than the dierence that is negative, then the move is allowed. In other words, if removing or adding a dock to a station increases the expected value of that station and the eect of removing or adding a dock to the other station results in a de- crease of the expected value for this station that has a larger magnitude than the magnitude of the other station’s increase, then that move is considered. Moves that make the expected value of both stations increase are not taken. In the code, each dierence for every combination of stations is calculated and compared. The rst combination of stations (station 1 and 2) and the dierence of the expected value of out of stock events from moving a dock from station 1 to 2 for are recorded. The calculations for the next combination of docks are performed. If the overall dierence of moving a dock between these two stations results in a greater decrease of the overall expected value of out of stock events, then these two stations and the decrease are recorded. These calculations are performed for every combination of station i and station j where i != j (i does not equal j) and the move that results in the greatest neg- ative change of the overall expected value of out of stock events is performed for every i until the overall expected value does not decrease from dock moves anymore. Similarly, the bike moves are calculated and performed. Instead of comparing the expected values when a dock is moved from station i to j, the expected values are calculated when moving a bike from station i to j. For the initial adding of bikes, the expected value of out of stock events is calculated and recorded. A bike is added to station i if adding a bike to that 10 station results in the greatest negative change in the overall expected value, or the least positive change in the expected value of out of stock events for every bike. For this project, 500 bikes were considered and the initial alloca- tion of docks is the current allocation of docks that ValleyBike is currently using. Following the greedy algorithm, the objective function value was 12.178 with the allocation of bikes and docks shown in appendix A1. These results were quite strange, and most likely came about from the code being ine- cient or not handling certain cases properly. We also tried another method, the half full allocation. In the half full allocation, instead of removing all bikes from the system, we put each station at half capacity before adding the remaining bikes to the system. After the bikes were added, dock and bike moves were performed. The associated objective function value was 11.839. This method yielded a dierent allocation of bikes and docks with a lower objective function value as shown in appendix A2. However, this allocation can be furthered improved. For example, at station 43, the associated ex- pected value of out of stock events would be 0.0 if there are 6 docks and 3 bikes allocated there. The allocation provided by the code is 15 docks and 15 bikes with an associated value of 0.0 which is an incorrect expected value for out of stock events with this allocation of docks and bikes. Again, this error is most likely due to some minor errors or ineciencies in the code. This code could be improved by rst xing any edge cases and general issues with the code. This code used arrays to carry out calculations. In Freund et al, heaps were used, which results in a faster run time than this code. Using heaps could improve memory usage, allowing the program use the actual number of average customers or a larger number than 10 when calculating the expected value of out of stock events for a station. Another improvement would be to carry out the allocation of bikes and docks with dierent starting allocations. This would closer mimic gradient descent which typically goes through multiple runs of the algorithm with dierent initial conditions such as the starting point. 6 Conclusion In general, we focused our project on allowing ValleyBike Share system to run smoothly. Specically, we focused on minimizing the out of stock events. We built a model for our program, including variables, constraints and the 11 objective function which counts the out of stock events by a customer and solves it by checking the available numbers of bikes and docks. However, we gured out that the objective function is a binary function which we can not solve with the linear program. After analyzing the function and using the poisson formula, we calculated the expected value of out of stock events and decided to use the gradient descent, which is an optimal algorithm for minimizing the multi-variables functions. Our simulation are moving empty docks between stations that result in lower expected value and returning allocation of docs and bikes when the lowest expected value is achieved. According to the results of coding, we have the expected out of stock value of each station. In this case, we have some space to improve. Running the program with more memory to get a more robust solution would help, as the array gets too big to compute with actual customer numbers. We believe that after optimizing our program, the results will be much more accurate and the out of stock events are going to minimize in the future. 7 Appendix A1 Allocation of Docks and Bikes with Greedy Algorithm Station 1 expected OSE = 0.19096814788042776 bikes: 15 Total Docks: 16 Station 2 expected OSE = 0.36755764917019657 bikes: 0 Total Docks: 9 Station 3 expected OSE = 0.4655150490379317 bikes: 0 Total Docks: 11 Station 4 expected OSE = 0.6957068725289121 bikes: 1 Total Docks: 14 Station 5 expected OSE = 0.19098490253647885 bikes: 11 Total Docks: 12 Station 6 expected OSE = 0.19139006916287118 bikes: 0 Total Docks: 9 Station 7 expected OSE = 0.2543905131725438 bikes: 0 Total Docks: 14 Station 8 expected OSE = 0.30590542692720946 bikes: 1 Total Docks: 19 Station 9 expected OSE = 0.19097580470631428 bikes: 12 Total Docks: 13 Station 10 expected OSE = 0.19097580470631428 bikes: 12 Total Docks: 13 Station 11 expected OSE = 0.19097580470631428 bikes: 12 Total Docks: 13 Station 12 expected OSE = 0.49234267582643265 bikes: 3 Total Docks: 14 Station 13 expected OSE = 0.1915185454451452 bikes: 0 Total Docks: 8 Station 14 expected OSE = 0.19121883237120108 bikes: 0 Total Docks: 10 Station 15 expected OSE = 0.19097580470631428 bikes: 12 Total Docks: 13 Station 16 expected OSE = 0.19086887186600593 bikes: 17 Total Docks: 18 Station 17 expected OSE = 0.19096814788042776 bikes: 15 Total Docks: 16 12 Station 18 expected OSE = 0.19096814788042776 bikes: 15 Total Docks: 16 Station 19 expected OSE = 0.19096814788042776 bikes: 15 Total Docks: 16 Station 20 expected OSE = 0.19096814788042776 bikes: 15 Total Docks: 16 Station 21 expected OSE = 0.1909758047063048 bikes: 13 Total Docks: 13 Station 22 expected OSE = 0.19096814788042776 bikes: 15 Total Docks: 16 Station 23 expected OSE = 0.190970844275062 bikes: 14 Total Docks: 15 Station 24 expected OSE = 0.19151854544514463 bikes: 3 Total Docks: 8 Station 25 expected OSE = 0.1909227772396496 bikes: 16 Total Docks: 17 Station 26 expected OSE = 0.190970844275062 bikes: 14 Total Docks: 15 Station 27 expected OSE = 0.191390069162871 bikes: 1 Total Docks: 9 Station 28 expected OSE = 0.5273055159218172 bikes: 4 Total Docks: 15 Station 29 expected OSE = 0.19096814788042776 bikes: 15 Total Docks: 16 Station 30 expected OSE = 0.1909758047063048 bikes: 13 Total Docks: 13 Station 31 expected OSE = 0.1909227772396496 bikes: 16 Total Docks: 17 Station 32 expected OSE = 0.6482504365868822 bikes: 2 Total Docks: 11 Station 33 expected OSE = 0.190970844275062 bikes: 14 Total Docks: 15 Station 34 expected OSE = 0.19103899137093389 bikes: 11 Total Docks: 11 Station 35 expected OSE = 0.1909227772396401 bikes: 17 Total Docks: 17 Station 36 expected OSE = 0.19096814788042776 bikes: 15 Total Docks: 16 Station 37 expected OSE = 0.1908058920897186 bikes: 19 Total Docks: 20 Station 38 expected OSE = 0.1909708442750525 bikes: 15 Total Docks: 15 Station 39 expected OSE = 0.2556726925345866 bikes: 2 Total Docks: 3 Station 40 expected OSE = 0.19139006916287118 bikes: 0 Total Docks: 9 Station 41 expected OSE = 0.37410152826112486 bikes: 1 Total Docks: 12 Station 42 expected OSE = 0.19121883237119955 bikes: 8 Total Docks: 10 Station 43 expected OSE = 0.3411887750673207 bikes: 1 Total Docks: 16 Station 44 expected OSE = 0.19096814788040875 bikes: 17 Total Docks: 16 Station 45 expected OSE = 0.19096814788041824 bikes: 16 Total Docks: 16 Station 46 expected OSE = 0.19086887186598692 bikes: 18 Total Docks: 19 Station 47 expected OSE = 0.1908058920897186 bikes: 19 Total Docks: 20 Station 48 expected OSE = 0.19097580470629527 bikes: 13 Total Docks: 14 Station 49 expected OSE = 0.1909758047063048 bikes: 13 Total Docks: 13 Station 50 expected OSE = 0.1909227772396401 bikes: 17 Total Docks: 17 Total cost: 12.17801792248864 13 8 Appendix A2 Allocation of Docks and Bikes with Initial Half Capacity Allocation 1 expected OSE = 0.19096814788041824 bikes: 16 Total Docks: 16 Station 2 expected OSE = 0.3675576491424709 bikes: 4 Total Docks: 9 Station 3 expected OSE = 0.46533087789636535 bikes: 5 Total Docks: 12 Station 4 expected OSE = 0.7002358889992197 bikes: 5 Total Docks: 13 Station 5 expected OSE = 0.19103899137094435 bikes: 5 Total Docks: 11 Station 6 expected OSE = 0.19121883237120033 bikes: 4 Total Docks: 10 Station 7 expected OSE = 0.25439051317233347 bikes: 6 Total Docks: 14 Station 8 expected OSE = 0.3059054269245054 bikes: 10 Total Docks: 19 Station 9 expected OSE = 0.19098490253648912 bikes: 6 Total Docks: 12 Station 10 expected OSE = 0.19098490253648892 bikes: 7 Total Docks: 12 Station 11 expected OSE = 0.19098490253648892 bikes: 7 Total Docks: 12 Station 12 expected OSE = 0.49148321597179323 bikes: 8 Total Docks: 17 Station 13 expected OSE = 0.19151854544514463 bikes: 3 Total Docks: 8 Station 14 expected OSE = 0.19121883237120033 bikes: 4 Total Docks: 10 Station 15 expected OSE = 0.19097580470633388 bikes: 7 Total Docks: 13 Station 16 expected OSE = 0.1908688718659964 bikes: 18 Total Docks: 18 Station 17 expected OSE = 0.19096814788040875 bikes: 16 Total Docks: 16 Station 18 expected OSE = 0.19096814788040875 bikes: 16 Total Docks: 16 Station 19 expected OSE = 0.19096814788040875 bikes: 16 Total Docks: 16 Station 20 expected OSE = 0.19096814788038974 bikes: 16 Total Docks: 16 Station 21 expected OSE = 0.19097580470633407 bikes: 6 Total Docks: 13 Station 22 expected OSE = 0.19096814788040875 bikes: 16 Total Docks: 17 Station 23 expected OSE = 0.19097353950609258 bikes: 7 Total Docks: 14 Station 24 expected OSE = 0.1913900691628702 bikes: 5 Total Docks: 9 Station 25 expected OSE = 0.1909227772396401 bikes: 17 Total Docks: 17 Station 26 expected OSE = 0.19097353950609258 bikes: 7 Total Docks: 14 Station 27 expected OSE = 0.19121883237120033 bikes: 4 Total Docks: 10 Station 28 expected OSE = 0.5273052850374325 bikes: 6 Total Docks: 15 Station 29 expected OSE = 0.19096814788040875 bikes: 16 Total Docks: 17 Station 30 expected OSE = 0.19098490253648912 bikes: 6 Total Docks: 12 Station 31 expected OSE = 0.19092277723963058 bikes: 17 Total Docks: 18 Station 32 expected OSE = 0.6477154827449071 bikes: 5 Total Docks: 12 Station 33 expected OSE = 0.1909735395060924 bikes: 8 Total Docks: 14 Station 34 expected OSE = 0.19103899137094435 bikes: 5 Total Docks: 11 14 Station 35 expected OSE = 0.19092277723963058 bikes: 17 Total Docks: 18 Station 36 expected OSE = 0.19096814788040875 bikes: 16 Total Docks: 17 Station 37 expected OSE = 0.19080589208970908 bikes: 20 Total Docks: 20 Station 38 expected OSE = 0.1909735395060924 bikes: 8 Total Docks: 14 Station 39 expected OSE = 0.25522902513856627 bikes: 2 Total Docks: 5 Station 40 expected OSE = 0.1913900691628704 bikes: 4 Total Docks: 9 Station 41 expected OSE = 0.3741015282607502 bikes: 5 Total Docks: 12 Station 42 expected OSE = 0.19103899137094435 bikes: 5 Total Docks: 11 Station 43 expected OSE = 0.0 bikes: 15 Total Docks: 15 Station 44 expected OSE = 0.19096814788039923 bikes: 16 Total Docks: 18 Station 45 expected OSE = 0.19096814788044678 bikes: 13 Total Docks: 16 Station 46 expected OSE = 0.19086887186598692 bikes: 18 Total Docks: 19 Station 47 expected OSE = 0.19080589208970908 bikes: 20 Total Docks: 20 Station 48 expected OSE = 0.1909758047063335 bikes: 9 Total Docks: 13 Station 49 expected OSE = 0.19097580470633368 bikes: 8 Total Docks: 13 Station 50 expected OSE = 0.1909227772396401 bikes: 17 Total Docks: 17 Total cost: 11.838786150955382 15 Optimization of Maintenance and Cleaning of a Clustered Bike Share System Adam Viola and Bertram Thomas 1 Abstract Bike share system are a promising form of transportation in both urban and rural settings. They face many problems, one of which is how eciently bikes are maintained and cleaned. In this paper, we propose a model which clusters nearby stations, weighs each cluster based on the time since last maintenance or cleaning of each bike in the cluster, and constructs a route through clusters in a time-ecient manner which maximizes the total score and ts within a maximum time constraint. 2 Introduction The motivation behind this paper is to assist the cleaning and main- tenance eorts of ValleyBike. ValleyBike is a bike share system made up of a set of electric bikes and charging stations which hold them. Anyone with an account is able to rent an electric bicycle from a station, ride the bike for some maximum time, and return the bike at any charging station. Bike share companies are becoming more popular in large cities because they are environmentally friendly and ecient alternatives to the other types of city transportation and have the potential to solve the last-mile problem. Numerous papers have been written about these systems on topics such as rebalancing, prot margins, station placement, and station size. One topic that has not received as much attention as the others is the issue of clean- ing and maintaining a bike eet eciently. This is the main problem that motivates the work we have done in this paper. Our paper focuses on the routing logistics of performing proactive main- tenance on every bike. In particular, we provide a solution to the problem of nding an ecient route between stations to ensure bikes are regularly maintained given a time constraint. Our model and ideas can be extended to cleaning bikes regularly as well. In our situation, the bike share company is required to proactively maintain every bike at least once per month and clean 1 every bike once per week. Currently, no proactive maintenance is performed, but the company does visit every station and clean every bike every day. This is redundant, as nearly every bike is maintained every day, rather than once per week. Another unique factor of our system is the distribution of the sta- tions. In many bike shares, the stations are approximately evenly distributed throughout the location. In our system, since Western Massachusetts is very rural area with long roads connecting the centers of towns, the stations are distributed in distinct clusters around each town center. We are able to use this fact heavily to our advantage to signicantly reduce the computational power necessary for our model. We use these clusters to create a time-ecient routes that visit the bikes that have not been maintained recently. 3 Problems we’ve run into Our initial approach to creating an ecient route between stations to maintain as many bikes as possible was to treat the the routing problem as a prize collecting traveling salesman problem with a budget constraint. The bike stations and the distances between them respectively represent the nodes and weighted edges of a complete graph, where the prize for visiting each station is computed using each bike’s time since last maintenance or cleaning. We designed a linear program to traverse as many stations as possible in a single loop, which would serve as the best path for maintainers to take. To ensure no cycles appeared within our path, we needed constraints to examine every subset of the graph, which enforced that if a station was visited both inside and outside the subset, then the path must cross from outside to inside the subset at some point. This method produced optimal paths between stations, but we realized that with upwards of 50 stations, enforcing weak connectivity is this fashion requires 250 constraints. Given the clustered nature of ValleyBike’s stations, we decided to break the problem up into 5 small clusters which contains approximately 10 stations each. This reduced the number of constraints of each program to only 25 constraints. By merging stations into a single cluster, we lose information about the problem as a whole in exchange for computational feasibility. Suggesting which clusters to visit is far less helpful than suggesting which exact stations to visit. In order to address this issue, we decided to create a separate intra- cluster program to run on each cluster, and the larger, intercluster program provides the path between these clusters. This allowed us to provide a route 2 between all of the stations using signicantly fewer constraints. 4 Approach Our approach to the maintenance problem focuses on maximizing the total number of maintained bikes on a time-restricted trip while prioritizing \older" bikes, which have a time longer since last maintenance. We achieve this by rst grouping multiple nearby stations into local clus- ters. We apply k-means clustering with k-means++ initialization to separate the stations into clusters. We provide all necessary constant as well as three user-dened parameters to a controller, which operates the intracluster and intercluster programs to produce an ecient route for maintaining bikes. 4.1 Intracluster Program The intracluster program is an integer program which operates on a clus- ter of stations. The program nds the shortest path through all stations in the cluster starting and ending at any station in the cluster. Let stations in the cluster be indexed with i and j , and let bikes in the cluster be indexed with k . 4.1.1 Input ti;j - Travel time between all pairs of stations bi;k 2 f0;1g - Whether or not a bike is in a station ak - Age of each bike p - Process time required to maintain a bike m - Minimum age of bikes to maintain The program uses this input to compute the following constant, which rep- resents whether or not each bike should be maintained: gk = ( 0; ak < m 1; ak m 3 4.1.2 Integer Program Let V represent the set of all stations in a particular cluster. For each pair of stations (i;j ) such that i 6=j , let ei;j be a binary decision variable which represents whether or not the path involves travelling along the edge from station i to station j . minimize X i " X j 6=i ti;j ei;j + X k bi;k gk p # subject to 8i : X j ei;j 1 (1) 8j : X i ei;j 1 (2) X i X j 6=i ei;j =jV j 1 (3) 8S V : X i2S X j 6=i2S ei;j j S j 1 (4) The objective function of the integer program minimizes the total time spent in the cluster. The rst term of the sum accounts for time spent travelling between stations, and the second term accounts for time spent maintaining the bikes at each station. Since the second term contains purely constants, it is not required in the objective function, but we keep it, as it is used as input for the intercluster model. Constraints (2) and (1) force each station to be entered and left at most once, respectively. We use for these constraints to allow endpoints. Together with the other constraints, constraints (3) and (4) force the directed graph generated by our set of vis- ited edges to be acyclic (Lemma 1) and weakly connected (Lemma 2). The objective function and constraints force the program to produce the shortest path through all stations in the cluster starting and ending at any station in the cluster. Lemma 1.The intracluster program’s constraints imply that the edges se- lected by the program always produce an acyclic, directed graph. Proof.Suppose a cycle exists within our set of visited edges. Examine the subset of stations,S , which are members of this cycle. Since constraints (1) and (2) force every station to be entered and left at most once, the number 4 edges in the graph of S is exactly jS j. However, since S V , this possibility either violates constraint (4), which forces the total number of edges of any subset to be less than or equal to jS j 1, or constraint (3), which forces the total number of edges of any subset to be equal to jV j 1. Therefore, the graph is acyclic. Lemma 2.The intracluster program’s constraints imply that the edges se- lected by the program always produce a weakly connected, directed graph. Proof.Suppose the set of visited edges is not weakly connected. This is equivalent to suggesting that there exists some subset of stations S which has no edges which would weakly connect any station in S to any station outside of S . Constraint (4) implies that the the number of edges within S is less than or equal to jS j 1. Let’s examine the complement of S ,S C . Constraint (4) implies that the number of edges within S C is less than or equal to jS C j 1. Since there are no edges which weakly connect S and S C and S and S C are mutually exclusive, the total number of edges in the path is less than or equal to jS j +jS C j 2. However, constraint (3) implies that the number of edges in path is exactly equal to jV j 1. Since jV j =jS j +jS C j, the total number of edges in the set of visited stations is exactly equal to jS j +jS C j 1. There is a contradiction, as the number of edges in the set of visited stations cannot simultaneously be equal to jS j +jS C j 1 and less than or equal to jS j +jS C j 2. Therefore, these constraints imply that the program always produces a weakly connected graph. 4.2 Intercluster Program The intercluster program is an integer program which operates on all clusters. The program nds a path between the clusters of stations which maximize the total score and begins and ends at an origin known as the garage. It requires the results of the intracluster program run on all clusters. Let clusters be indexed with i and j , stations be indexed with k , and bikes be indexed with l. 5 4.2.1 Input ti;j - Travel time between the last station of the path through one cluster to the rst station of the path through another cluster for all pairs of clusters oi - Time spent maintaining bikes and travelling between stations in each cluster (objective value of intracluster programs) si;k 2 f0;1g - Whether or not a station is in a cluster bk;l 2 f0;1g - Whether or not a bike is in a station - Maximum trip time al - Age of each bike f (a) - Function which maps bike age to score 4.2.2 Integer Program Let C represent the set of all clusters including the garage, represented by index 0. For each cluster i, let ci be a binary decision variable which represents whether or not the path visits cluster i. For each pair of clusters (i;j ) such that i 6=j , let ei;j be a decision variable which represents whether or not the path involves travelling along the edge from cluster i to cluster j . maximize X i " ci X k si;k X l f (al )bk;l !# subject to c0 = 1 (5) 8i : X j ei;j =ci (6) 8j : X i ei;j =cj (7) X i " X j 6=i (ti;j ei;j ) +oi ci # (8) 8S C : X i2S X j 6=i2S ei;j j S j 1 (9) 6 The objective function of the integer program maximizes the total score of bikes maintained among visited clusters. Constraint (5) ensures that the garage is always visited. Constraints (7) and (6) force each cluster to be entered and left if and only if they are visited, respectively. Constraint (8) ensures that the path ts within the maximum time constraint. The rst term of the sum accounts for time spent travelling between clusters, and the second term accounts for time spent maintaining the bikes and travelling within clusters. Together with the other constraints, constraint (9) force the directed graph generated by our set of visited edges to be weakly connected and that the graph produced by the set of visited edges among every proper subset of the set of visited stations is acyclic. The objective function and constraints force the program to produce a path through the clusters which provide the highest score given the time constraint. Since the intercluster program is not guaranteed to produce an optimal route between clusters, an additional step is required. Once the optimization completes, we change the objective function to minimize total time and add a constraint which limits the objective value to exactly the objective value of the rst optimization. This forces the model to produce a path which the same clusters that optimizes travel time. 4.3 Controller The controller is the remaining piece of the algorithm which operates the intracluster and intercluster programs. The controller receives the following input from the user of the model. k - Number of clusters li;j - Locations (longitude and latitude) of all stations ti;j - Travel time between all pairs of stations bi;k - Whether or not a bike is in a station ak - Age of each bike p - Process time required to maintain a bike m - Minimum age of bikes to maintain - Maximum trip time 7 f (a) - Function which maps bike age to score Using the given number of clusters k , the controller performs k-means clustering with k-means++ initialization on the stations by their longitude and latitude. The controller then lters out stations which contain no bikes above the minimum age, discarding empty clusters if necessary; it is not necessary to consider stations which contain no bikes to maintain. The controller runs the intracluster program on each cluster and saves objective value and path each program computes. The controller then runs the intercluster program using the saved information from the intracluster programs. Because the intracluster program produces a path which visits every sta- tion in a cluster, the nal path generated by the model often includes visiting low scoring stations which are members of visited clusters instead of high scoring stations located in unvisited clusters. This occurs most often in cases with a strict maximum time constraint. This is a positive characteristic in the sense that it is forces the maintenance of many bikes in a nearby area, reducing unnecessary travel time between clusters and allowing more bikes to be maintained in a given time frame. In addition, given that it is infrequent for bikes to move between clusters, the bikes within clusters are predictably maintained in \waves". However, if bikes move between clusters often enough, it possible for a portion of bikes to \avoid" maintenance for long periods of time. To account for this issue, we append an optional feature to the model which prioritizes high scoring stations across all clusters. If the feature is active, the controller rst computes the score of every station si using the controller’s input. si = X k f (ak )bi;k (10) The controller then clusters the stations as described previously and lters out stations without bikes above the minimum age. At this point, an iterative process begins. For each iteration i, we dene a minimum station score equal to the ith largest station score. Stations which do not satisfy this minimum score are ltered out, and the controller operates on the remaining stations as usual. The process iterates until the minimum station score includes all stations, saving the path of the iteration which produced the highest objective value. 8 Model Travel Time (m) Compute Time (s) Global 1877.817 4.383 Clustered 1904.428 0.131 Table 1: Results for the each model of the visiting every station experiment. Score is omitted because all models produce paths which visit every station; they all produce the same score. 5 Experiments For each of the following comparisons, we use f (a) =e a 6 to map bike age (in days) to score, a bike process time of 5 minutes, and travel times fetched using Bing Maps Distance Matrix API. In order for each model to remain computationally feasible, we lazily add the constraints that apply to each subset of stations. Whenever an incumbent solution is found, a constraint is lazily applied if a cycle is detected in the proposed path. 5.1 Global Model To serve as the baseline for our model, we use an integer program which treats the problem as a prize-collecting travelling salesman problem with budget constraint, which operates similarly to our intercluster model, but instead of linking clusters, it provides a route through all stations using the same score-based objective function. This is the model described in Section 3. 5.2 Visiting Every Station Experiment In this comparison, we set no maximum time or minimum bike age, which allows each model to provide an optimal route through all stations. The op- tional feature is omitted because its goal is to prioritize high-scoring stations across clusters. Since every station is visited, it produces the same results as the ordinary clustered model. The results are presented in Table 1 and displayed in Figure 1. The clustered model produces results quite compara- ble to the global model; it provides a route which takes only 1:4% more time with signicantly lower computational cost. 9 Figure 1: Paths produced by the visiting every station experiment. Station colors represent cluster membership. 10 Model Total Score Travel Time (m) Compute Time (s) Global 5386.802 719.215 1308.745 Clustered 3447.705 630.477 0.095 Optional 5110.959 718.815 1.470 Table 2: Results of the maximum time constraint experiment. 5.3 Maximum Time Constraint Experiment In this comparison, we set a maximum time constraint of 720 minutes, which allows each model to showcase its ability to construct a route when it is impossible to reach every station. The results are presented in Table 2 and displayed in Figure 2. This experiment shows the aws of the ordinary model clearly; it is only capable of creating paths which visit every station of the visited clusters. This results in often not making use of the entire time constraint and a signicantly reduced score. However, the optional feature produces results quite comparable to the global model; it provides a route with 94.9% of the score of the global model with approximately a thousandth of the computational cost. Both the optional feature and global model produce routes which make the most of the 12 hour time constraint. 5.4 Simulation To illustrate the success of these models over time, we propose a simple simulation. In this simulation, bikes are maintained once per week for six months. Given the lack of data about which bike is at which station, the simple simulation does not move bikes between stations. Bike counts at each station are initialized from a specic moment in time from ValleyBike’s data, and their initial ages in days are uniformly sampled as integers in the range [1;30]. For each day of the simulation, each bike’s age is incremented. Every seventh day, bikes at stations which are visited by the model have their ages reset to zero as long as their age is at least the minimum bike age. Each maintenance run has a maximum time of 500 minutes with a minimum bike age of 10 days. The results of the simulation are presented in Table 3. The distribution of bikes ages at the end of the simulation are displayed in Figure 3. After running the global model for over 24 hours, it was not able to 11 Figure 2: Paths produced by each model of the maximum time constraint experiment 12 Model Total Travel Time (m) Total Compute Time (s) Global -- Clustered 10777.915 2.065 Optional 12378.125 27.246 Table 3: Results of the maximum time constraint experiment. (a) Clustered model (b) Clustered model with optional feature Figure 3: Distribution of bike ages at the end of the simulation 13 produce a single route for the simulation. This is surprising, considering the global model was placed in a similar situation in the maximum time constraint experiment and required approximately 1300 seconds to produce a route. Since the constraints which prevent cycles are added lazily, the actual run time of this model varies wildly based on the number of constraints added to the model before it nds an optimal route. This unpredictable runtime, with the possibility to exceed an entire day, is a clear limitation of the global model. Comparing the clustered model with and without the optional feature, it is clear that both are capable of keeping ValleyBike’s bikes proactively maintained at least once per month. However, the clustered model without the optional feature is the clear choice because it manages to maintain a similar bike age distribution to the model with the optional feature using signicantly less total travel time. This is likely the case because it visits all stations in every visited cluster, reducing unnecessary travel time between stations. It is worth noting that the simulation’s assumption that bikes do not move between stations limits the value of these results. With bikes moving between stations, it is possible that the ordinary clustered model may have performed worse, as it is not as well equipped to deal with bikes which move between clusters frequently. 6 How to Apply the Results As alluded to earlier, there is signicant room for improvement in Val- ley Bike’s current maintenance method. The rst major change that needs to be made, regardless of whether or not the methods from this paper are implemented, is a system to keep track of which bikes were cleaned when. In the current method, each bike is inspected nearly daily, leading to time spend on bikes unnecessarily. If the bikes were to be logged when they were last inspected, then the workers would know which bikes do not need to be checked on. Our intent would be to provide this program to the company so that they could input the day’s data, and after a brief run they would be given an optimal route to follow with which bikes should be cleaned. The maintenance workers could then determine if it was worth going out on a specic day if the maximum score is low, that is to say if none of the bikes need to be inspected urgently. Every day the positions of the bikes would need to be updated in 14 the system. As an initial setup, all bikes would need to be given an \age", which indicates how recently a bike has been seen. A formula would also have to be input to weight the dierent ages of the bikes. A constant for minimum bike age to be ignored as well as a constant for minimum weight per station visited would also have to be input. Optimal values and functions for these inputs are not found in this paper, but they would make for an interesting avenue of exploration in future research. 7 Future Avenues of Research As just mentioned, there are many areas to go from the base of this paper in future research. Most recently mentioned was a algorithm for age that optimizes the monthly time spent to visit all of the bikes. In a similar vein, nding the optimal minimal value for each station in order to maximize score would also prove useful to the overall problem. Another huge assumption we made in the construction of our problem is that the bikes remain stationary all day. In a reality, this is very much not the case. Bikes are constantly on the move as they are rebalanced or used by customers, so it is very possible that the maintenance worker would go to a station with the intent to work on a bike, and by the time they got there the bike would be gone. Fortunately with our system, the current rate of movement of bikes is relatively low, but in larger, more established systems our model would not work well with this assumption. Another advancement that could be made from this paper is the extension of the program to model longer periods of time. Currently our model would work on a day by day basis, eectively telling you a route for a given day. A more interesting result could be found if the model worked for the entire month, but then the positions of the bikes would have to be incorporated in a more complicated way. This paper could also be improved by attempting to use dierent solutions to the Traveling Salesman problem, and comparing them to our result. We were able to show that our cluster method is more computationally ecient than a brute force TSP approach, but it may not be more ecient than other, cleverer TSP algorithms. A nal avenue for future research is determining what minimum weight for a route is necessary for a maintenance trip to be worth it. To provide an extreme example, it would not be worth a worker’s time to go out and repair 15 bikes if all of them were seen before. On the other hand, it would absolutely be necessary to go out if every bike has not been seen in the past month. So somewhere in between is a weighted value that is the tipping point between not going out for repairs and going out for repairs. Finding this tipping point would likely be an interesting experiment. 16 Optimizing for New ValleyBike Stations at UMass Chengyang Miao Michael Roo Stephen LaFauci December 2019 1 Introduction Since the opening of ValleyBike Share just last year, students and locals of the Pioneer Valley have traveled over 250,00 miles on rentable electric-assist bicycles. Each season, new investment funds allow ValleyBike to add new bike-pickup/dropo locations anywhere in the Valley. How can they be sure that a new station loca- tion will be benecial to ValleyBike customers and protable to ValleyBike? The techniques of mathematical modeling in business analytics help us determine, based on a dataset, which potential station-locations are most likely to be protable. In our use case, we limit ourselves to the University of Massachusetts campus. We consider adding small stations of four docks, medium stations of eight docks, and large stations of twelve docks. For each prospective station-size, where on campus would it be most protable to build? 2 Data Currently there are ve bike stations on UMass campus located at Northeast, Hasbrouck lab, Central, Haigis Mall, and Southwest. To start o, we created a voronoi diagram of campus (see Figure 1). A voronoi diagram splits a plane into smaller regions around specied points. To do this, we used the current bike stations as the central points of the diagram, found the midpoint between each pair of points, and drew a line perpendicular to the pair. After all of the lines are drawn in, the voronoi diagram then shows smaller regions that indicate the closest station to someone depending on where they are standing on campus. Using this, along with a student survey, we determined which campus locations would be the most promising for prospective new stations. For our project, we looked at building a new station at Dubois library, McGuirk stadium, Honors college, Mullins center, and the Engineering quad. With the current ve stations and the potential new stations, we measured the estimated distances from a station to every other station to see just how far apart they all are. Finally, based on the trac of particular locations, we assign \population" values p to each location. Later these will help us model the prot gained by a particular location-station match. Figure 1: 1 3 Modeling We model UMass’s campus as an instance of the uncapacitated facility location problem, which we then solve as a linear program. We say that locations on UMass’s campus are \clients", and bike stations are \facilities". When we build a new bike station or continue operating an existing one, we say we open that facility. Firstly, we let cij be the prot of assigning a client j 2 D to facility f 2 F . To be specic, we model cij by p d2 , where p is population in a location and d is the distance between location j and the station i to which it is being assigned. To prevent prot from approaching innity for assignments of a location to its own station, we upper bound our prot by a multiple of the location j ’s population:cij kp, for some constant k. 2 Table 1: Modeled assignment-values p per location Northeast 500.0 Hasbrouck 250.0 Greeno 250.0 Haigis 750.0 Southwest 1000.0 DuBois 2000.0 McGuirk 750.0 Honors 250.0 Mullins 500.0 Eng Quad 250.0 Table 2: Distances d between locations Northeast Hasbrouck Central Haigis Southwest Library McGuirk Honors Mullins EngQuad Northeast 0.0 0.1537 0.5712 0.4924 0.8333 0.3517 1.35 0.4861 0.4797 0.1846 Hasbrouck 0.1537 0.0 0.5276 0.4918 0.6869 0.24 1.2 0.3844 0.5049 0.2562 Central 0.5712 0.5276 0.0 0.4812 0.8115 0.5969 1.33 0.6873 0.8577 0.8143 Haigis 0.4924 0.4918 0.4812 0.0 0.3511 0.2083 0.8492 0.1893 0.4208 0.4734 Southwest 0.8333 0.6869 0.8115 0.3511 0.0 0.503 0.5 0.1893 0.3255 0.7954 Library 0.3517 0.24 0.5969 0.2083 0.503 0.0 0.9793 0.1515 0.2378 0.2941 McGuirk 1.35 1.2 1.33 0.8492 0.5 0.9793 0.0 0.7327 0.832 1.32 Honors 0.4861 0.3844 0.6873 0.1893 0.1893 0.1515 0.7327 0.0 0.2026 0.4342 Mullins 0.4797 0.5049 0.8577 0.4208 0.3255 0.2378 0.832 0.2026 0.0 0.4475 EngQuad 0.1846 0.2562 0.8143 0.4734 0.7954 0.2941 1.32 0.4342 0.4475 0.0 In addition we let fi be the cost of opening facility i. We model this value by summing cost-estimates of materials, labor, etc., and then scaling this result by the number of docks at the prospective new station, so that fi = 4375 numberofdocks +2500, where $4375 reects the cost per new bike and $2500 reects other miscellaneous costs. Let yi 2 f1;0g be 1 if and only if facility i is open, and 0 otherwise. Let xij 2 f1;0g be 1 if and only if client j is assigned facility i, and 0 otherwise. We aim to maximize the total prot of location all those stations, which equals the money we are making minus the cost of new stations. Maximize: X i2Fj2D cij xij X i2F fi yi We also have following constraints to apply to our linear program. First of all each location must be assigned to one station. In addition, locations can only be assigned to open bike stations. Subject to: X i2F xij = 1 8j 2 D xij yij 8i 2 F;j 2 D Lastly, we account for already-built stations by setting necessary yi ’s to 1. 4 Results and Reections For a large station (we consider size of 12 docks), our model suggests building at the DuBois library, which would give our objective function an optimal value of 8:04 104 (See Figure 2 for an updated Voronoi Diagram). This makes sense intuitively since DuBois has more inbound and outbound student trac than any other other location on campus. For the case of 8-docks, our model suggests building at (again) the Dubois Library, and also at McGuirk stadium, for an optimal value of 1:8 105 (See Figure 3 for an updated 3 Voronoi Diagram). McGuirk in particular seems a sensible result when we consider our prot function cij =p=d2 . Since McGuirk has a high distance d from any other location, it is naturally most protable to open its own station as soon as it is ecient to do so. Finally we consider the case of small 4-dock stations. Our model suggests building again at DuBois and McGuirk, but now also at the Mullins center, quite close to the DuBois library (See Figure 4 for the nal Voronoi Diagram). This result as well makes intuitive sense, since some 8-station prospectively built at DuBois according to the previous result could easily be divided into two smaller stations. Most interestingly, we now have an even higher value for our objective function, 2:9 105 , which suggests that ultimately, building many smaller stations is more protable than building a few large stations. Figure 2: Figure 3: 4 Figure 4: 5 5 Next Steps And/Or Improvements 1. Improvement for cij Our prot-estimation function might be more precise if we change cij to a1 p c1 (a2 d c2 )2 +c3 where a1 ;a2 ;c1 ;c2 ;c3 are all constant.Since we want cij be positive, we have to make c1 < ai PI ; c2 < a2 dij ;(dij c2 )<0 then we will have to add extra constraints as following:a1 >0,a2 >0,c1 a1 < pi ,c2 a2 < dij ,c3 >0. 2. Additional data 6 Since we are using estimate population data for our calculation, the result will be more accurate if we have the exact number of population for a certain location. 3. Cooperation with other groups There was another group that was working on optimizing the number of docks per station. If we were to use the data that they found, and implement this into our project, we could nd the most optimal locations for the most optimal sized stations. This could potentially provide a better solution. 4. On the other hand, it may not be much more dicult to implement a smaller sub-program into our current program, so that the number-of-docks variable n could vary between size 1 and size N (where n N :8n), rather than limiting our scope to 4-stations, 8-stations, and 12-stations. In our current formulation, each of 10 locations is initialized with a prospective station of size n. We initialize the prot-estimation and facility-cost-estimation functions, then solve for optimal yi activations and xij assignments for all j and all i. Instead of initializing c;f;y;x for only stations of size n, let us initialize c;f;y;x for all 1 n N , where each location has prospectively a station of each 1 n N . We add the constraint that in the nal solution, each location can build only a single station of a single size. By solving this more exhaustive linear program, we determine not only optimal build-locations for arbitrary n, but for all 1 n N . 6 Conclusion In short, we recommend either: (1) a new 12-station at the DuBois library, or (2) two 8-stations at Dubois Library and McGuirk Stadium, or (3) three 4-stations at DuBois Library, McGuirk Stadium, and Mullins center. In the future, more precise data for location-populations and trac would help to better model the prot of particular assignments. In particular, future cooperation with other Bike Share researchers would help account for the possibilities of varying station-sizes, impacts of parallel routes, or more, making sure that Bike Share continues improving for both ValleyBike itself as well as citizens of the Pioneer Valley. 7 References 1. Singhvi, D., Singhvi, S., Frazier, P. I., Henderson, S. G., O’ Mahony, E., Shmoys, D. B., & Woodard, D. B. (2015). Predicting Bike Usage for New York City’s Bike Sharing System.Computational Sus- tainability: Papers from the 2015 AAAI Workshop. 2. Williamson, D. P., & Shmoys, D. B. (2011).The Design of approximation algorithms. New York, NY: Cambridge University Press. 7 Optimal Bike Allocation for ValleyBike Share Program Brian Dang bldang Guilherme Silva guilhermesil Nilay Sadavarte nsadavarte Ben Silver bcsilver 1 Introduction Bicycle-sharing systems, or bike-share systems, are systems set up within various locations in or- der to offer the availability of bikes to be used by people to get from one place to another. Typically, stations are positioned at popular locations in or- der for the users of the system to conveniently pick up and drop off their borrowed bikes. While these systems offer an excellent service by providing an important and useful mode of transportation in beneﬁcial locations, problems such as asymmetric demand and rebalancing are challenges that pre- vent bike-share systems from being more success- ful. These terms and our approach to resolve them will be discussed further in this paper. 2 ValleyBike Western Massachusetts is home to a region known as The Pioneer Valley. Within the valley is the Franklin, Hampshire, and Hampden Coun- ties. This area contains buzzing communities in Springﬁeld, Holyoke, Northampton, Hadley, and Amherst. As of 2018, these communities are tied together by another means: the newly cre- ated ValleyBike Share. This recently established bike-share system has added over 50 bike stations where customers can take out, ride, and return their borrowed bikes. Offering electric pedal as- sist bikes, the ValleyBike Share service delivers a new means for public transit. With over 126,000 routes already taken, this certainly is a useful and popular service. While boasting a signiﬁcant initial year of bike usage, ValleyBike is not immune to the challenges that bike-share systems face. Having too many or too few bikes at a station at any given time is one way for potential dissatisﬁed customers. 3 Our Goal We are interested in an optimal allocation of bikes at each station such that the number of stock outs are minimized for each station. We deﬁne a stock out as an instance in which a customer would ei- ther not be able to dock their bike as a product of all docks being occupied or not be able to take a bike because there are none available. 4 Research The ﬁrst paper we read was “Data-driven rebal- ancing methods for bike-share systems” (Freund et al.,2017). From this paper, we learned about the nature of bike share systems, and its’ unique “asymmetric demand” problem. Bike stations spread out across a region, have customers de- manding the service of the bike stations availabil- ity. This demand is seeking both bikes and open docks (to return their bike). When a customer is unsatisﬁed, it is due to a stockout – which means there is either no available bikes or no available open docks. Rebalancing is the term for adjusting the num- ber of bikes at any given station in order to reset its bike and dock values to better serve the customer demand. This paper discussed different methods of rebalancing both during the day and throughout the night. The paper also discussed a user dissatisfaction function. This UDF deﬁnes the process of remov- ing and docking bikes where removing reduces the total number of bikes at a station by one and dock- ing increases the total number of bikes by one. Unless, of course, if the station is full then a bike cannot be docked and if the station is empty a bike cannot be removed. The paper also mentions its’ use of the Poisson process which is something we were able to incor- porate into our model, and we will discuss later. The next paper we read was “Minimizing Mul- timodular Functions and Allocating Capacity in Bike-Sharing Systems” (Freund et al.,2016). In this paper, we read about how a greedy algorithm takes out all the bikes and then adds them back in one by one. By then improving the objective function little by little, the allocation of bikes can be adjusted until no more room for improvement exists. Another paper we read was “Data Analysis and Optimization for (Citi)Bike Sharing” (O’Mahony and Shmoys,2015). This paper mentions applying penalties to when a station doesn’t have ideal val- ues. That is, too many or too few bikes. Also, this paper suggests “minimizing the sum of the differ- ences...” which is another concept we were able to incorporate into our model. Since our group’s main objective was to ﬁnd the appropriate number of bikes/docks for each sta- tionatanygiventime, wetookvariouspiecesfrom these papers and developed our models. 5 Initial Work and Data Collection We highlighted some of the data that would help us with the solution to our problem. This included: the total number of bikes and the number of sta- tions, their locations, capacities, and the rates of removal and docking of bikes. We approached the problem by treating it as a minimization problem, trying to minimize the dif- ference between the number of bikes we want to allocate to each station and the number we would actually be able to supply. We wanted to incorporate certain penalty val- ues depending on the conditions that each station was in. Our goal was to penalize not having the optimal allocation at a station, but to varying de- grees considering the current rates at the station and whether we were allocating more or less bikes than desired, which led us to having four possible scenarios. We used variable Sst =jxst yst j to model our difference, but in order to take the situation into consideration we expanded on Sst to become the following: Sst =xst yst ;r d (1) Sst =yst xst ;r d (2) Sst =xst yst ;r d (3) Sst =yst xst ;r d (4) Where scenario 1 meant that there were more bikes allocated than optimal and the rate of re- moval from that station was higher than the rate of docking. Scenario 2 means that there were less bikes allocated than optimal and the rate of re- moval was lower than the rate of docking. Sce- nario 3 means that there were more bikes allocated than optimal and the rate of removal was less than the rate of docking. Scenario 4 means that there were less bikes allocated than optimal and the rate of removal was higher than the rate of docking. We further constrained Sst such that we grouped similar situations together and came up with the following: Sst 0 Enforces that Sst is a nonnegative number. Sst xst yst +B2 C1 +B4 C2 Sst xst yst If either scenarios B2 or B4 are activated, it is scaled by an associated constant C such that it compensates for the discrepancy and the second part makes it an equality. Sst yst xst +B1 C3 +B3 C4 Sst yst xst Similar to the previous set of equations, if either scenarios B1 or B3 are activated, it is scaled by an associated constant C such that it compensates for the discrepancy and the second part makes it an equality. In order to know how to penalize our model, we created a new variable wst such that it takes on the value of the penalty multiplied by the value of Sst which is determined by the scenario it falls under and can be described by the following: wst = 8 > > > > < > > > > : Psmall Sst Sst =xst yst ;r d Psmall Sst Sst =yst xst ;r d Plarge Sst Sst =xst yst ;r d Plarge Sst Sst =yst xst ;r d Our motivation behind assigning either a small or large penalty was determined if the scenario was worse for the model. Two of the situations are worse than the other two; for example, having more bikes than ideal at a time where more peo- pleareremovingbikesthanreturningisbetterthan having more bikes than ideal and a higher rate of returning bikes (that would only increase the delta – difference between the actual number of bikes and the ideal number). Likewise, there are situa- tions that consider the reverse ratios. We also decided on implicit constraints. This included requiring all number of bikes to be non- negative integers, the number of bikes at any sta- tion can never exceed the capacity of the station, the number of total bikes in the system is a con- stant, etc. The data provided by ValleyBike allowed us to calculate the rates of removal and docking for each station within a given time period. The 2019 data was relatively easy to work with as it provided each route with its start and end date and time of day as well as stations. Using Python and the Pandas library, we formatted the csv ﬁles into a dataframe containing all of the relevant informa- tion for the rides. From this initial dataframe, we branched it off into two separate dataframes that were grouped by starting station and ending sta- tion, respectively. In each dataframe we further grouped the entries by the week of the month, the day of the week, and the hours of that day and ex- tracted the rates of docking and removal by look- ing at the number of elements within each group. Regarding the 2018 data, it was not formatted as nicely as the 2019 and therefore some more work went into getting the rates for that dataset, which included matching gps locations and calculating the time a ride started and ended. Eventually the data was fully collected and we averaged out the 2018 and 2019 rates with the corresponding days. When vizualing rates for the stations, we came across a problem, which was that the rates were very low or, for large ranges of time, simply zero. This made it so our expected value, which was cal- culated by seeing if the rate at that station at that time was within its capacity and if it was above capacity, we would take the ratio of the rates and allocate according to that ratio. With the patches of the rate being zero, we decided to simply say it can be split in half, or can just be interpreted as just leaving it alone for that hour. This initial model proved to be uninformative and rather inefﬁcient, as the the constantly chang- ing rates would suggest for frequent rebalancing but the rates were always low enough that there was not always an urgent need to rebalance every hour. Given this inefﬁcient nature, we decided that hourly rebalancing was not ideal and we should limit rebalancing to once or twice a day and sim- ply take the busy hours of the day into consider- ation for the rate, so our collected and calculated data was still essential, it just needed to be further tweaked. 6 Final Model Our current model includes these features: an iter- ative approachby adding onebike at a time to each station; each station is chosen by what will mini- mize the number of stockouts through a greedy al- gorithm. The steps involved in our algorithm are: 1.Gather data on docking rates for each loca- tion at a given time 2.Use Poisson distribution to give a probability to a sequence of docking and removal 3.Calculate the number of stockouts with the current allocation of bikes 4.Get the weighted number of stockouts through step 2 and 3 5.Temporarily add one bike to each station to see what decreases the number the most 6.Repeat until the loss does not decrease any- more or there are no more bikes For our data, we compared the number of bikes being docked at a station with the number of bikes being removed at that station within that given timeframe. Thisratioisdockingtoremoving. The ratio produces the rate for that time frame (i.e. 4 docking and 4 removing gives a ratio of 4:4 and a rate of .5). The docking rate will become the ex- pected number of bikes that should be docked at the station. In order to come up with the number of possi- ble stockouts, we created 32,768 sequences (215 ). Each possible sequence represents an arrangement of 15 customers approaching a station looking to either dock or remove a bike (+ indicates docking, - indicates removal). 32,768 is all possible situa- tions from all 15 looking to remove bikes, to all 15 looking to dock bikes, and every combination inbetween. APoissondistributionappliedtothese sequences determines the probability of likelihood of each situation and determines which is the most likely. Given the current allocation of bikes at a given station, we can then calculate the number of stock- outs. While we begin by setting the number of bikes at any given station to 2 in order to run through these sequences, we add in bikes one by one at each iteration in order to minimize the av- erage number of stockouts at every station. After calculating the number of stockouts with a given allocation for every sequence, a Poisson distribu- tion is used with being the average number of docking at a station and x being the number of docking in a sequence. The weighted number of stockouts is the product of the probability the se- quence can occur and the number of stockouts as- sociated with it. Afterwards, we take the sum of it to be the loss of that station. This is done for every station. After the loss is calculated between each sta- tion, we add one to their allocation and calculate the loss. Between all the stations, the maximum difference in loss will be the station in which a bike will be added. Once the bike is added, the it- erative step starts over from the top with the new allocation as input. The process is repeated until there are no bikes to add or any bike added to the system will produce a suboptimal allocation. 7 Results The ﬁnal model was used to calculate the optimal allocations for each station at a certain time pe- riod and in this case, it is a Monday in August. In Figure 1, the station Amherst Town Hall would be optimal with 8 bikes and the Basketball Hall of Fame with 8 bikes. For some of the data like UMass Southwest with 10 bikes, it makes sense to have a high number of bikes when considering its capacity. This is due to the fact that the station is very active when it comes to removing bikes. This happens to be reﬂected in our model. We initially set the number of bikes that can be used by ValleyBike to be 500. After the model had ﬁnished running, it did not use all of the 500 bikes. In fact, it was optimal with 72 bikes re- maining or 428 bikes in the system. Intuitively, having more bikes in the system should produce better results, but in this case, having more bikes can lead to more stockouts. From a rider’s per- spective, there might not be enough docks when they take a bike out from a station to go to another. The ﬁnal loss on the system is around 239.80 when summed across all the stations. Even though Figure 1: Example of a Final Allocation List on a Mon- day in August this does seem a little high, it has to be consid- ered that the result came from all 57 locations on 32,768 sequences. 8 Conclusion Using our ﬁnal model with weighted probabilities of sequences in order to predict the allocation of bikes that would minimize the average number of stockouts proved to be much more informative and effective than our initial model. Generalizing and averaging rates out daily instead of hourly also im- proved our results as there weren’t cases of rates of zero any more and every station had a concrete allocation rather than an estimation given by the ratio of rates. The ﬁnal model was intensive with calculations and took around a minute to fully run for each month, but it produced values that we feel are worth that run time. Given our model, we can use the trends of the past two years in order to pre- dict what they will be in the coming year and as time passes and we accrue more data, the model will only get more precise, as we are currently just taking the average of two data points for each day of the week in the month. Our ideal rebalancing schedule would be twice a day, once during the middle of the busy 12 hour period of the day and once again at the end of the 12 hours. This rebalancing schedule provides enough bikes to satisfy the needs of the night and early morning period, as most of it was zero or one bike per hour, and the mid-day rebalancing allows ValleyBike to restock at stations that are in need and ﬁnish the day out. As far as suggestions for ValleyBike, we sug- gest that they implement a system that keeps real- time statistics on their bikes and stations, includ- ing being able to easily access the number of open docks and available bikes at a given station. This real-time tracker would allow them to have a sys- tem in place that can rank stations on their need of bikes or open docks and alert them for more efﬁ- cientrebalancing. Ahelpfulfeaturethatcouldalso be implemented is a tracker that counts how many peoplehaveclickedonastation intheapptoeither check if it has any available bikes or docks. With this system in place, you would be able to have a rough estimate of the demand for that station, be- cause if the docks were completely empty or full, you would not know if there would be a stockout event because there is no way to gauge whether or not someone wants to remove or dock a bike. Overall we hope that you ﬁnd our model help- ful to be able to ﬁnd optimal allocations for each station on a given day of a given month and that it may reduce the number of stockout events. References Freund, D., Henderson, S. G., and Shmoys, D. B. (2016). Minimizing multimodular functions and allocating capac- ity in bike-sharing systems. Freund, D., Norouzi-Fard, A., Paul, A., Henderson, S., and Shmoys, D. (2017). Data-driven rebalancing methods for bike-share systems. O’Mahony, E. and Shmoys, D. B. (2015). Data analysis and optimization for (citi)bike sharing. In Proceedings of the Twenty-Ninth AAAI Conference on Artiﬁcial Intelligence, AAAI’15, pages 687–694. AAAI Press. Optimal Truck-Based Bike Rebalancing for ValleyBike Share in Amherst, MA Jensen Beaumont Melissa Ligure Abhishek Ram Elliot Tower College of Natural Sciences, University of Massachusetts, Amherst, MA, 01003 December 18, 2019 Abstract In this paper, students from multiple disciplines within the Uni- versity of Massachusetts, Amherst, detail their research pertaining to the improvement of electric bike rebalancing for the local bike share co-op, ValleyBike Share. Though ValleyBike Share’s inuence extends further, the students will only focus on developing a method for mak- ing bike rebalancing via trucks more eective, such that maximization of the linear program ensures maximization of user satisfaction. This research focused on rebalancing the bikes in the Amherst area. Ultimately, the results found from this research lends itself further to the other surrounding areas that are under the purview of ValleyBike Share. Now, they have access to a model that can be readily applied to other urban or even rural areas, just based on the nature of the location for which the model was designed. Finally, this also provides further demonstrable proof of the applicability of a business model like that of ValleyBike Share to other areas similar to the Pioneer Valley, indicating that the electric bicycle co-ops need not be limited to cities. 1 Contents 1 Introduction 2 1.1 ValleyBike Share . . . . . . . . . . . . . . . . . . . . . . . . .2 1.2 Bike Rebalancing . . . . . . . . . . . . . . . . . . . . . . . . .3 1.3 Identifying the Location Purview . . . . . . . . . . . . . . . .4 1.4 Bike Rebalancing Logistics . . . . . . . . . . . . . . . . . . . .6 2 Literature Review 6 3 Dening the Linear Program 7 3.1 Dening the Notation . . . . . . . . . . . . . . . . . . . . . . .7 3.2 Dening the Variables . . . . . . . . . . . . . . . . . . . . . .8 3.3 Dening the Objective Function . . . . . . . . . . . . . . . . .9 3.4 Dening the Constraints . . . . . . . . . . . . . . . . . . . . .9 3.5 Data Needed for Yielding a Solution . . . . . . . . . . . . . .12 4 Generation of Necessary Functions 12 4.1 Neighborhood Function . . . . . . . . . . . . . . . . . . . . . .12 4.2 Determining the Metric to Maximize Bike Rebalancing . . . .14 4.3 start(s). . . . . . . . . . . . . . . . . . . . . . . . . . . . . .16 4.4 min(s). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .16 5 Results 17 5.1 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .17 5.2 The Solution . . . . . . . . . . . . . . . . . . . . . . . . . . .18 6 Conclusion 20 7 Acknowledgements 22 1 Introduction 1.1 ValleyBike Share The scope of this research pertains to optimizing the functions of the Valley- Bike Share company based out of Northampton, MA, that was made possible in part thanks to a partnership between Bewegen Technologies and Corps Lo- gistics. ValleyBike Share is an electric bike rental service that supports the 2 towns of Narthampton, Amherst, Holyoke, Hadley, and Springeld, and in particular the campus of the University of Massachusetts, Amherst. As of the beginning of December 2019, ValleyBike Share customers have been able to travel 126,909 routes that have spanned 281,788.36 miles since the company’s inception. Additionally, the rst and second most popular stations from which their customers return and check out bikes are the UMass Southwest Residential Area and the UMass Knowlton Residential Halls.1 This statistic demonstrates a basis from which the problem this research pertains to will be addressed. 1.2 Bike Rebalancing Through the checkout and return of bikes between stations, many bikes will end up at one station, while very few will remain at others. This often leads to system failures resulting in customer dissatisfaction, and ultimately trans- lates to losses of prot. The two types of failure that will be addressed as part of this paper are starting failures and ending failures. A starting failure is when a customer approaches a station to rent out a bike, but there are no bikes at the station to check out or the bikes there don’t work or are not charged. In contrast, an end failure occurs when a customer approaches a station in order to return the bike they had checked out, only to realize that there is no open dock in that station for them to park their bike. In order to minimize these failures, it becomes necessary to develop a sys- tem for moving bikes around in an ecient manner between stations such that there can be enough bikes at each station to avoid starting failures but also not too many so that end failures are avoided as well. As ValleyBike Share supports multiple communities spread over approximately 200 square miles2 , it becomes necessary to utilize trucks or other similar means of storage and transportation to move bikes between stations. Currently, ValleyBike Share has three trucks that cover this whole area, whose capacities are 13, 10, and six bikes, respectively. The objective of this research is to develop a model 1 https://www.valleybike.org/#about-us 2 https://www.daftlogic.com/projects-google-maps-area-calculator-tool. htm# 3 that maximizes the number of bikes rebalanced over the period of a day via the use of the truck(s). 1.3 Identifying the Location Purview Based on the location dynamics of each respective community within Valley- Bike Share’s purview, it would be beyond the scope of this research to analyze each individual community to develop an aggregate model. As a result, the scope of rebalancing will focus on the locale on the Amherst primarily. As the two most popular stops for checking out and returning bikes are on the University of Massachusetts, Amherst, campus, narrowing the scope to this location still provides a sense of optimality as a whole. To further justify this scope narrowing, an analysis of the population of the several towns covered by the ValleyBike Share is analyzed. As of 2017, there are approximately 280,000 residents in all these ve towns.3 Of that population, counting the 30,593 students (Fall 2018)4 , 1,075 employees5 , and 1,462 faculty members6 (2019) of the University of Massachusetts, Amherst, and the remaining 9,000 residents of Amherst7 , addressing the bike rebal- ancing on campus rst would address the needs of approximately 14% of the total population ValleyBike Share could address. This is not a nominal sam- ple of prospective customers to address. Finally, due to the massive size of the population of the University of Massachusetts, Amherst, campus relative to its spatial size, modeling the campus can yield inheritable characteristics that are applicable to other similar cities. Additionally, the remainder of Amherst is more rural and suburban, and this model is also applicable to similar locales. To paraphrase, by beginning in Amherst, the research devel- ops an object oriented model of both small metropolis and rural/suburban behaviors that can be separated accordingly for other applications. Ulti- mately, all eorts to improve the rebalancing system on campus could be 3 https://pioneervalleydata.org/search-by-topic/demographics/ 4 https://www.umass.edu/admissions/facts-and-figures/ student-body-and-admissions-statistics 5 https://www.owler.com/company/umass 6 https://www.umass.edu/oir/sites/default/files/publications/factbooks/ facultystaff/FB_fs_05.pdf 7 https://pioneervalleydata.org/search-by-topic/demographics/ 4 further applicable with some changes to the rebalancing in the surrounding areas as well. The University of Massachusetts, Amherst, campus is a public university that sprawls over 1,463 acres8 on the border of Amherst, MA, with Hadley, MA. Within this area, there are seven stations that are either on campus or are common stations that students would utilize close to campus. Addition- ally, there are three stations within the town of Amherst (27.7 sq. miles) itself. These are detailed in Figure 1 below. Figure 1: UMass Amherst Station Locations. 8 https://www.usnews.com/best-colleges/umass-amherst-2221 5 1.4 Bike Rebalancing Logistics Since this research only analyzes a sample of the overall applicable pop- ulation, only one of the trucks will be utilized in order to maximize bike redistribution. Based o of repeated simulation. Additionally, in regards to rebalancing, the drivers and workers who cur- rently work on rebalancing only operate over an eight hour period every business day. As a result, the linear program that will be used to simulate and optimize the rebalancing system will not conside anytime ourtside of 9 AM - 5 PM every weekday and anytime over the weekend. 2 Literature Review In order to develop a proper model for simulating and optimizing truck re- balancing, multiple articles were utilized to advise the model’s improvement. For example, one of the articles was Simulation Optimization For a Large- Scale Bike-Sharing System (Jian, Freund, et. al.). This article detailed the eorts by the authors in order to improve the bike share system in New York City such that the number of failures were minimized. From this article, the denition of the dierent types of failures were extrapolated and further used in order to dene failures in this research. Ad- ditionally, from this article, the method for maximizing the bike rebalancing was rst addressed and introduced, and was thus an excellent article from which to iterate further to t a model to the present circumstances. The next article that was analyzed was Data-Driven Rebalancing Meth- ods for Bike-Share Systems (Freund, Norouzi-Fard, et. al.). This article provided a linear program that was very similar to what would be needed for the present circumstances of this research. Ultimately, the constraints presented in this article were tted to the ValleyBike Share system. The objective function, however, in combination with Jian, Freund, et. al., was utilized and tted further. Both articles provided the notion of a user dissatisfaction function (UDF). 6 This function varied per station. The purpose of this function was to convey information on the relationship between the initial number of bikes at a station and the number of failures over a specic time period. 3 Dening the Linear Program 3.1 Dening the Notation Notations are needed to identify certain information and indicates the nec- essary data for the integer program. They can be constants that are need to construct appropriate constraints for the integer program. The notations and their denitions are provided in Table 1. 7 Table 1: Notation Used in the Linear Program Notation Denition of Notation s 2 S Is the station s in the set of all stations S . t 2 [T ]Is the time step t in the set of all time steps [T ]. T Is the nal time step. k 2 K Is the truck k in the set of all trucks K . Number of bikes that can be picked up or dropped o in a single time step. start(s)Number of bikes at station s at the starting time. min(s)Number of bikes at station s that minimizes the number of dissatised customers. max(s)Capacity of station s. N (s)The set of stations a truck can move to from station s in a single time step (adjacent to it). S+Subset of stations where start(s)> min(s). S Subset of stations where start(s)min(s). Now that the notation is declared, the variables can be dened. 3.2 Dening the Variables In order to develop the linear program, a number of variables need to be dened. The variables are provided in Table 2. Table 2: The Variables Used in this Linear Program and Meanings Variable Denition of Variable xstk This is a binary variable. If it is 1, then truck k is at station s at time step t. Otherwise, it is 0. ystk This is a variable that has to be an integer 0. It indicates the number of bikes that truck k has access to at station s at time step t. btk This is a variable that has to be an integer 0. This is the number of bikes in truck k in time step t. 8 With all of the variables dened, the objective function and the con- straints can be generated. 3.3 Dening the Objective Function The initial function was obtained from Freund, Norouzi-Fard, et. al. Ef- fectively, this objective function attempted to minimize the number of dis- satised customers by maximizing the amount of bikes reallocated. This is done by ensuring that the most bikes possible are moved via the dierence between the bikes in station s at time one moved in truck k and at any later point for the same station is maximized. Additionally, this objective function also ensures that c(s), which is the slope of the user dissatisfaction function, is also maximized, indicating that one is maximizing the change in user dissatisfaction. maximize X s2S;k2[K ] (ys1k ystk )cs However, when the UDF was applied for the Amherst system, it was not providing accurate numbers. A simplied objective function was developed. The new objective function aims to minimize the how far o each station is from its ideal number of bikes by the nal time step. The new objective function is: minimize X s2S;k2[K ] j ysTk min(s)j 3.4 Dening the Constraints With the objective function fully dened, the constraints will be addressed. The constraints are given below. 9 xstk xs(t 1)k + X s0 :s2N (s0 ) xs0 (t 1)k ;8s;t;k ; (1) X s2S xstk = 1;8t;k ; (2) X k2[K ] ys1k =start(s);8s; (3) start(s) X k2[K ] ystk min(s);8s 2 S ;t; (4) min(s) X k2[K ] ystk start(s);8s 2 S+;t; (5) X s2S ystk +btk = X s2S ys1k +b1k ;8t;k ; (6) j ystk ys(t 1)k j x stk ;8s;t;k ; (7) j ystk ys(t 1)k j +j xstk xs(t 1)k j ;8s;t;k:(8) The denitions for these constraints within our linear program are pro- vided in Table 3. 10 Table 3: Explanation of the Constraints Used in the Linear Program Constraint Denition of Constraint Firm/Soft 1 Allows each truck to only move from one station to an- other that is adjacent (neighborhood). Firm 2 Indicates that at each time step, each truck must only be in one station. Firm 3 Indicates number of bikes at a station at the beginning.Firm 4 Ensures that the number of bikes stays between start(s) and min(s). Firm 5 Ensures that the number of bikes stays between start(s) and min(s). Firm 6 Ensures bikes cannot be added to the system.Firm 7 Ensures bikes are only picked up if the truck is at that station, and no more than a truck’s capacity of bikes are moved. Firm 8 Ensures truck either moves bikes or loads bikes in a time step, not both (often omitted). Soft These constraints can be further classied based upon their function with respect to the linear program as a whole. They can be classied into two categories: constraints that dictated what could be done within a time step, and constraints that maintained that the truck rebalancing does not exceed the capacity of the truck nor the stations. The constraints that dictated what can be completed within a time step are constraints 1, 2, and 7. These constraints dictate that trucks can only move to adjacent stations that are within a time step away in terms of travel time, that trucks can only be at one station at a time, and that bikes can only be picked up if the truck is at that station. The constraints that dictated the capacity restrictions are 3, 4, 5, 6 and 7. Constraint 3 dictates the number of bikes in each station at the beginning, and constraints 4 and 5 conjunctively ensure that the number of bikes in the station stays within the range of the starting bike number and the minimizer. Constraint 6 ensures that the system is a closed system and cannot be im- pacted by the additions of more bikes. Constraint 7 also dictates that while 11 trucks can only be at one station at a time, and bikes can only be picked up if the truck is at that station, the truck cannot exceed its capacity when it comes to rebalancing. 3.5 Data Needed for Yielding a Solution In order to obtain a solution, several dierent sets of data were necessary. The rst of these data sets was knowing the average number of bikes in each station available every day. This data set pertains directly to ensuring Constraints 3, 4 and 5 have realistic values from which to manage further iteration. It also ensures that Constraint 6 is maintained, where the sum of all bikes in each station at the beginning of the day cannot change as the day progresses. The second data set that was important to know was the optimal number of bikes that should be at each station during the day. This is also known as the minimizer,min(s), for each station. This data set is vital for enabling constraints 4 and 5, and for maximizing user satisfaction in the objective function (via the user dissatisfaction function). This information was pro- vided by peers in the Math 456 class at the University of Massachusetts, Amherst, taught under Professor Annie Raymond. The third data set that was vital to the linear program solution genera- tion was knowing the number of bikes that left each hour from each station as well as the number of bikes that return to each station every hour. This data set was also vital for developing the user dissatisfaction function. Next, the dierent nested functions that ensure that the linear program works as intended are detailed. 4 Generation of Necessary Functions 4.1 Neighborhood Function The neighborhood function,N (s), is a function that is utilized by the linear program to restrict trucks to only moving between stations that are closest to the station that they started at during that time step. Ultimately, the 12 function also restricts truck movement to only within that one time step. This ensures that trucks do not go to an arbitrarily far station rst, and in- stead focuses on rebalancing nearby stations and eventually moving towards the farther stations. In order to formulate this function, the distances between each station in our purview were calculated via Google Maps. Next, based on observance of trac patterns in Amherst, MA, a timestep of 10 minutes was determined to be an ideal value for the length of a time step. This now implies that all trucks must be able to move to a new station within a maximum of 10 minutes. To conrm that the 10 minute time step value would maintain itself both on and o campus, the time needed to travel between various stations was also measured via Google Maps, with focus on times of trac. It was found that the unpredictable nature of trac on college campuses made it so that our time to travel between stations was not totally accurate, but even at peak trac conditions, the time to move between stations still stayed below 10 minutes. To demonstrate, the distance between two on-campus stations, Knowlton Residential Hall and Haigis Mall, is only 0.9 miles, but based on peak trac conditions, could take 6-9 minutes to traverse via driving. In comparison, when comparing two o-campus stations, University Drive and East Hadley Road, the distance between them was 2.1 miles, but only took 5-8 minutes to travel. This is shown in Figure 2. To remedy this, the threshold by which the truck would move to the next station was updated. It was decided that each station must either by travelable within 10 minutes or it must be within 1.5 miles from each other. This therefore satises the case of trac overwhelming the 10 minute time constraint and the case of o-campus station distance overwhelming any distance constraint placed upon the linear program. 13 Figure 2: Possible routes in the Amherst area. 4.2 Determining the Metric to Maximize Bike Rebal- ancing In due course of generating the objective function, it became necessary to im- plement a metric that qualies the reason for redistributing the bikes in the measured fashion. This metric could take many forms, such as minimizing the distance traveled or time taken to travel between stations by the redis- tribution trucks or maximizing user satisfaction based on the user demand. As a result, the rst metric that was examined was the minimization of the distance traveled by the redistribution trucks. Although in theory this would be an ideal metric for justifying travel between stations, it was found that the dierences between the models that governed movement between stations on campus and o campus (the University Drive and Kendrick Park stations) were not trivial, and as a result, dierent models would need to be developed in due course. Although we are given introductory data for travel between stations on and o campus via Google Maps, it was determined that a totally accurate representation of the trac conditions was infeasible with- 14 out more data from actual car movement during various times. As a result, this metric for optimizing truck redistribution was also found infeasible. The next metric that was analyzed was minimization of time traveled between stations. This ensured that while the number of bikes transported would be maximized, the amount of time that is spent transporting is ulti- mately minimized. Using time for driving between stations, as provided by Google Maps, the next station to travel to rebalance can be picked based on the metric provided. However, as with minimization of distance traveled, this could also be subjective to the ow of trac and random incidents that would demonstrate a dierent optimal solution than what would be truly ideal. Also, upon closer examination of both metrics, it became apparent that both of these metrics do not directly account for the customer themselves. While it takes into account sustainability and prot earning via the mini- mization of resources utilized, neither of the above metrics really address the customer needs themselves. Ultimately, it was decided that the best way for maximizing user satisfac- tion is to quantify user dissatisfaction using a user dissatisfaction function. Many dierent methods for calculating a function that modeled user satis- faction were proposed, in particular methods that determined the likelihood of failure to be occurring as a random variable from a normal distribution (enabled by the Central Limit Theorem) and also via assuming that each station’s optimal number of bikes is half the station capacity. Ultimately, it was decided that the development of the user dissatisfaction function would be based upon the ratio between the number of bikes in the station at the beginning of the day/simulation divided by the total expected number of failures. By dening user dissatisfaction in this regard, that implies that the slope of user dissatisfaction is inversely proportional to the number of failures overall. This means that for the objective function, and by extension c(s), or the slope of the user dissatisfaction function, to be maximized, the number of failures would need to be minimized. The linear program constraints ensure that bike movement can only improve user satisfaction. In order to derive a function that accurately encompasses the quantica- tion of user satisfaction, the number of bikes that both left and were returned 15 to each station were measured and accounted for. The days measured and accounted for were Wednesday, September 11th, Thursday, September 19th, and Monday, September 30th, 2019 respectively. 4.3 start(s) For each station, a single value of start(s) was determined. This value for each station was derived via examination of data gathered over the course of a given week, in particular via averaging the number of bikes available at station s at the beginning of each day. Ultimately, it became necessary to adjust our linear program to enable dierent assumptions than what were made in Freund et. al. For exam- ple, Freund et. al. opted to complete the rebalancing procedure overnight, whereas we are opting to rebalance during the day. Additionally, we chose to restrict our data analysis for deriving start(s) to just being based on the weekdays. This was completed because the weekends appeared much more varied in demonstrated trends than what was seen in the weekdays, so it was felt that this data was not representative of the weekday trends that were focused on. Another assumption that was made via development of this metric is that the rebalancing would be completed by individuals who work 40 hours a week (8 hours a day, no weekends), thereby supporting both the assumptions for daytime balancings only on weekdays. 4.4 min(s) In order to properly identify the point at which user dissatisfaction is mini- mized, it became necessary to analyze the inow and outow of bikes across all stations over the period of these days and more. This analysis yielded parameters from which to improve the optimization further such that enough bikes are at each station so that the station will not run out at any point during that time step and that there are not too many bikes such that the station will ll up at any point during that time step. The new parameters are called the minimizer, denoted by min(s). As aforementioned, this new function corresponds directly to Constraints 4 and 5 of the linear program and is in fact a vital parameter for avoiding any sort of failure in due course of rebalancing. 16 Figure 3: Illustration of a single step in our linear program. 5 Results 5.1 Example In order to understand the results that arrived from solving this linear pro- gram given the circumstances, it must rst become necessary to envision an ideal solution received. Therefore, refer rst to Figure 3. To explain this graphic, imagine the process the linear program simulates where an external ow of bikes is created via truck rebalancing. Within this step, the objective is to identify two or more stations (A and B for this exam- ple), where in one or more stations has more bikes than the minimizer of user dissatisfaction min(s), and contradictorily, the other station(s) has less bikes than the minimizer for that station. In this case, the starting number of bikes in station A,start(A), equals 10, and likewise start(B ) = 12. Ultimately, we want to properly rebalance bikes in the next time step such that the number of bikes at the station at the next time step,t, or y (s;t;k ) equals min(s). Therefore,y(A;t;k) =min(A) = 11 and likewise y (B;t;k ) =min(B ) = 11. 17 Therefore, in order to ensure this happens, the truck must move 1 bike from Station B to Station A. In this way, the exact perfect number of bikes in each situation is satised, and ideally this implies that user dissatisfaction will be minimized. 5.2 The Solution Table 4 details teh time series for a solution that was identied by the above linear program. Table 4: Time series solution from the linear program. Time Step Station start(s)min(s)Bikes Bikes in Truck 0 9 10 7 10 0 1 3 9 6 6 3 2 9 10 7 7 6 3 6 8 7 7 7 4 8 6 7 7 6 5 2 7 6 6 7 6 10 4 6 6 5 7 7 6 10 10 1 8 5 5 6 6 0 9 1 8 9 8 1 10 4 6 5 6 0 Table 4 displays information as follows: 1.The rst column, named \Time Step", dictates the time step (20 min- utes each) that the truck is operating in, where time step zerp cprre- sponds only to time zero. Time step 1 goes from 0 to 20. 2.The second column, \Station", displays the location of the truck at the end of each time step. The corresponding station numbers are also detailed in Figure 4. 18 3.The third column, \start(s)", dictates the starting number of bikes in the station at the beginning of rebalancing. 4.The fourth column, \min(s)", shows the optimal number of bikes in each station that would minimize user dissatisfaction. 5.The fth column, \Bikes", demonstrates the number of bikes in each station once the rebalancing has occurred. Ideally, this value would be equal to min(s), which is what is also demonstrated for each station. 6.This sixth, and nal, column is the number of bikes in the truck at the end of that time step. Figure 4: Time series visualization of derived solution. Essentially, the solution derived by the linear program dictates that the truck must begin at Knowlton Hall, and then in the rst time step, it moves 19 to North Pleasant Street, where the truck collects three bikes. In the second step, the truck moves back to Knowlton Hall, where another three bikes are collected. In the third step, the truck moves from Knowlton Hall to Haigis Mall, where the truck collects another bike. Next, in the fourth step, the truck moves from Haigis Mall to UMass Integrative Learning Center (ILC), where the truck drops a bike o. In the fth step, the truck then drives over to Kendrick Park, where yet another bike is dropped o. Following this, the truck moves to UMass Central Residential Area in the sixth time step, where it drops two bikes o. In the seventh step, the truck moves onwards to UMass Southwest Residential Area, where four more bikes are dropped o. In the eighth step, the truck moves to University Drive, where it drops o the last bike in the truck. The truck then moves to the Amherst Town Hall in the ninth step, where it picks up a bike and then drives to East Hadley Road in the tenth time step to drop it o. 6 Conclusion The impact of this research is evident in the fact that there is now a model that helps to analyze the behaviors and characteristics of the area that is under the purview of ValleyBike Share. With this linear model, they can now take a fresh iterative step to improving their customer’s overall user satisfaction in one of their largest customer zones, Amherst, MA. To further accentuate the impact this research has on the current system, by rst ana- lyzing Amherst, a combination of a city and a more rural/suburban setting is analyzed. Due to the massive size of the University of Massachusetts, Amherst and the number of residents and employees within it relative to its spatial size, modeling campus interactions and behaviors is approximately equivalent to evaluating a city like Springeld or Holyoke. Additionally, as the rest of Amherst is more rural/suburban, its interactions and behaviors are more aligned with the towns of Northampton and Hadley. Additionally, the impact of this simulation and analysis provides another step towards optimizing the age-old question in business: how to make cus- tomers happy while maximizing prots. Where the articles within the liter- ature review focused on New York City, this research focused on the much more rural setting of Pioneer Valley, Western Massachusetts. As a result, the changes and updates that were made to t closer to the setting of Pi- 20 oneer Valley provided a new iteration to nding an all-encompassing solution. Finally, the impact of this research could also be evident in similar bike share systems in more rural settings around the country. In this decade, there has been a dramatic rise in the number of electric bikes available all around, and this trend is on the rise. This research can inspire more bike shares to develop all over the country because it demonstrates that it is a viable business even outside a major metropolis. While developing this linear program, several dierent assumptions and simplications were made that may or may not be notably dierent in a real world application of this linear program. An example of this potential source of error is the assumption that the number of bikes within the system would not change during the model running. In reality, some bikes may be rented by a customer, be taken away for repair or maintenance, or as these are elec- tric bikes, some may be in need of charge and cannot be taken out. These potentialities correspond to removing bikes from the system, and this would directly violate Constraint 6, which states that bikes cannot be added to the program. In future work, it might be ideal to append and amend the above linear program by adding constraints that account for the status of bikes. Another assumption that was created that may not hold up is the as- sumption that the user dissatisfaction function was the ideal method for improving user satisfaction. Due to the relative unpredictability of a com- bination of trac modeling and user satisfaction modeling, it could be very possible that the ideal metric for maximizing satisfaction could be via mini- mizing distance traveled or time spent moving between stations. As a result, any future work could address these metrics instead. Another potential source for future work is analyzing the path yielded from this solution. In reality, though a specic time series was derived, it could be possible that there are multiple other solutions that could be equally optimal or even more so, especially in terms of a combination of minimizing user dissatisfaction and minimizing distance traveled or time traveled. Fur- ther testing of these dierent paths could verify the above derived solution or may glean other optional paths that are more ideal for certain conditions as opposed to others. 21 7 Acknowledgements Behind all successes in the world, there is a strong community backing from a person or people that helped the rise to success. This situation with the above presented research is no exception to this theme. As a result, we the authors would like to thank the people at ValleyBike Share for enabling the opportunity to work on this project. Secondly, we would like to thank Professor Annie Raymond and Daniel Gallagher for advising us every step of the way and providing their much needed experience and support to ensure that we were able to succeed. 22