Rail Trails and Property Values: Is There an Association?

Ella Hartenian, Smith College
Nicholas J. Horton, Amherst College

Journal of Statistics Education Volume 23, Number 2 (2015)
www.amstat.org/publications/jse/v23n2/horton.pdf

Copyright © 2015 by Ella Hartenian and Nicholas J. Horton, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the authors and advance notification of the editor.

Key Words: biking score, linear parks, greenways, multiple regression, real estate, walking score, Zillow.com

Abstract

The Rail Trail and Property Values dataset includes information on a set of n = 104 homes which sold in Northampton, Massachusetts in 2007. The dataset provides house information (square footage, acreage, number of bedrooms, etc.), price estimates (from Zillow.com) at four time points, location, distance from a rail trail in the community, biking score, and walking score. The dataset is amenable to use with exploratory data analysis in introductory courses, intermediate courses with a focus on visualization and multivariate relationships, as well as advanced courses that utilize repeated measures regression models and more sophisticated graphics.

Figure 1: Map of the Central Massachusetts Railroad network from 1888.

1. Introduction

Railroads transported many of the goods and people in the United States from the middle of the 19th century until the early 20th century. In the 1870s, there were more than a dozen competing railroad systems in Southern New England alone, with nearly every medium-sized or larger town connected to one or more of these networks. As an example, Figure 1 displays a map of the extensive and interconnected Central Massachusetts Railroad system in 1888 and its connections to other companies in Massachusetts.
However, as automobiles became more affordable and road infrastructure expanded, train travel decreased dramatically. Only major lines remained intact while a huge number of branch lines were removed from service and tracks reclaimed for scrap. In Massachusetts alone thousands of miles of these corridors were formally abandoned and many of these paths were lost to development or reuse. The passage of the Railroad Revitalization and Regulatory Reform Act in 1976 and the National Trails System Act in 1978 led to funding and technical assistance to preserve the corridors (“rail-banking”). This legislation spurred the conversion of many of these old track beds to multi-use trails.

In 1986, an initial proposal was brought forward by then-Governor John Ashcroft of Missouri to convert 185 miles of unused railroad bed into what became the Katy Trail, one of the first rail trails in the US. That same year, the Rails-to-Trails Conservancy (http://www.railstotrails.org) was established and began advocating nationally for trail development. By 1989, 200 rail trails were open across the country and the Rails-to-Trails Conservancy had over 7,000 members. Now there are 20,000 miles of trail and tens of millions of users each year, with thousands of additional conversions underway (Rails to Trails Conservancy 2015). Throughout the last decades of the 20th century, the “rails to trails” movement gained increasing visibility within the burgeoning environmental movement; the trails simultaneously provided an alternative to auto-centered transportation as well as a way for people to enjoy nature without contributing to carbon emissions.
A secondary benefit of the trails has been to spur exercise. More than 40 percent of American adults do not engage in leisure-time physical activity and a similar number are overweight or obese (Centers for Disease Control and Prevention 2010). As such, the existence of rail trails responds to a public health need for exercise on safe, scenic and accessible paths. Because many trails link the centers of cities and towns, rail trails facilitate exercise through daily activities like commuting or grocery shopping. One observational study in Indiana found that 70% of users of six different paths statewide reported that they exercised more as a direct result of the trail (Eppley Institute for Parks & Public Lands 2001).

Advocates of these trails have also suggested that they have an economic benefit for the communities where they are located. Proximity to parks and green spaces has been thought to be associated with an increase in property values. However, rail trails (also referred to as linear parks) are not necessarily analogous to traditional parks. Rail trails are long corridors of paved trail, which may or may not be surrounded by greenery, unlike the typically larger and consistently verdant landscape of parks. Additionally, rail trails provide easy access to locations along the trail, a feature more analogous to proximity to a subway line than a park. There are few studies empirically linking rail trails with economic activity or changes in home prices, although one could imagine increases in property values related to easy access to the trail and the amenities along the trail, or as a result of increased economic activity by trail users (American Trails 2011).
Two studies have looked specifically at house prices and rail trails with a hedonic price model. This econometric model is used when the price of a good (e.g., a house) can be broken up into discrete components and a market value can be attributed to each component. Karedeniz (2008) at the University of Cincinnati examined home values along the Little Miami Scenic Trail in southwest Ohio and found that for houses within 10,000 feet of the trail, proximity to the trail positively impacts property values. Lindsey et al. (2004) used a hedonic price model to measure the impact of greenways on property values in Indianapolis, Indiana. They found that the association between price and trails varied from trail to trail. While homes near the Monon Trail sold for about 11% more than houses more than half a mile away, there was little or no difference in price for homes near other trails in Indianapolis.

Anecdotal evidence also suggests that rail trails are an increasingly sought-after amenity. The National Association of Home Builders found that nearly 36% of 2,000 recent homebuyers said multi-use trails were “important” or “very important” in their choice of a home (American Trails 2011). This is also consistent with other evidence such as the inclusion of a “Bike Path” field on the New England Multiple Listing Service and the number of hits on the housing section of Craigslist for “Rail trail”, suggesting that proximity to trails is an increasingly important amenity for buyers and renters.
2. Goal

This dataset could be used in a first course in statistics which includes exploratory data analysis, visualization or modeling. This dataset is also appropriate for a second or more advanced course in statistics that includes modules on multiple regression modeling, data manipulation, mapping or repeated measures. It can be used to assess the association of various factors with change in property values to shed light on relationships of interest in terms of urban planning and sustainable development. The dataset provides material for a discussion on confounding variables, categorical versus continuous variables, and the process of scraping data from the internet. Students can look at basic descriptive statistics and compare houses of similar structures, then assess whether there are predictors of change in price of houses. Even in this simple analysis, after controlling for baseline price, houses closer to the trail appreciated more than houses further away between 1998 and 2014.

We first describe the dataset in detail including how the home value data was obtained, how the distance between each house and a rail trail entrance was calculated, and how other variables were scraped from the internet. We then consider how to undertake exploratory analysis and modeling, including use of repeated measures regression to assess whether there is differential price growth for properties nearer to the rail trail. We conclude that homes within a half mile of the rail trail tend to appreciate more than homes further away, and suggest several other possible avenues for analysis.

3. Dataset

3.1 Background

Data were collected on house sales in Northampton, Massachusetts, a city of more than 28,500 people located along the Connecticut River approximately 100 miles west of Boston, Massachusetts and 80 miles north of New Haven, Connecticut. The city, founded in 1654, served as the nexus of train travel between Boston and New Haven in the nineteenth century until passenger service was discontinued in the 1920’s and freight service largely abandoned in the 1960’s.
The dataset was collected using information from Northampton because the conversion in 1984 of a nearly 3 mile long section of the Williamsburg Branch of the New Haven railroad into one of the oldest municipally operated rail trails allows us to assess possible changes over time between 1998 and 2014. This natural experiment allows us to compare the change of house prices for houses close to the trail versus those that are more distant. Figure 2 displays a map of this trail (labeled Francis Ryan Northampton Bikeway). A number of other rail trail projects followed in and around Northampton, including the Norwottuck Rail trail (1992, 10 miles), the Manhan Rail trail (2003, 5 miles) and the 2011 linkage between the Northampton and Easthampton trail networks.

[Figure 2 map: “Network of Rail-Trails in Northampton, Massachusetts”. Sources: Northampton OPD and DPW, MassGIS, and Smith College Spatial Analysis Lab. Original cartography: A. Nyren SC '06. Review and edits: N. Horton, J. Caris. Copyright © 2006 Friends of Northampton Trails and Greenways, Inc. under GNU General Public License. Map date: May 2007. http://www.fntg.net/]

Figure 2: Map of the Northampton Rail trail, dating from 1984, connecting from King Street to Bridge Street/Look Park (2.7 miles).

3.2 House Prices

We began with the set of all homes sold in Northampton in 2006 and 2007, and used data from Zillow to ascertain house prices. Zillow.com is a real estate website that provides information on residential property across the country. Houses (including those currently for sale and those not for sale) are listed on the website. “Zestimates” of home value are based either on the list price or, when the house isn’t for sale, on an algorithm that takes into account the home’s characteristics as well as the local housing market. In addition to the “Zestimate”, Zillow reports a Value Range (which corresponds to a 70% confidence interval, http://www.zillow.com/zestimate).
Searching for a house by address on Zillow.com produces a bird’s eye view of the neighborhood with price estimates for that address and surrounding houses. Figure 3 displays the Zillow information for one of the homes in our dataset, while Figure 4 displays this home’s Zillow estimate over time, along with other houses in its zip code and Northampton as a whole.

Figure 3: An image of the Zillow page for one home in our study, showing a bird’s eye view of the neighborhood as well as providing general housing information including price and square footage.

Photos, basic information and price trends are available for many houses. The accuracy of the information is dependent upon the location of the home; only some counties publicly release housing attribute information. In counties that don’t, Zillow uses market trends, user-entered information and list/selling price when available. As of June 2014, Zillow claimed that 70.8% of their “Zestimates” in the Boston, Massachusetts area (1.4 million homes) were within 10% of the selling price, with a median error of 5.9%.

Figure 4: The ten-year price data for a house in our study, taken in 2010. The trends for the house, zip code and town are displayed. The money symbols represent dates the house was sold.

Zillow has created an application programming interface (API) to facilitate automated access to their database (“data scraping”). Users need to register with Zillow and receive a Zillow ID which allows a finite number of database accesses per day for free. The Zillow package in R (available from http://www.omegahat.org/Zillow) allows the user to find the Zillow estimate for a property specified by a street address, find information about the house (e.g. number of bedrooms and lot size) as well as find comparable properties. More information about the Zillow API can be found at http://www.zillow.com/wikipages/Zillow-API.

We used Zillow’s estimated price in 2007 along with the 10-year history of home values to extract the estimated price of each of the homes in 1998 and then additionally collected the price estimate in 2011 and 2014, yielding four estimated prices for each home.
To allow comparisons of the house prices over time, we adjusted for the Consumer Price Index using a Bureau of Labor Statistics calculator (http://data.bls.gov/cgi-bin/cpicalc.pl) to allow all prices to be expressed in 2014 dollars. As an example, a US dollar in 1998 would be worth $1.46 in 2014 equivalents.

Helpful Hint: Chapter 14 of Utts (2005) (reading the economic news) provides an excellent introduction to price indexing and inflation.

3.3 House Characteristics

Home characteristics and amenities were obtained through the generous collaboration of Craig Della Penna through the Multiple Listing Service for homes on the market in 2007. This included which zip code (01060 or 01062), household area (in thousands of square feet), number of acres, number of garages, number of bedrooms, and number of full bathrooms. The number of garages was recoded to a dichotomous variable (having any garage versus no garage) while the number of bedrooms was categorized as 1-2, 3, or 4+ bedrooms.

Alternative Application: Variables such as the number of bedrooms or the number of bathrooms could be considered either categorical or quantitative. Some discussion of modeling tradeoffs may be appropriate.

3.4 Distance from Rail Trail

The “My Places” feature of Google Maps was used to calculate the distance from each house to the rail trail. This application allows users to trace lines on a map from one location (a home) to another location (a rail trail entrance) and to identify the length of the line in feet. Following the shortest combination of roads and sidewalks from each home to the nearest trail-entrance point, we collected data on the proximity to the trail. Figure 5 displays an image showing the distance calculation for two of the homes in our study using Google’s “My Places” application, where the blue line indicates the shortest link on roads and sidewalks, with the green line indicating the trail.
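The inflation adjustment described in Section 3.2 can be illustrated with a small sketch in R. It is not the authors' code: it assumes the rounded factor of 1.46 quoted in the text and applies it to the four nominal 1998 prices listed in Table 2. The adj1998 values in the dataset appear to reflect an unrounded factor, so the results agree only approximately.

```r
# Nominal 1998 prices (thousands of dollars) for the four homes in Table 2.
price1998 <- c(292.50, 170.00, 166.00, 261.00)

# Assumption: rounded CPI factor from the text ($1 in 1998 ~ $1.46 in 2014).
cpi_factor_1998_to_2014 <- 1.46

# Approximate 1998 prices expressed in 2014 dollars.
adj1998_approx <- price1998 * cpi_factor_1998_to_2014
print(round(adj1998_approx, 2))

# Published adj1998 values from Table 2, for comparison.
adj1998_table <- c(427.55, 248.49, 242.64, 381.50)

# Ratio of published adjusted price to nominal price: the factor implied
# by the dataset, which is slightly larger than the rounded 1.46.
print(round(adj1998_table / price1998, 4))
```

The small gap between the two sets of values is a reminder that rounding the index factor matters when prices are in the hundreds of thousands of dollars.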
This dataset can be used to assess whether homes closer to the original Northampton rail trail appreciated more between 1998 and 2014 relative to more distant houses for the set of all houses in Northampton which sold in 2007 (before the most recent “Great Recession”). This task was greatly facilitated by the emergence of a web-based repository of information about residential properties (including home characteristics, sale history and “Zestimates” of house prices) throughout the US.

Figure 5: Two houses in our study are linked to the Northampton rail trail by the shortest route (traced in blue) to two trail access points represented by green flags. Google indicates the length of these lines in feet in a sidebar.

We divided our sample into two groups based on their proximity to the trail, using an arbitrary but useful metric to describe an average resident’s ability to access the paths by bike or on foot. Our first group (n = 40) consisted of houses within a half mile of the trail, equivalent to a walk of 10 minutes at 3 mph. We called this group “Closer.” The second group consisted of any house further away than one half mile (equivalent to more than a ten-minute walk, n = 64). We considered the residents of these houses as less able to easily access the trails (this group is called “Farther away”).
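As a rough illustration of the grouping rule above, the following R sketch classifies distances measured in feet using the half-mile cutoff and converts them to walking time at 3 mph. The distances are hypothetical, and the handling of a home exactly at the cutoff (treated here as “Closer”) is an assumption, since the text does not specify it.

```r
feet_per_mile <- 5280
cutoff_feet   <- 0.5 * feet_per_mile   # half a mile = 2640 feet

# Hypothetical distances (feet) from five homes to the nearest trail entrance.
distance_feet <- c(350, 1200, 2640, 2700, 9000)

# Walking time in minutes at 3 miles per hour.
walk_minutes <- distance_feet / feet_per_mile / 3 * 60

# Assumption: a home exactly half a mile away counts as "Closer".
distgroup <- ifelse(distance_feet <= cutoff_feet, "Closer", "Farther Away")

print(data.frame(distance_feet,
                 walk_minutes = round(walk_minutes, 1),
                 distgroup))
```

Changing cutoff_feet makes it easy to explore how sensitive later results are to this arbitrary coding of distance.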
3.5 Neighborhood Characteristics

Measures of walkability and bikeability were scraped from the website http://www.walkscore.com. These scores range from 0–100 with higher scores being better. A walk score in the 90’s is described as a “Walker’s paradise: daily errands do not require a car”, while a bike score in the 90’s is described as “Biker’s paradise: flat as a pancake, excellent bike lanes.”

3.6 Creation of Analytic Sample

We restricted our analysis to homes with at most 0.56 acre in total property size, since this was the largest lot size for homes near the rail trail (ten homes were dropped). In addition, two homes were eliminated from the analyses because they appreciated more than $500,000 between 1998 and 2007 and were identified respectively as a “massive fixer-upper” and a “major league makeover” (personal communication, realtor Craig Della Penna). This left us with an analytic dataset of n = 104 homes.

Helpful Hint: The identification and handling of outliers is an important topic for any statistical analysis.

The following files can be downloaded from the JSE website:

• The original dataset, http://www.amstat.org/publications/jse/v23n2/horton/dataset in wideformat.csv;
• tall dataset, http://www.amstat.org/publications/jse/v23n2/horton/dataset in tallformat.csv;
• codebook, http://www.amstat.org/publications/jse/v23n2/horton/documentation.docx;
• and R Markdown code used to generate figures and run analyses, http://www.amstat.org/publications/jse/v23n2/horton/Rmarkdown sourcefile.Rmd.

The code in R (R Core Team 2015) leverages routines from the mosaic package to facilitate the teaching of statistics (Pruim et al. 2015), functions from the dplyr (Wickham and Francois 2015; Horton et al. in press) and tidyr (Wickham 2014) packages to undertake data manipulation, mapping routines from the ggmap package (Kahle and Wickham 2015) as well as the knitr package for reproducible analysis (Xie 2015). More information on the use of R Markdown in introductory statistics can be found in Baumer et al. (2014).
4. Analyses

Here we undertake a series of analyses using these data, including exploratory analysis of single variables and bivariate relationships, mapping, unadjusted and adjusted comparisons of price changes by group, data transformations, and more sophisticated approaches using repeated measures models.

4.1 Exploratory Analysis

Figure 6 displays a plot matrix of 2007 price and home characteristics (including distance from the rail trail). The diagonal entries of Figure 6 provide density plots of a subset of the study variables (for continuous measures including price [in thousands of dollars], acreage, the actual number of bedrooms, and square footage [in thousands of square feet]) and bar charts for the categorical variables (distance group). The lower triangular entries of this figure provide scatterplots for combinations of continuous measures or stacked dotplots for combinations involving categorical variables. The upper triangular entries display correlations (for combinations of continuous measures) or side by side boxplots (for combinations of continuous and categorical variables).

[Figure 6 plot matrix. Variables: price2007, bedrooms, acre, squarefeet, group (Closer, Farther). Correlations printed in the upper triangle, in reading order: 0.546, 0.00551, 0.865, 0.0568, 0.703, 0.0526.]

Figure 6: Plot matrix of study variables. The diagonal displays the univariate distribution of each variable, while the off-diagonals provide bivariate relationships.
The distributions of price and square footage are both skewed with long tails to the right. There is a strong correlation between price and square footage, as well as an indication that the number of houses with 1, 5, or 6 bedrooms is sparse [which led to its use as a categorical variable with levels 1-2, 3, 4+]. Price appears to be a nonlinear function of the number of bedrooms, and the number of bedrooms is positively associated with the square footage. In terms of the distance from the rail trail, houses closer to the rail trail tended to be more expensive in 2007, have less acreage, have more bedrooms and be larger.

Alternative Application: The squarefeet variable is skewed with a long right tail. It might be worthwhile to consider other ways to incorporate this into a regression (e.g., transformation or categorization).

Figure 7 displays a plot matrix of change in price from 1998 to 2014, price in 1998, plus neighborhood characteristics and distance from the rail trail.

[Figure 7 plot matrix. Variables: diff2014, adj1998, walkscore, bikescore, group (Closer, Farther). Correlations printed in the upper triangle, in reading order: 0.198, 0.351, 0.339, 0.455, 0.461, 0.929.]

Figure 7: Plot matrix of other variables. The diagonal displays the univariate distribution of each variable, while the off-diagonals provide bivariate relationships.

The distributions of differences in prices and adjusted 1998 price [both in thousands of dollars] have heavy tails, but are roughly symmetric. The distribution of walk scores is almost triangular (many with low scores), while the bike scores are somewhat bimodal (with almost no overlap between the bike scores by distance group). There is a strong correlation between the two measures of walkability and bikeability, with positive correlations between price and score.
It’s straightforward to calculate summary statistics by group as well as generate other graphical representations. As an example, consider the following distribution of price change (in thousands of dollars) from 1998 to 2014 by distance group. Table 1 displays a variety of summary statistics by group, while Figure 8 displays overlaid density plots.

Table 1: Summary statistics of difference in adjusted price from 1998 to 2014 (in thousands of dollars) by distance group

   Group             min     Q1  median      Q3     max    mean     sd   n  missing
1  Closer          15.42  75.42   94.50  136.47  497.82  114.02  81.87  40        0
2  Farther Away  -199.87  40.38   59.35   83.72  282.47   66.10  67.37  64        0

[Figure 8 plot: overlaid density curves for Closer and Farther Away; x-axis: Price change 1998 to 2014 (in thousands of 2014 dollars); y-axis: Density.]

Figure 8: Density plot of price change from 1998 to 2014 (in thousands of 2014 dollars).

We note a number of outlying values, which can and should be identified. Table 2 displays those with values less than -190 and more than 250.

[Figure 9 map: x-axis: lon; y-axis: lat; point size: walkscore (25, 50, 75); color: distgroup (Closer, Farther Away).]

Figure 9: Map of walking score by location (with different colors for distance from rail trail)

4.2 Mapping

Spatial relationships can be very important, and it is straightforward to incorporate the geolocations (latitude and longitude) of the houses to try to discern patterns.

Helpful Hint: Nicolas Christou of UCLA has created materials to facilitate the teaching of spatial data analysis for introductory statistics courses: see http://www.stat.ucla.edu/spatial for details.
Table 2: Outlying observations of price changes (in thousands of dollars)

   streetno  streetname       price1998  adj1998  price2014  diff2014
1       497  Burts Pit Rd        292.50   427.55     227.68   -199.87
2       248  Prospect Street     170.00   248.49     520.15    271.66
3        14  Liberty Street      166.00   242.64     525.11    282.47
4        40  Trumbull Road       261.00   381.50     879.33    497.82

Figure 9 displays the distribution of walking scores (arranged by size) by location (with different colors for the groups denoting distance from the rail trail). The houses that are closer tend to have higher walking scores.

4.3 Comparisons of Houses by Distance

Our goal was to create groups of houses with similar characteristics that influence price, thereby trying to reconstruct a randomized comparison. We undertook this adjustment by controlling for potential confounding factors (number of bedrooms, acreage and square footage) using multiple regression modeling.

As demonstrated in Figures 6 and 7, the houses in the two distance groups were not equivalent at the start of the study.

Helpful Hint: One of the most important conditions for drawing causal conclusions from a statistical study is randomization of subjects to treatment groups. Obviously, houses were not randomized to be near or far from the rail trail. When comparing houses close to the trail to houses farther away, there may be other factors that differ between the houses beyond proximity to the rail trail that account for the differences in price. The potential for confounding is an important topic to reinforce for students (see Kaplan (2012) or chapters 4–8 of Vittinghoff et al. (2012) for a comprehensive and readable introduction for instructors).
4.4 Comparing Distance Groups

Table 3 compares the distribution of the house characteristics. Using the Wilcoxon rank sum test (for continuous characteristics: number of acres and square feet) and Fisher’s exact test (for categorical characteristics: number of bathrooms, bedrooms and garage spaces) we see that there are significant differences between the distance groups in terms of the number of bedrooms (p = 0.024), existence of a garage (p = 0.028), acreage (p = 0.005), square footage (p = 0.0006), and location (p = 0.0009). Homes near the rail trail tend to have more bedrooms, a garage, less acreage, a larger interior, and be located in the 01060 zip code.

Table 3: Comparison of house characteristics. Fisher’s exact test used for comparisons of categorical variables (number of bathrooms, bedrooms and existence of garage) while the Wilcoxon rank sum test was used for continuous variables (number of acres and square feet).

House characteristic               Closer (n=40)   Farther away (n=64)   p-value
Number of full bathrooms (0-4)     mean = 1.55     mean = 1.39           p = 0.51
Percent bedrooms (1-2 / 3 / 4+)    15%/35%/50%     16%/59%/25%           p = 0.024
Percent with garage                65%             42%                   p = 0.028
Acreage                            mean = 0.22     mean = 0.28           p = 0.005
Household area (sf)                mean = 1,755    mean = 1,449          p = 0.0006
Percentage zip 01060               63%             28%                   p = 0.0009

4.5 Multiple Regression Modeling

Our goal was to create groups of houses with similar characteristics that influence price, thereby trying to reconstruct the hypothetical randomized comparison. A naive approach would be to simply compare the change in house prices from 1998 to 2014 between the two groups. We can fit a simple regression model (equivalent to an equal variance two sample t-test). Table 4 displays the results from this naive model.

Table 4: Unadjusted comparison of difference in 2014 price by distance group

                         Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)               114.018      11.583     9.84    0.0000
distgroupFarther Away     -47.922      14.765    -3.25    0.0016

Because price changes are recorded in thousands of dollars, we observe that homes that are farther away tend to appreciate about $47,900 less than those closer to the rail trail. This result is statistically significant (p = 0.0016). A major limitation of this approach is that it assumes that the houses were comparable with the exception of the distance from the rail trail.
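The equivalence noted above between a regression on a two-level group indicator and an equal variance two sample t-test can be checked numerically. The sketch below uses simulated price changes (in thousands of dollars), not the study data; the group sizes mirror Table 1, but the means and standard deviations passed to rnorm are only loosely modeled on that table and the variable names are illustrative.

```r
set.seed(1)

# Simulated (not real) price changes, in thousands of dollars.
closer  <- rnorm(40, mean = 114, sd = 80)
farther <- rnorm(64, mean = 66,  sd = 67)

sim <- data.frame(
  diff2014  = c(closer, farther),
  distgroup = factor(rep(c("Closer", "Farther Away"), times = c(40, 64)))
)

# Simple regression on the group indicator ("Closer" is the reference level).
fit <- lm(diff2014 ~ distgroup, data = sim)

# Equal variance two sample t-test on the same data.
tt <- t.test(diff2014 ~ distgroup, data = sim, var.equal = TRUE)

# The slope equals the difference in group means (Farther Away minus Closer).
slope    <- coef(fit)[["distgroupFarther Away"]]
mean_gap <- mean(farther) - mean(closer)
print(c(slope = slope, mean_gap = mean_gap))

# The t statistics agree in magnitude; the sign differs only because
# t.test() reports Closer minus Farther Away.
t_lm <- summary(fit)$coefficients["distgroupFarther Away", "t value"]
print(c(t_lm = t_lm, t_test = unname(tt$statistic)))
```

Seeing the two procedures agree can help students connect the regression output in Table 4 with the two-sample comparison they may already know.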
We can fit a more complex regression model that also controls for baseline price. Table 5 displays the revised results.

Table 5: Comparison of difference in 2014 price by distance group (adjusted for baseline price)

                         Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)                80.188      28.251     2.84    0.0055
adj1998                     0.147       0.112     1.31    0.1925
distgroupFarther Away     -42.759      15.230    -2.81    0.0060

We observe that after controlling for baseline price, homes that are farther away tend to appreciate about $42,800 less than those closer to the rail trail. This result is statistically significant (p = 0.006) but somewhat attenuated.

We fit a more complex regression model that controls for baseline price and other house characteristics that were observed to be different between the groups in Table 3. Table 6 displays the revised results.

Table 6: Comparison of difference in 2014 price by distance group (adjusted for baseline price and house characteristics)

                          Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)              14731.398    9025.596     1.63    0.1060
adj1998                     -0.582       0.161    -3.62    0.0005
bedgroup3 beds               1.140      19.060     0.06    0.9524
bedgroup4+ beds            -14.427      22.888    -0.63    0.5300
garagegroupyes              17.564      14.440     1.22    0.2269
acre                       -45.742      63.249    -0.72    0.4713
squarefeet                 110.938      21.885     5.07    0.0000
zip                        -13.832       8.510    -1.63    0.1074
distgroupFarther Away      -21.886      14.643    -1.49    0.1383

We observe that after controlling for baseline price and home characteristics, homes that are farther away tend to appreciate about $21,900 less than those closer to the rail trail, but that this result is not statistically significant (p = 0.14).

Alternative Application: the percentage change in adjusted price from 1998 to 2014 could be modeled. The variable pctchange has been created in this manner. Table 7 displays the results from this model.

Table 7: Comparison of percent change in adjusted price by distance group (adjusted for baseline price)

               Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)      58.842       9.691     6.07    0.0000
adj1998          -0.080       0.044    -1.80    0.0746

Regression diagnostics are always important to undertake whenever a model is fit. It is straightforward to assess the normality of residuals from this linear model (see Figure 10). The assumption of normality is suspect here (with many extreme residuals).
4.6 Translating the Dataset from Wide to Tall Format

While the dataset is provided in two formats, all analyses to date have used the “wide” format with one row per house. As an example, let’s consider the data from 40 Trumbull Road (house number 97), which is displayed in Table 8.

> mplot(lm4, which = 2)
[[1]]

[Figure 10 plot: Normal Q-Q; x-axis: Theoretical Quantiles; y-axis: Standardized residuals.]

Figure 10: Distribution of residuals for alternative application model.

Table 8: Listing of a subset of information for 40 Trumbull Road in wide format

   streetno  streetname     adj1998  adj2007  adj2011  price2014  acre  distgroup
1        40  Trumbull Road   381.50   669.93   698.54     879.33  0.26  Closer

Some analyses require a structure with one observation per time period. We call this format “tall” since there will be four times as many rows. Table 9 displays the data for the 40 Trumbull home in this format.

Table 9: Listing of a subset of information for 40 Trumbull Road in tall format

   housenum  streetno  streetname     year   price  acre  distgroup
1        97        40  Trumbull Road  1998  381.50  0.26  Closer
2        97        40  Trumbull Road  2007  669.93  0.26  Closer
3        97        40  Trumbull Road  2011  698.54  0.26  Closer
4        97        40  Trumbull Road  2014  879.33  0.26  Closer

Having the data in tall format facilitates some plots. For example, we can display boxplots of adjusted price over time while stratified by distance from the rail trail and grouping of square footage (see Figure 11). We observe that the price increases tend to be larger for larger homes.

[Figure 11 plot: boxplots by year (1998, 2007, 2011, 2014) in four panels: <= 1500 sf Closer; > 1500 sf Closer; <= 1500 sf Farther Away; > 1500 sf Farther Away; y-axis: price (in thousands of adjusted dollars).]

Figure 11: Distribution of adjusted price over time by distance from rail trail and grouping of square footage (less than 1500 square feet versus greater than or equal to 1500 square feet).
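The wide-to-tall translation in Section 4.6 can be sketched with the single row from Table 8. This is an illustration in base R rather than the authors' code (the paper states that dplyr and tidyr are used for data manipulation); the helper vectors price_cols and years are introduced here only for the example.

```r
# One house in wide format, with values copied from Tables 8 and 9.
wide <- data.frame(
  housenum = 97, streetno = 40, streetname = "Trumbull Road",
  adj1998 = 381.50, adj2007 = 669.93, adj2011 = 698.54, price2014 = 879.33,
  acre = 0.26, distgroup = "Closer"
)

# Hypothetical helper vectors pairing each price column with its year.
price_cols <- c("adj1998", "adj2007", "adj2011", "price2014")
years      <- c(1998, 2007, 2011, 2014)

# Build one block of rows per year, then stack the blocks.
tall <- do.call(rbind, lapply(seq_along(years), function(i) {
  data.frame(
    housenum   = wide$housenum,
    streetno   = wide$streetno,
    streetname = wide$streetname,
    year       = years[i],
    price      = wide[[price_cols[i]]],
    acre       = wide$acre,
    distgroup  = wide$distgroup
  )
}))

print(tall)
```

With all houses in the wide data frame, the same code would produce four rows per house, grouped by year rather than by house.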
4.7 Repeated Measures Regression Modeling

A disadvantage of the previous analyses is that they do not incorporate all of the available price data. Because each house was measured at four different time points, it is not reasonable to assume that each price is an independent measurement. To account for these clustered observations, we used a generalized linear model for correlated data (Fitzmaurice et al. 2004). This model accounts for repeated measures on each house by estimating a 4 × 4 covariance matrix. If the mean model is correctly specified, the model yields unbiased estimates of the fixed effects parameters and their standard errors.

Helpful Hint: instructors in introductory courses (that generally do not cover repeated measures models) may choose to utilize our earlier approach where the increase in prices over time was assessed by picking two time points and controlling for the initial value, or just modeling the difference in prices, rather than use of the more complicated repeated measures model.

Alternative Application: A random effects model could also be fit (and is illustrated using the R Markdown file). Results are similar using this alternative approach.

The model for the mean sales price includes the following variables: the number of bedrooms, acreage, square feet of the house, distance and year. We included house characteristics that significantly differed between distance groups in 2007. Because we hypothesized that the difference in prices by location might vary by year we included this interaction term. The interaction between time and distance was statistically significant (F(3, 402) = 5.32, p = 0.0013), so the interaction was retained (see Table 10).

Table 10: Comparison of prices over time by distance group (adjusted for house characteristics) using repeated measures regression.
                         Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)                  42.5        15.4     2.76    0.006
zip01062                    -12.2         9.9    -1.23    0.22
bedgroup3 beds               14.2        11.0     1.30    0.20
bedgroup4+ beds               3.5        13.3     0.26    0.79
garageYes                    -4.9         8.3    -0.59    0.56
acre                         41.1        36.4     1.13    0.26
squarefeet                  102.4         9.0    11.4    <0.0001
year2007                    144.6         8.6    16.8    <0.0001
year2011                     96.0         8.8    10.9    <0.0001
year2014                    114.0        11.6     9.8    <0.0001
distgroupFarther Away        -5.9         9.1    -0.65    0.52
Farther*2007                -41.7        10.9    -3.81    0.0002
Farther*2011                -32.6        11.3    -2.89    0.004
Farther*2014                -47.9        14.8    -3.25    0.001

We note that at the first observation time, there was not a statistically significant difference in prices (p = 0.52) but that the predicted difference in prices in 2014 between those closer and those farther away is $5,900 + $47,900 = $53,800. Use of the repeated measures has allowed us to more efficiently model these data.

5. Conclusion

This dataset can be used at many levels in the statistics curriculum, beginning with exploratory data analysis and informal inference. Further extensions allow use in data manipulation, mapping, and formal inference. A question of interest is whether it is possible to assess changes in house prices relative to their distance from rail trails, after accounting for other measured factors. Our analysis found that homes within one half mile of a well-established rail trail in Northampton, Massachusetts tended to appreciate more (or retain a higher value) during the period from 1998–2014 than homes farther than one half mile away from the original 1984 trail.

Other analyses are possible using this dataset, taking advantage of its relatively rich characteristics. These include assessing relationships between house characteristics, sales, and neighborhood characteristics (such as walkability).
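For readers working through Table 10, the predicted gap between the distance groups in each year is the main effect of distance plus the interaction for that year. The short R sketch below reproduces that arithmetic from the published coefficients; the variable names are illustrative, and the calculation assumes 1998 is the reference year and that all other covariates are held fixed.

```r
# Coefficients (in thousands of 2014 dollars) copied from Table 10.
b_farther      <- -5.9    # distgroupFarther Away: gap in the reference year
b_farther_2007 <- -41.7   # Farther*2007
b_farther_2011 <- -32.6   # Farther*2011
b_farther_2014 <- -47.9   # Farther*2014

# Interaction terms by year; assumption: 1998 is the reference (zero) level.
interaction <- c(`1998` = 0,
                 `2007` = b_farther_2007,
                 `2011` = b_farther_2011,
                 `2014` = b_farther_2014)

# Predicted Farther Away minus Closer difference in each year.
gap_thousands <- b_farther + interaction
gap_dollars   <- gap_thousands * 1000

print(gap_dollars)
```

The 2014 entry corresponds to the $53,800 difference quoted in the text; the entries for other years follow from the same sum.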
Potential pitfall: It’s always important to ensure that limitations of an analysis are clearly spelled out. Students should be encouraged to brainstorm possible issues. A starting point of possibilities is enumerated here: (1) We considered a small subset of houses out of the total number of houses in Northampton. These were the houses sold within a single community in a particular year; (2) The construction of additional rail trails in recent years means that many homes are now closer to the expanding network but were categorized in the “farther away” category in our analyses; (3) We used an arbitrary coding of distance from the rail trail; (4) Data scraping using Zillow may have non-negligible measurement error; (5) There may also be other unmeasured confounding factors of these houses that could potentially account for these results.

Acknowledgments

This work was supported by NSF grant 0920350 (Phase II: Building a Community around Modeling, Statistics, Computation, and Calculus). Craig Della Penna provided invaluable guidance and assistance, while Molly Johnson contributed to the data collection and analysis. We are also appreciative to the Editor, Associate Editor and anonymous reviewers for a number of suggestions which led to improvements in the manuscript.
REFERENCES

American Trails 2011, “Benefits of Trails and Greenways,” Technical report. Available at http://www.americantrails.org/resources/benefits/homebuyers02.html, accessed March 8, 2015.
Baumer, B., Çetinkaya-Rundel, M., Bray, A., Loi, L., and Horton, N. 2014, “R Markdown: Integrating a Reproducible Analysis Tool into Introductory Statistics,” Technology Innovations in Statistics Education 8(1). Available at http://escholarship.org/uc/item/90b2f5xh, accessed March 8, 2015.
Centers for Disease Control and Prevention 2010, “Vital Signs: State-Specific Obesity Prevalence Among Adults - United States,” 2009, Morbidity and Mortality Weekly Report. Available at http://www.cdc.gov/mmwr/preview/mmwrhtml/mm59e0803a1.htm, accessed March 8, 2015.
Eppley Institute for Parks & Public Lands 2001, “Summary Report, Indiana Trails Study,” Technical report. Available at http://www.in.gov/indot/files/z-CompleteDocument.pdf, accessed March 8, 2015.
Fitzmaurice, G. M., Laird, N. M., and Ware, J. H. 2004, Applied Longitudinal Analysis, New York: Wiley.
Horton, N. J., Baumer, B. and Wickham, H. in press, “Setting the Stage for Data Science: Integration of Data Management Skills in Introductory and Second Courses in Statistics,” CHANCE 28(2):40–50. http://arxiv.org/abs/1502.00318, last accessed March 8, 2015.
Kahle, D. and Wickham, H. 2015, “ggmap: A Package for Spatial Visualization with Google Maps and OpenStreetMap,” R package version 2.4. Available at http://CRAN.R-project.org/package=ggmap, last accessed March 8, 2015.
Kaplan, D. 2012, Statistical Modeling: A Fresh Approach (2nd edition), http://www.mosaic-web.org/go/StatisticalModeling, accessed March 8, 2015.
Karedeniz, D. 2008, “The Impact of the Little Miami Scenic Trail on Single Family Residential Property Values,” Technical report. Available at http://www.americantrails.org/resources/economics/littlemiamipropvalue.html, accessed March 8, 2015.
Lindsey, G., Man, J., Payton, S., and Dickson, K. 2004, “Property Values, Recreation Values, and Urban Greenways,” Journal of Park and Recreation Administration, 22(3), 69–90.
Pruim, R., Kaplan, D., and Horton, N. J. 2015, “mosaic: Project MOSAIC (mosaic-web.org) Statistics and Mathematics Teaching Utilities,” R package version 0.9-2-2. Available at http://CRAN.R-project.org/package=mosaic, last accessed March 8, 2015.
R Core Team 2015, “R: A Language and Environment for Statistical Computing,” R Foundation for Statistical Computing, Vienna, Austria. Available at http://www.R-project.org, last accessed March 8, 2015.
Rails to Trails Conservancy 2015, “Rails-to-Trails in the Making,” Technical Report. Available at http://www.railstotrails.org/about/history, accessed March 8, 2015.
Utts, J. 2005, Seeing Through Statistics (3rd edition), Cengage Learning.
Vittinghoff, E., Glidden, D., Shiboski, S., and McCulloch, C. 2012, Regression Methods in Biostatistics (2nd edition), New York: Springer.
Wickham, H. 2014, “Tidy Data,” Journal of Statistical Software 59(10). Available at http://www.jstatsoft.org/v59/i10/, last accessed March 8, 2015.
Wickham, H., and Francois, R. 2015, “dplyr: A Grammar of Data Manipulation,” R package version 0.4.1. Available at http://cran.r-project.org/web/packages/dplyr, last accessed March 8, 2015.
Xie, Y. 2015, “knitr: A General-Purpose Package for Dynamic Report Generation in R,” R package version 1.9. Available at http://CRAN.R-project.org/package=knitr, accessed March 8, 2015.

Ella Hartenian
Department of Biological Sciences
Smith College, Northampton, MA

Nicholas J. Horton
Dept of Mathematics and Statistics
Amherst College
AC #2239, PO Box 5000
Amherst, MA 01002-5000