The board of the Ann Arbor Transportation Authority regularly gets reports on the on-time performance of the system. At the latest meeting, board member David Nacht noted that on-time performance was "abysmal," with only 83.3 percent of trips reported on time in the July-to-September period.

If a system's entire performance is encapsulated in one number, it's hard for management to suggest anything other than "work harder." Here's a look at tracking system performance from the outside, and some musings on whether going through the FOIA process actually gives you any useful advantage over working from publicly available data.


The limitations of FOIA for analysis of complex data sets

Organizations that collect real-time data collect a lot of it. With multiple vehicles on each route reporting their location via GPS every minute, you might reasonably expect millions of individual records to be generated.
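To get a feel for that scale, here is a back-of-the-envelope estimate. The fleet size and hours of service below are made-up round numbers for illustration, not AATA figures; only the one-report-per-minute rate and the July-to-September period come from the text above.

```python
# Rough estimate of GPS records generated in one quarter.
# vehicles_in_service and service_hours_per_day are hypothetical values,
# chosen only to show how quickly the count grows.
vehicles_in_service = 50
service_hours_per_day = 16
reports_per_hour = 60            # one location report per minute
days_in_quarter = 31 + 31 + 30   # July, August, September

records = (vehicles_in_service * service_hours_per_day
           * reports_per_hour * days_in_quarter)
print(f"{records:,} records")
```

Under those assumptions the total comes to roughly 4.4 million records for the quarter, which is consistent with "millions" even for a modest fleet.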

Making sense of all of this data can be difficult, even for the organizations that collect it. To produce a report out of all of this, you not only have to know the raw information about all of the locations of all of the vehicles, but also all of the schedule information and all of the detail about which buses are going off their routes back to the depot. A hypothetical FOIA request that requested every location of every bus in the system at all times still would not be enough to tell you why the buses are running late.

FOIA entitles you to copies of existing records, but it does not compel the organization to create a new record on your behalf. If there were an existing report that illustrated the information that you wanted to gather, you could ask for it, and if it had been prepared already you would be entitled to a copy. But if the information you want is hiding in a database and it would take hours of effort to tease it out, no request will compel anyone to do that work for you.


Approximating the analysis from the outside

Fortunately, in the case of the AATA, there is a publicly available external source of on-time performance data. The Ridetrak system shows the current state of each route in the system, and Mobile Ridetrak, the version formatted for mobile phones, is relatively uncomplicated to parse.
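As a rough illustration of what "uncomplicated to parse" can mean, here is a minimal sketch in Python. The line format it expects ("Route 2 Bus 412 9 min late") is a hypothetical stand-in, not the actual Mobile Ridetrak markup, and the names Reading, LINE_RE and parse_line are invented for this example.

```python
import re
from typing import NamedTuple, Optional


class Reading(NamedTuple):
    route: str
    bus: str
    minutes_late: int  # 0 means the bus was reported on time


# Hypothetical status line, e.g. "Route 2 Bus 412 9 min late"
# or "Route 5 Bus 77 on time". The real page layout may differ.
LINE_RE = re.compile(
    r"Route\s+(?P<route>\w+)\s+Bus\s+(?P<bus>\w+)\s+"
    r"(?:(?P<mins>\d+)\s+min\s+late|on\s+time)",
    re.IGNORECASE,
)


def parse_line(line: str) -> Optional[Reading]:
    """Turn one status line into a Reading, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    mins = int(m.group("mins")) if m.group("mins") else 0
    return Reading(m.group("route"), m.group("bus"), mins)
```

Lines that do not match return None rather than raising an error, so a change in page layout shows up as missing readings instead of a crash.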

I collected about three hours of data on the on-time performance of five routes in the AATA system this afternoon, and wrote some relatively simple code to determine performance for each bus in the collection. All in all, the data set contains 1,735 reported bus times, collected this afternoon between 1:30 p.m. and 4 p.m.
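A collection run like this can be sketched as a simple polling loop. This is not the code used for this column; the one-minute interval is an assumption, and fetch is a hypothetical callable standing in for whatever retrieves one snapshot of the status page.

```python
import time
from datetime import datetime


def collect(fetch, polls, interval_s=60, sleep=time.sleep, now=datetime.now):
    """Call fetch() `polls` times, pausing `interval_s` seconds between calls.

    Returns a list of (timestamp, payload) pairs. `fetch` is a hypothetical
    zero-argument callable; `sleep` and `now` can be swapped out for testing.
    """
    snapshots = []
    for i in range(polls):
        snapshots.append((now(), fetch()))
        if i < polls - 1:  # no need to wait after the final poll
            sleep(interval_s)
    return snapshots
```

Polling once a minute for two and a half hours would mean calling collect(fetch, polls=150). Storing a timestamp with each snapshot is what makes later questions about time of day possible.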

With any data set like this, you worry a little bit about quality. A spot check for sanity showed several routes with on-time reports that occasionally did not make sense: for example, a bus reported to be nine minutes late one minute, on time the next, and nine minutes late again a few minutes later. This problem did not appear to recur often enough on every route to throw off broad conclusions, but it does suggest that errors may creep in that a more careful analysis would clean up.
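One way to automate that kind of spot check is to flag any reading that jumps sharply away from both of its neighbors while the neighbors agree with each other. The sketch below is an assumption about how such a check might look, not the check used here; the five-minute jump and one-minute agreement thresholds are arbitrary.

```python
def flag_glitches(delays, jump=5, agree=1):
    """Return indexes of readings that look like one-off reporting errors.

    `delays` is a list of minutes-late values for one bus, in time order.
    A reading is flagged when the readings on either side agree with each
    other (within `agree` minutes) but it differs from both by at least
    `jump` minutes.
    """
    flagged = []
    for i in range(1, len(delays) - 1):
        prev, cur, nxt = delays[i - 1], delays[i], delays[i + 1]
        neighbors_agree = abs(prev - nxt) <= agree
        if neighbors_agree and abs(cur - prev) >= jump and abs(cur - nxt) >= jump:
            flagged.append(i)
    return flagged
```

A sequence of nine minutes late, on time, nine minutes late would have its middle reading flagged. A bus that genuinely recovers lost time over several readings would not be, because its neighbors would not agree.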


Simple conclusions, complicated questions

[Figure: aata-on-time-thurs.png]

A distribution of on-time performance for 1,735 bus arrivals collected during the afternoon shows a range of results. In the sample, 350 vehicles, or 20 percent, were more than five minutes late.

Edward Vielmetti | AnnArbor.com

The results of this survey, which it should be noted include some known sample errors, are depicted at right. This afternoon's sample showed about 20 percent of the buses running more than five minutes late, with a maximum delay of 19 minutes reported on a bus serving Route 2 (Plymouth Road) at 3:11 p.m.
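The arithmetic behind a distribution like this is simple enough to sketch. The function names and the five-minute bin width below are choices made for this example, and the only real numbers used are the 350 and 1,735 reported above.

```python
from collections import Counter


def delay_histogram(delays, bin_width=5):
    """Count readings per delay bin; each key is the lower edge of a bin."""
    counts = Counter((d // bin_width) * bin_width for d in delays)
    return dict(sorted(counts.items()))


def share_late(delays, threshold=5):
    """Fraction of readings more than `threshold` minutes late."""
    return sum(1 for d in delays if d > threshold) / len(delays)


# Check of the figure reported in this column: 350 of 1,735 readings.
reported_share = 350 / 1735
print(f"{reported_share:.1%}")
```

The reported share works out to a little over 20 percent, matching the rounded figure in the text.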

If you had all of the data in the entire system over the entire reporting period, you could start to answer more complicated questions. Are some times of day worse for on-time performance than others? Do some routes perpetually run late? Is there some systematic explanation, like a snowstorm, that causes all routes to be late all day long?
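Questions like these come down to grouping the same readings by a different key. Here is a minimal sketch, assuming each record is a dictionary with hypothetical "route", "hour" and "minutes_late" fields; the five-minute threshold matches the one used in this column's sample.

```python
from collections import defaultdict


def late_share_by(records, key, threshold=5):
    """Group records with key(record) and return the late share per group.

    Each record is assumed to be a dict with a "minutes_late" field;
    `key` is any function that picks the grouping value from a record.
    """
    totals = defaultdict(lambda: [0, 0])  # group -> [late count, total count]
    for rec in records:
        group = key(rec)
        totals[group][1] += 1
        if rec["minutes_late"] > threshold:
            totals[group][0] += 1
    return {group: late / total for group, (late, total) in totals.items()}
```

Passing key=lambda r: r["route"] answers the question about routes, and key=lambda r: r["hour"] answers the one about time of day. A day when every group is late at once would point toward a systematic cause.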


Don't start with FOIA first

FOIA is a relatively blunt instrument for requesting detailed system analysis. You may find that it takes a long time and a lot of money to get detailed data that you want, and you might not even be able to understand what you get.

Reports drawn from publicly available data, though incomplete, can suggest lines of analysis more complicated than you can pursue on your own. By putting together a prototype, you can start to ask questions, ones for which you already have some fraction of the answer, of people who have access to detailed reporting tools and all of the data.

Remember, though, that FOIA does not compel anyone to explore the data for you. If you want to answer questions about a system, it can often be most practical to collect the data you need by yourself, and only then go back to the agency with your prototype in hand to say, "I did this, can you do better?"

Edward Vielmetti rides the bus for AnnArbor.com. Contact him at edwardvielmetti@annarbor.com.