Empirical Delay
Contact Info
- Entur AS
Documentation
The empirical delay module enables import of historical delay data which OTP can use to provide typical delay information based on actual observed service delays. Empirical delay data is loaded during the graph build process and provides percentile-based delay estimates (p50 and p90) for service journeys at specific stops.
This feature helps users understand the typical reliability of a service based on historical data, so they can make more informed travel decisions. For example, knowing that a service has a median delay of 5 minutes at a stop (p50 = 5 minutes, so half of the observed arrivals were no more than 5 minutes late) gives travelers a more realistic expectation than the scheduled time alone.
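To make the two percentiles concrete, the following minimal sketch derives p50 and p90 from a list of observed delays. OTP imports precomputed percentiles and does not calculate them itself; the nearest-rank method and the sample observations here are assumptions chosen for illustration, and a data producer may use a different percentile definition.

```python
def nearest_rank_percentile(delays_seconds, percentile):
    """Return the nearest-rank percentile of a non-empty list of delays."""
    if not delays_seconds:
        raise ValueError("need at least one observation")
    ordered = sorted(delays_seconds)
    # Integer ceiling of percentile * n / 100, avoiding floating-point rounding.
    rank = max(1, (percentile * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Hypothetical observed delays (in seconds) for one trip at one stop.
observed = [0, 30, 60, 120, 150, 180, 240, 300, 360, 600]

p50 = nearest_rank_percentile(observed, 50)
p90 = nearest_rank_percentile(observed, 90)
print(p50, p90)
```

With these ten observations, half of them are at or below the p50 value and nine of the ten are at or below the p90 value.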
Empirical Delay Data Files
Empirical delay data is provided through two CSV files that must be supplied as standalone files listed in the build-config.json. Both files must be provided together:
- empirical_delay_stop_times.txt: Contains delay percentiles for each trip stop
- empirical_delay_calendar.txt: Defines service calendars for the delay data
The file names must include "empirical_delay" and the extension must be .csv or .txt. Both NeTEx and GTFS feeds are supported through the feedId configuration.
Empirical Delay Stop Times Format
This file contains the delay data for each stop on each trip, organized by service calendar periods (e.g., weekdays, weekends).
CSV file columns:
CSV Header | Description |
---|---|
empirical_delay_service_id | The service calendar identifier that determines when this delay data applies (references calendar file). |
trip_id | The GTFS trip id or NeTEx ServiceJourney id. |
stop_id | The GTFS stop id or NeTEx Quay id for this stop. |
stop_sequence | The stop sequence number in the trip pattern. First stop is 1, not 0. |
p50 | The median (50th percentile) delay in seconds. Half of historical observations are better than this. |
p90 | The 90th percentile delay in seconds. 90% of observations are better, 10% are worse. |
Constraints:
- Percentile values (p50, p90) must be between 0 and 18000 seconds (5 hours)
- Stop sequence must be between 0 and 10,000
- If the stop_id does not match the transit feed, the row is dropped and an issue is added to the build report
Example:
empirical_delay_service_id,trip_id,stop_id,stop_sequence,p50,p90
Friday,RUT:ServiceJourney:1,NSR:Quay:1001,1,0,30
Friday,RUT:ServiceJourney:1,NSR:Quay:1002,2,120,240
Friday,RUT:ServiceJourney:1,NSR:Quay:1003,3,180,360
Saturday,RUT:ServiceJourney:1,NSR:Quay:1001,1,30,60
Saturday,RUT:ServiceJourney:1,NSR:Quay:1002,2,45,120
In this example:
- On Fridays, stop 2 has a median delay of 2 minutes (120s) and is delayed by 4+ minutes 10% of the time
- On Saturdays, the same stop has better performance with only 45 seconds median delay
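The numeric constraints on this file can be checked before the graph build. The sketch below is a hypothetical pre-flight validator, not OTP's own import code: the function name `find_problems` and the invalid second sample row are made up for illustration, and the stop_id match against the transit feed is not checked here because it requires the feed itself.

```python
import csv
import io

MAX_DELAY_SECONDS = 18000  # 5 hours, the documented upper bound for p50/p90
MAX_STOP_SEQUENCE = 10000  # documented upper bound for stop_sequence


def find_problems(row):
    """Return a list of constraint violations for one stop-times row."""
    problems = []
    for column in ("p50", "p90"):
        value = int(row[column])
        if not 0 <= value <= MAX_DELAY_SECONDS:
            problems.append(f"{column}={value} is outside 0..{MAX_DELAY_SECONDS}")
    sequence = int(row["stop_sequence"])
    if not 0 <= sequence <= MAX_STOP_SEQUENCE:
        problems.append(f"stop_sequence={sequence} is outside 0..{MAX_STOP_SEQUENCE}")
    return problems


# The second data row is deliberately invalid (p90 above 18000 seconds).
SAMPLE = """empirical_delay_service_id,trip_id,stop_id,stop_sequence,p50,p90
Friday,RUT:ServiceJourney:1,NSR:Quay:1002,2,120,240
Friday,RUT:ServiceJourney:1,NSR:Quay:1003,3,180,20000
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for line_number, row in enumerate(rows, start=2):  # line 1 is the header
    for problem in find_problems(row):
        print(f"line {line_number}: {problem}")
```

Running a check like this on the producer side surfaces out-of-range rows before the data reaches the graph build.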
Empirical Delay Calendar Format
This file defines service calendars that determine when specific delay patterns apply, similar to GTFS calendar.txt format.
CSV file columns:
CSV Header | Description |
---|---|
empirical_delay_service_id | Unique identifier for this service calendar. |
monday | Boolean (0 or 1) - whether this calendar applies on Mondays. |
tuesday | Boolean (0 or 1) - whether this calendar applies on Tuesdays. |
wednesday | Boolean (0 or 1) - whether this calendar applies on Wednesdays. |
thursday | Boolean (0 or 1) - whether this calendar applies on Thursdays. |
friday | Boolean (0 or 1) - whether this calendar applies on Fridays. |
saturday | Boolean (0 or 1) - whether this calendar applies on Saturdays. |
sunday | Boolean (0 or 1) - whether this calendar applies on Sundays. |
start_date | Start date in YYYY-MM-DD format (inclusive). |
end_date | End date in YYYY-MM-DD format (inclusive). |
Example:
empirical_delay_service_id,monday,tuesday,wednesday,thursday,friday,saturday,sunday,start_date,end_date
Weekday,1,1,1,1,1,0,0,2025-01-01,2030-12-31
Weekend,0,0,0,0,0,1,1,2025-01-01,2030-12-31
Friday,0,0,0,0,1,0,0,2025-01-01,2030-12-31
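The calendar semantics above (day-of-week flags combined with an inclusive date range) can be sketched as follows. This is an illustrative reading of the format, not OTP's implementation, and `active_service_ids` is a hypothetical helper name. Note that on a Friday both the Weekday and Friday calendars in the example match; this document does not say how OTP chooses between overlapping calendars, so the sketch simply returns all matches.

```python
import csv
import io
from datetime import date

CALENDAR_CSV = """empirical_delay_service_id,monday,tuesday,wednesday,thursday,friday,saturday,sunday,start_date,end_date
Weekday,1,1,1,1,1,0,0,2025-01-01,2030-12-31
Weekend,0,0,0,0,0,1,1,2025-01-01,2030-12-31
Friday,0,0,0,0,1,0,0,2025-01-01,2030-12-31
"""

# Column names ordered to match date.weekday(), where Monday is 0.
WEEKDAY_COLUMNS = (
    "monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday",
)


def active_service_ids(calendar_text, day):
    """Return the ids of all calendars that apply on the given date."""
    active = []
    for row in csv.DictReader(io.StringIO(calendar_text)):
        start = date.fromisoformat(row["start_date"])
        end = date.fromisoformat(row["end_date"])
        weekday_column = WEEKDAY_COLUMNS[day.weekday()]
        # Both date bounds are inclusive, as stated in the column table.
        if start <= day <= end and row[weekday_column] == "1":
            active.append(row["empirical_delay_service_id"])
    return active


print(active_service_ids(CALENDAR_CSV, date(2025, 1, 3)))  # a Friday
print(active_service_ids(CALENDAR_CSV, date(2025, 1, 4)))  # a Saturday
```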
Configuration
To enable this functionality, enable the "EmpiricalDelay" feature in the otp-config.json file:
// otp-config.json
{
  "otpFeatures": {
    "EmpiricalDelay": true
  }
}
Include the empiricalDelay object in the build-config.json file with a list of feeds containing empirical delay data:
// build-config.json
{
  "empiricalDelay": {
    "feeds": [
      {
        "feedId": "RUT",
        "source": "https://example.org/empirical-delay/rut-delays.zip"
      },
      {
        "feedId": "NSB",
        "source": "file:///data/empirical-delay/nsb-delays.zip"
      }
    ]
  }
}
Each feed configuration requires:
- feedId (required): The feed ID to use for matching transit IDs in the empirical delay data. This must match the feed ID of the corresponding GTFS or NeTEx feed.
- source (required): URI pointing to the empirical delay data. Can be a local file, HTTP(S) URL, or cloud storage URI (e.g., gs://).
The source can be:
- A ZIP file containing both empirical_delay_stop_times.txt and empirical_delay_calendar.txt
- A directory containing these files
- Individual files (both must be provided)
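For the ZIP option, a data producer might package the two files as in the sketch below. It uses only Python's standard zipfile module; `build_delay_archive` is a hypothetical helper, and placing both files at the root of the archive is an assumption, since this document does not state where in the ZIP the files must be located.

```python
import io
import zipfile


def build_delay_archive(stop_times_csv, calendar_csv):
    """Return the bytes of a ZIP archive holding both empirical delay files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # File names follow the naming rule: they include "empirical_delay"
        # and end in .txt. Root-level placement is an assumption.
        archive.writestr("empirical_delay_stop_times.txt", stop_times_csv)
        archive.writestr("empirical_delay_calendar.txt", calendar_csv)
    return buffer.getvalue()


stop_times = (
    "empirical_delay_service_id,trip_id,stop_id,stop_sequence,p50,p90\n"
    "Friday,RUT:ServiceJourney:1,NSR:Quay:1001,1,0,30\n"
)
calendar = (
    "empirical_delay_service_id,monday,tuesday,wednesday,thursday,"
    "friday,saturday,sunday,start_date,end_date\n"
    "Friday,0,0,0,0,1,0,0,2025-01-01,2030-12-31\n"
)

archive_bytes = build_delay_archive(stop_times, calendar)
with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
    names = sorted(archive.namelist())
print(names)
```

The resulting bytes can be written to a file such as the rut-delays.zip referenced in the configuration example.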
GraphQL API
The empirical delay data is exposed through the Transmodel GraphQL API on the EstimatedCall type:
type EstimatedCall {
  # ... other fields ...
  "The typical delay for this trip on this day for this stop based on historical data."
  empiricalDelay: EmpiricalDelay
}

type EmpiricalDelay {
  "The median/50% percentile. This value is in the middle of the distribution."
  p50: Duration
  "The 90% percentile. 90% of the values in the distribution is better and 10% is more delayed."
  p90: Duration
}
Example query:
{
  stopPlace(id: "NSR:StopPlace:1") {
    estimatedCalls(numberOfDepartures: 5) {
      expectedDepartureTime
      destinationDisplay {
        frontText
      }
      empiricalDelay {
        p50
        p90
      }
    }
  }
}
If no empirical delay data is available for the specific combination, the field returns null.
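A client consuming this field has to handle the null case. The sketch below shows one way to do that for a single EstimatedCall taken from a decoded JSON response. It assumes that the Duration scalar is serialized as an ISO 8601 duration string such as "PT2M" and that p50 and p90 are both present whenever empiricalDelay is not null; neither assumption is confirmed by this document, and the sample call dictionaries are made up.

```python
import re
from datetime import timedelta

# Matches simple ISO 8601 time durations such as PT45S, PT2M or PT1H30M.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def parse_duration(text):
    """Parse an hours/minutes/seconds ISO 8601 duration into a timedelta."""
    match = _DURATION.match(text)
    if match is None or text == "PT":
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = (int(group) if group else 0 for group in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def describe_call(call):
    """Summarize the empirical delay of one EstimatedCall dictionary."""
    delay = call.get("empiricalDelay")
    if delay is None:
        # No historical data for this trip, stop and day.
        return "no empirical delay data"
    p50 = int(parse_duration(delay["p50"]).total_seconds())
    p90 = int(parse_duration(delay["p90"]).total_seconds())
    return f"typically {p50}s late, {p90}s at p90"


# Hypothetical calls as they might appear in a decoded response.
with_data = {"empiricalDelay": {"p50": "PT2M", "p90": "PT4M"}}
without_data = {"empiricalDelay": None}

print(describe_call(with_data))
print(describe_call(without_data))
```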
Use Cases
Traveler Information
Empirical delay data can be displayed to travelers to set realistic expectations:
- "This service is typically 2-3 minutes late at this stop"
- "90% of the time, this bus arrives within 5 minutes of schedule"
Journey Planning
Journey planners can use empirical delay to:
- Add realistic buffer times when suggesting transfers
- Rank alternatives based on historical reliability
- Adjust arrival time predictions beyond real-time data
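As an illustration of the transfer-buffer idea, a client application could treat a transfer as robust only if it still works when the arriving service is delayed by its p90 value. This is a hypothetical client-side check, not a description of OTP's routing behavior; the function name, the 2-minute minimum transfer time, and the sample times are assumptions.

```python
from datetime import datetime, timedelta


def transfer_is_robust(scheduled_arrival, next_departure, p90_delay,
                       min_transfer=timedelta(minutes=2)):
    """Return True if the transfer still works at the p90 arrival delay."""
    worst_typical_arrival = scheduled_arrival + p90_delay
    return worst_typical_arrival + min_transfer <= next_departure


arrival = datetime(2025, 1, 3, 8, 0)
departure = datetime(2025, 1, 3, 8, 7)

# With a p90 of 4 minutes the traveler is ready at 08:06, before the 08:07 departure.
reliable = transfer_is_robust(arrival, departure, timedelta(minutes=4))
# With a p90 of 6 minutes the traveler is ready at 08:08, after the departure.
risky = transfer_is_robust(arrival, departure, timedelta(minutes=6))
print(reliable, risky)
```

Using p90 rather than p50 makes the check conservative: the transfer is flagged unless it would have succeeded in roughly nine out of ten historical observations.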
Service Quality Monitoring
Transit operators can:
- Identify consistently delayed services
- Compare different time periods (weekday vs weekend)
- Track improvement over time as new delay data is imported
Changelog
OTP 2.9
- Initial implementation of empirical delay with percentile-based delay data (p50, p90), service calendar support for day-of-week patterns, and availability in the Transmodel GraphQL API.