Schedule Validation & Generation

Transit Parser includes a schedule validation and generation module for working with operational transit schedules. This guide covers how to load schedules, validate them against GTFS data, infer missing deadheads, and export schedules in several formats.

What is a Schedule?

A schedule (also called a "run cut" or "blocking") is an operational plan that assigns:

  • Trips to blocks (vehicles)
  • Blocks to runs (drivers)
  • Times and locations for all movements

Schedules typically come from scheduling software such as Optibus or Hastus (developed by GIRO) and contain both:

  • Revenue trips - Passenger-carrying trips that reference GTFS trip_ids
  • Non-revenue movements - Deadheads (pull-out, pull-in, interlining), breaks, reliefs

Loading a Schedule

From CSV File

from transit_parser import Schedule

# Load with automatic column detection
schedule = Schedule.from_csv("schedule.csv")

print(f"Rows: {len(schedule)}")
print(f"Blocks: {schedule.block_ids()}")
print(f"Trips: {schedule.trip_ids()}")

From CSV String

csv_data = """block,trip_id,start_time,end_time,start_place,end_place
B001,T001,08:00:00,08:30:00,STOP_A,STOP_B
B001,T002,08:35:00,09:05:00,STOP_B,STOP_C
B002,T003,08:15:00,08:45:00,STOP_C,STOP_D"""

schedule = Schedule.from_csv_string(csv_data)

With Custom Column Mapping

If your CSV uses non-standard column names, provide a mapping:

schedule = Schedule.from_csv_with_mapping(
    "schedule.csv",
    column_mapping={
        "block": "vehicle_id",
        "trip_id": "journey_ref",
        "start_time": "depart",
        "end_time": "arrive",
        "start_place": "origin",
        "end_place": "destination",
    }
)
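
Conceptually, a mapping like this renames the file's headers to the canonical field names before rows are read. The following is a minimal sketch of that idea using only the standard library; `apply_column_mapping` is a hypothetical helper written for illustration and is not part of transit_parser. It assumes the mapping runs from canonical field name to CSV header, as in the example above.

```python
import csv
import io


def apply_column_mapping(csv_text, column_mapping):
    """Return rows keyed by canonical field names.

    column_mapping maps canonical field -> header used in the file
    (assumed direction, matching the example above).
    """
    # Invert the mapping so each CSV header points at its canonical name.
    header_to_field = {header: field for field, header in column_mapping.items()}
    reader = csv.DictReader(io.StringIO(csv_text))
    # Headers without a mapping keep their original names.
    return [
        {header_to_field.get(header, header): value for header, value in raw.items()}
        for raw in reader
    ]


sample = "vehicle_id,journey_ref,depart,arrive\nB001,T001,08:00:00,08:30:00\n"
mapping = {
    "block": "vehicle_id",
    "trip_id": "journey_ref",
    "start_time": "depart",
    "end_time": "arrive",
}
rows = apply_column_mapping(sample, mapping)
print(rows[0]["block"], rows[0]["trip_id"])
```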

Schedule Summary

Get a quick overview of your schedule:

summary = schedule.summary()
print(summary)
# {
#     'total_rows': 150,
#     'revenue_trips': 120,
#     'deadheads': 25,
#     'breaks_and_reliefs': 5,
#     'unique_blocks': 10,
#     'unique_runs': 8,
#     'unique_depots': 2
# }
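
To make the counts concrete, here is a rough sketch of how such a summary could be derived from row data. Plain dicts stand in for schedule rows, `summarize` is a hypothetical function, and the grouping of `pull_out`, `pull_in`, and `deadhead` under "deadheads" is an assumption rather than documented library behavior.

```python
from collections import Counter

# Assumed groupings of row_type values (not confirmed by the library).
DEADHEAD_TYPES = {"pull_out", "pull_in", "deadhead"}
BREAK_TYPES = {"break", "relief"}


def summarize(rows):
    """Count rows by category and collect unique block/run identifiers."""
    types = Counter(row["row_type"] for row in rows)
    return {
        "total_rows": len(rows),
        "revenue_trips": types["revenue"],
        "deadheads": sum(types[t] for t in DEADHEAD_TYPES),
        "breaks_and_reliefs": sum(types[t] for t in BREAK_TYPES),
        # Rows with a missing identifier are left out of the unique counts.
        "unique_blocks": len({row["block"] for row in rows if row.get("block")}),
        "unique_runs": len({row["run_number"] for row in rows if row.get("run_number")}),
    }


example_rows = [
    {"row_type": "pull_out", "block": "B001", "run_number": "R1"},
    {"row_type": "revenue", "block": "B001", "run_number": "R1"},
    {"row_type": "revenue", "block": "B002", "run_number": "R2"},
    {"row_type": "break", "block": "B002", "run_number": "R2"},
    {"row_type": "pull_in", "block": "B002", "run_number": None},
]
overview = summarize(example_rows)
print(overview)
```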

Validating Schedules

Against GTFS Data

The most thorough validation checks that every reference in the schedule (such as a trip_id) matches an entity in the GTFS feed:

from transit_parser import GtfsFeed, Schedule, ValidationConfig

# Load both
gtfs = GtfsFeed.from_path("gtfs/")
schedule = Schedule.from_csv("schedule.csv")

# Validate with default settings
result = schedule.validate(gtfs)

if result.is_valid:
    print("Schedule is valid!")
else:
    print(f"Found {result.error_count} errors:")
    for error in result.errors:
        print(f"  [{error['code']}] {error['message']}")

Validation Levels

Configure how strict validation should be:

# Strict - All trip_ids must exist in GTFS
config = ValidationConfig.strict()

# Standard - Some missing references allowed with warnings
config = ValidationConfig(gtfs_compliance="standard")

# Lenient - Only check structure, skip GTFS references
config = ValidationConfig.lenient()

result = schedule.validate(gtfs, config)
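
One way to picture the three levels is by how each treats a trip_id that is absent from the GTFS feed. The sketch below is illustrative only: `check_trip_refs` is a hypothetical function, and it assumes that "standard" turns every missing reference into a warning, which may be simpler than what the library actually does.

```python
def check_trip_refs(trip_ids, known_trip_ids, level="standard"):
    """Return (errors, warnings) for trip_ids missing from the GTFS feed."""
    if level not in {"strict", "standard", "lenient"}:
        raise ValueError(f"unknown level: {level!r}")
    errors, warnings = [], []
    if level == "lenient":
        # Lenient skips GTFS reference checks entirely.
        return errors, warnings
    for trip_id in trip_ids:
        if trip_id not in known_trip_ids:
            message = f"trip_id {trip_id!r} not found in GTFS"
            # Strict reports an error; standard downgrades to a warning.
            (errors if level == "strict" else warnings).append(message)
    return errors, warnings


known = {"T001", "T002"}
scheduled = ["T001", "T002", "T999"]
for level in ("strict", "standard", "lenient"):
    errors, warnings = check_trip_refs(scheduled, known, level)
    print(level, len(errors), len(warnings))
```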

Custom Business Rules

Configure business rules for your operation:

config = ValidationConfig(
    gtfs_compliance="standard",

    # Minimum layover between trips (default: 5 minutes)
    min_layover_seconds=300,

    # Maximum single trip duration (default: 4 hours)
    max_trip_duration_seconds=14400,

    # Maximum duty length (default: 9 hours)
    max_duty_length_seconds=32400,

    # Maximum continuous driving before required break (default: 4.5 hours)
    max_continuous_driving_seconds=16200,

    # Minimum break duration (default: 30 minutes)
    min_break_duration_seconds=1800,

    # Allowed deviation from GTFS times (default: 60 seconds)
    time_tolerance_seconds=60,
)

result = schedule.validate(gtfs, config)
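
As an illustration of what a rule such as min_layover_seconds means in practice, the sketch below measures the gap between consecutive trips in one block. It is a simplified stand-in, not the library's implementation: `to_seconds` and `short_layovers` are hypothetical helpers, and rows are plain dicts.

```python
def to_seconds(hms):
    """Convert an HH:MM:SS string to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def short_layovers(block_rows, min_layover_seconds=300):
    """Return (previous_trip, next_trip, gap_seconds) for gaps below the minimum."""
    ordered = sorted(block_rows, key=lambda row: to_seconds(row["start_time"]))
    problems = []
    for prev, nxt in zip(ordered, ordered[1:]):
        # Layover = next departure minus previous arrival.
        gap = to_seconds(nxt["start_time"]) - to_seconds(prev["end_time"])
        if gap < min_layover_seconds:
            problems.append((prev["trip_id"], nxt["trip_id"], gap))
    return problems


block = [
    {"trip_id": "T001", "start_time": "08:00:00", "end_time": "08:30:00"},
    {"trip_id": "T002", "start_time": "08:33:00", "end_time": "09:05:00"},
    {"trip_id": "T003", "start_time": "09:15:00", "end_time": "09:45:00"},
]
print(short_layovers(block))                           # default 5-minute minimum
print(short_layovers(block, min_layover_seconds=180))  # relaxed 3-minute minimum
```

With the 5-minute default, the 3-minute gap between T001 and T002 is flagged; lowering the minimum to 180 seconds accepts it.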

Structure-Only Validation

Validate schedule structure without GTFS:

# Check block continuity, time ordering, etc.
result = schedule.validate_structure()

for warning in result.warnings:
    print(f"Warning: {warning['message']}")
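
The kinds of checks named in the comment above (block continuity and time ordering) can be sketched without any GTFS data. The example below is an assumption about what such checks might look like; `structure_warnings` and its warning labels are hypothetical, and rows are assumed to be listed in operating order.

```python
def to_seconds(hms):
    """Convert an HH:MM:SS string to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def structure_warnings(block_rows):
    """Check one block's rows for ordering and continuity problems."""
    found = []
    for row in block_rows:
        # A row should not end before it starts.
        if to_seconds(row["end_time"]) < to_seconds(row["start_time"]):
            found.append(("END_BEFORE_START", row["trip_id"]))
    for prev, nxt in zip(block_rows, block_rows[1:]):
        # The vehicle cannot start the next row before finishing the previous one.
        if to_seconds(nxt["start_time"]) < to_seconds(prev["end_time"]):
            found.append(("OVERLAP", nxt["trip_id"]))
        # Continuity: the next row should start where the previous one ended.
        if prev["end_place"] != nxt["start_place"]:
            found.append(("LOCATION_GAP", nxt["trip_id"]))
    return found


block = [
    {"trip_id": "T001", "start_time": "08:00:00", "end_time": "08:30:00",
     "start_place": "STOP_A", "end_place": "STOP_B"},
    {"trip_id": "T002", "start_time": "08:25:00", "end_time": "09:00:00",
     "start_place": "STOP_C", "end_place": "STOP_D"},
]
print(structure_warnings(block))
```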

Validation Result

The result contains detailed error and warning information:

result = schedule.validate(gtfs, config)

print(f"Valid: {result.is_valid}")
print(f"Errors: {result.error_count}")
print(f"Warnings: {result.warning_count}")
print(f"Rows validated: {result.rows_validated}")
print(f"Blocks validated: {result.blocks_validated}")

# Errors are critical issues
for error in result.errors:
    print(f"ERROR [{error['code']}] {error['category']}: {error['message']}")
    if error['context']:
        print(f"  Context: {error['context']}")

# Warnings are best-practice suggestions
for warning in result.warnings:
    print(f"WARN [{warning['code']}]: {warning['message']}")
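
For large schedules it can help to aggregate issues by code before reading individual messages. The sketch below assumes only the dict keys used above ('code', 'category', 'message', 'context'); the sample codes and messages are invented for illustration, and `count_by_code` is a hypothetical helper.

```python
from collections import Counter


def count_by_code(issues):
    """Tally validation issues by their 'code' field."""
    return Counter(issue["code"] for issue in issues)


# Invented sample data in the same shape as result.errors / result.warnings.
issues = [
    {"code": "E101", "category": "gtfs", "message": "trip_id T999 not found", "context": {"row": 12}},
    {"code": "E101", "category": "gtfs", "message": "trip_id T998 not found", "context": {"row": 40}},
    {"code": "W201", "category": "business", "message": "layover below minimum", "context": None},
]
counts = count_by_code(issues)
for code, count in counts.most_common():
    print(f"{code}: {count}")
```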

Inferring Deadheads

Many schedules contain only revenue trips. The deadhead inferrer can generate the missing non-revenue movements:

  • Pull-outs - From depot to first trip start
  • Pull-ins - From last trip end to depot
  • Interlinings - Between non-continuous trips in a block

# Infer deadheads
result = schedule.infer_deadheads(
    gtfs=gtfs,  # Optional: for stop coordinates
    default_depot="MAIN_DEPOT"  # Depot code for pull-out/pull-in
)

print(f"Inferred {result.total_count} deadheads:")
print(f"  Pull-outs: {result.pull_out_count}")
print(f"  Pull-ins: {result.pull_in_count}")
print(f"  Interlinings: {result.interlining_count}")

# Check for blocks that couldn't be completed
if result.incomplete_blocks:
    print(f"Could not infer for blocks: {result.incomplete_blocks}")
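
The three kinds of inferred movement can be illustrated with a small location-only sketch. This is not the library's algorithm: `infer_block_deadheads` is a hypothetical function, rows are plain dicts assumed to be sorted by start time, and travel times are left out because estimating them would need stop coordinates or a speed assumption.

```python
def infer_block_deadheads(trips, depot):
    """Infer pull-out, interlining, and pull-in movements for one block."""
    if not trips:
        return []
    inferred = []
    first, last = trips[0], trips[-1]
    # Pull-out: depot to the start of the first trip.
    if first["start_place"] != depot:
        inferred.append({"row_type": "pull_out",
                         "start_place": depot, "end_place": first["start_place"]})
    # Interlining: bridge consecutive trips that do not share a location.
    for prev, nxt in zip(trips, trips[1:]):
        if prev["end_place"] != nxt["start_place"]:
            inferred.append({"row_type": "deadhead",
                             "start_place": prev["end_place"], "end_place": nxt["start_place"]})
    # Pull-in: end of the last trip back to the depot.
    if last["end_place"] != depot:
        inferred.append({"row_type": "pull_in",
                         "start_place": last["end_place"], "end_place": depot})
    return inferred


block_trips = [
    {"trip_id": "T001", "start_place": "STOP_A", "end_place": "STOP_B"},
    {"trip_id": "T002", "start_place": "STOP_B", "end_place": "STOP_C"},
    {"trip_id": "T003", "start_place": "STOP_D", "end_place": "STOP_E"},
]
movements = infer_block_deadheads(block_trips, "MAIN_DEPOT")
for movement in movements:
    print(movement["row_type"], movement["start_place"], "->", movement["end_place"])
```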

Exporting Schedules

To CSV File

# Export with default columns
schedule.to_csv("output.csv")

# Export specific columns
schedule.to_csv("output.csv", columns=[
    "run_number", "block", "trip_id", "start_time", "end_time"
])

Using Presets

Presets provide column configurations that approximate common formats:

# Optibus-like format
schedule.to_csv("optibus_schedule.csv", preset="optibus")

# Hastus-like format
schedule.to_csv("hastus_schedule.csv", preset="hastus")

# Minimal (just essential columns)
schedule.to_csv("minimal.csv", preset="minimal")

# Extended (all available columns)
schedule.to_csv("full.csv", preset="extended")

# GTFS block-compatible columns
schedule.to_csv("blocks.csv", preset="gtfs_block")

To String

csv_string = schedule.to_csv_string(preset="minimal")
print(csv_string)
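
A preset can be thought of as a named list of columns applied at export time. The sketch below shows that idea with the standard csv module; `rows_to_csv_string` is a hypothetical helper, and the contents of `MINIMAL_COLUMNS` are an assumption, not the library's actual "minimal" preset.

```python
import csv
import io

# Assumed column list for illustration; the real preset may differ.
MINIMAL_COLUMNS = ["block", "trip_id", "start_time", "end_time"]


def rows_to_csv_string(rows, columns):
    """Serialize dict rows to CSV text, keeping only the requested columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # drop fields that are not in the column list
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


export_rows = [
    {"block": "B001", "trip_id": "T001", "start_time": "08:00:00",
     "end_time": "08:30:00", "depot": "MAIN_DEPOT"},
]
csv_text = rows_to_csv_string(export_rows, MINIMAL_COLUMNS)
print(csv_text)
```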

Working with Schedule Rows

Access individual rows for detailed analysis:

for row in schedule.rows:
    print(f"Block {row.block}: {row.start_time} - {row.end_time}")

    if row.is_revenue():
        print(f"  Revenue trip: {row.trip_id}")
    elif row.is_deadhead():
        print(f"  Deadhead: {row.start_place} -> {row.end_place}")

    # Duration in seconds (may be None)
    duration = row.duration_seconds()
    if duration is not None:
        minutes = duration // 60
        print(f"  Duration: {minutes} minutes")

Row Properties

Each ScheduleRow provides:

Property       Type         Description
block          str | None   Block (vehicle) identifier
run_number     str | None   Run (driver) identifier
trip_id        str | None   GTFS trip_id (revenue trips only)
start_place    str | None   Origin stop_id or location
end_place      str | None   Destination stop_id or location
start_time     str | None   Departure time (HH:MM:SS)
end_time       str | None   Arrival time (HH:MM:SS)
depot          str | None   Depot code
vehicle_class  str | None   Vehicle class/category
vehicle_type   str | None   Specific vehicle type
row_type       str          Type: revenue, pull_out, pull_in, deadhead, break, relief
duty_id        str | None   Duty identifier (rostering)
shift_id       str | None   Shift identifier (rostering)

Row Methods

Method              Returns      Description
is_revenue()        bool         True if this is a revenue trip
is_deadhead()       bool         True if this is any deadhead movement
duration_seconds()  int | None   Duration in seconds
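
Because times are HH:MM:SS strings, a duration is the difference between two parsed values. The sketch below is a standalone approximation of that calculation, not the library's code; the free function `duration_seconds` is hypothetical and returns None when either time is missing, which is an assumed reading of the `int | None` return type. GTFS allows times past 24:00:00 for service that runs beyond midnight, so hours are not capped at 23.

```python
def duration_seconds(start_time, end_time):
    """Return end minus start in seconds, or None if either time is missing."""
    if not start_time or not end_time:
        return None

    def to_seconds(hms):
        hours, minutes, seconds = (int(part) for part in hms.split(":"))
        # Hours may exceed 23 for trips that continue past midnight.
        return hours * 3600 + minutes * 60 + seconds

    return to_seconds(end_time) - to_seconds(start_time)


print(duration_seconds("08:00:00", "08:30:00"))
print(duration_seconds("23:50:00", "24:10:00"))
print(duration_seconds(None, "08:30:00"))
```

A zero-length row yields 0 rather than None, so callers that want to distinguish "no duration" from "zero duration" should compare against None instead of relying on truthiness.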

CSV Column Support

The schedule reader auto-detects these column names (case-insensitive):

Field          Accepted Column Names
block          block, block_id, vehicle_block, veh_block
run_number     run, run_number, run_id, driver_run
trip_id        trip, trip_id, journey_id, journey_ref
start_place    start_place, origin, from, start_stop, from_stop
end_place      end_place, destination, to, end_stop, to_stop
start_time     start_time, depart, departure, start
end_time       end_time, arrive, arrival, end
depot          depot, garage, depot_code, garage_code
vehicle_class  vehicle_class, veh_class, class
vehicle_type   vehicle_type, veh_type, type
row_type       row_type, type, activity_type
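
Case-insensitive detection of this kind can be sketched as an alias lookup over lowercased headers. The example below is an illustration only: `detect_columns` is a hypothetical function and `ALIASES` copies just four fields from the table above. The table lists `type` under both vehicle_type and row_type; how the library resolves that overlap is not stated here, and the sketch sidesteps it by leaving those fields out.

```python
# Subset of the alias table above, copied for illustration.
ALIASES = {
    "block": ["block", "block_id", "vehicle_block", "veh_block"],
    "trip_id": ["trip", "trip_id", "journey_id", "journey_ref"],
    "start_time": ["start_time", "depart", "departure", "start"],
    "end_time": ["end_time", "arrive", "arrival", "end"],
}


def detect_columns(headers):
    """Map canonical field names to the matching header found in the file."""
    lookup = {alias: field for field, aliases in ALIASES.items() for alias in aliases}
    detected = {}
    for header in headers:
        field = lookup.get(header.strip().lower())
        # Keep the first matching header for each field; ignore unknown headers.
        if field is not None and field not in detected:
            detected[field] = header
    return detected


found = detect_columns(["Veh_Block", "Journey_ID", "Depart", "Arrive", "notes"])
print(found)
```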

Example: Complete Workflow

from transit_parser import GtfsFeed, Schedule, ValidationConfig

# 1. Load data
gtfs = GtfsFeed.from_path("gtfs/")
schedule = Schedule.from_csv("raw_schedule.csv")

# 2. Get overview
print(f"Schedule has {len(schedule)} rows")
print(f"Summary: {schedule.summary()}")

# 3. Validate against GTFS
config = ValidationConfig(
    gtfs_compliance="standard",
    min_layover_seconds=180,  # 3 minute minimum layover
)
result = schedule.validate(gtfs, config)

if not result.is_valid:
    print("Validation failed!")
    for error in result.errors:
        print(f"  {error['message']}")
    # Handle errors...

# 4. Infer missing deadheads
inference = schedule.infer_deadheads(gtfs, default_depot="CENTRAL")
print(f"Added {inference.total_count} deadheads")

# 5. Export complete schedule
schedule.to_csv("complete_schedule.csv", preset="extended")
print("Done!")