
Performance

Transit Parser is designed for speed, using Rust for core parsing operations.

Benchmarks

Measured on a 50 MB GTFS feed (similar in size to the NYC MTA feed):

| Operation | Transit Parser | partridge | gtfs-kit |
|---|---|---|---|
| Load feed (eager) | 120 ms | 120 ms | 850 ms |
| Load feed (lazy) | 0.01 ms | N/A | N/A |
| First access to stop_times | 136 ms | 410 ms | N/A |
| Cached access | 0.001 ms | N/A | N/A |

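To compare first and cached access in your own code, time.perf_counter from the standard library is enough. This is a minimal sketch: time_call and load_table are hypothetical helpers, and functools.lru_cache stands in for whatever caching the library does internally.

```python
import time
from functools import lru_cache
from typing import Any, Callable


def time_call(fn: Callable[[], Any]) -> tuple[Any, float]:
    """Return (result, elapsed milliseconds) for a single call of fn()."""
    start = time.perf_counter()
    result = fn()
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms


@lru_cache(maxsize=None)
def load_table() -> list[int]:
    # Hypothetical stand-in for an expensive parse; cached after the first call.
    return [i * i for i in range(200_000)]


_, first_ms = time_call(load_table)   # does the work
_, cached_ms = time_call(load_table)  # served from the cache
print(f"first: {first_ms:.3f} ms, cached: {cached_ms:.3f} ms")
```

Single calls are noisy; for stable numbers, repeat each measurement and take the median.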
Key Optimizations

1. Lazy Loading

LazyGtfsFeed defers CSV parsing until you actually need the data:

```python
# Instant - no parsing
feed = LazyGtfsFeed.from_path("gtfs/")

# Still instant - reads metadata only
print(feed.stop_time_count)

# Parses stop_times.txt now
stop_times = feed.stop_times
```

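The deferral pattern can be sketched in plain Python with functools.cached_property. This is an illustration only, not Transit Parser's implementation (which parses in Rust); the LazyTable class, its attributes, and the line-counting shortcut for the row count are assumptions made for the example.

```python
import csv
import io
from functools import cached_property


class LazyTable:
    """Hypothetical lazy wrapper around one GTFS CSV table (illustration only)."""

    def __init__(self, raw_text: str):
        self._raw = raw_text   # keep the raw text; nothing is parsed yet
        self.parse_count = 0   # how many times parsing actually ran

    @property
    def row_count(self) -> int:
        # Cheap metadata: count non-empty lines without building row objects.
        # Assumes one record per line (no quoted fields with embedded newlines).
        lines = [line for line in self._raw.splitlines() if line.strip()]
        return max(len(lines) - 1, 0)  # subtract the header row

    @cached_property
    def rows(self) -> list[dict[str, str]]:
        # Parse on first access; cached_property stores the result on the
        # instance, so later accesses return the same list without re-parsing.
        self.parse_count += 1
        return list(csv.DictReader(io.StringIO(self._raw)))


raw = "trip_id,stop_id,stop_sequence\nt1,s1,1\nt1,s2,2\n"
table = LazyTable(raw)
print(table.row_count)    # 2 (no parsing yet)
print(table.parse_count)  # 0
first = table.rows        # parses now
again = table.rows        # served from the cache
print(table.parse_count)  # 1
```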
2. Zero-Copy Parsing

The Rust CSV parser uses zero-copy deserialization where possible, minimizing memory allocations.

3. Index Caching

GtfsFilter builds indexes on first use:

```python
f = GtfsFilter(feed)

# First call builds stop index (~1ms)
stop = f.get_stop("stop_1")

# Subsequent calls use cache (~0.001ms)
stop = f.get_stop("stop_2")
```

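Index-on-first-use can be sketched as a dictionary that is built once and then reused. The StopIndex and Stop classes, and the sample stop names, are hypothetical; GtfsFilter's actual internals may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Stop:
    stop_id: str
    stop_name: str


class StopIndex:
    """Hypothetical filter that builds a stop_id -> Stop index on first lookup."""

    def __init__(self, stops: list[Stop]):
        self._stops = stops
        self._by_id: dict[str, Stop] | None = None  # built lazily
        self.builds = 0  # how many times the index was built

    def get_stop(self, stop_id: str) -> Stop | None:
        if self._by_id is None:
            # First call: one O(n) pass over all stops builds the index.
            self._by_id = {s.stop_id: s for s in self._stops}
            self.builds += 1
        # Every call: average O(1) dictionary lookup.
        return self._by_id.get(stop_id)


f = StopIndex([Stop("stop_1", "Main St"), Stop("stop_2", "Oak Ave")])
print(f.get_stop("stop_1").stop_name)  # Main St (builds the index)
print(f.get_stop("stop_2").stop_name)  # Oak Ave (reuses the index)
print(f.builds)                        # 1
```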
4. Parallel Processing

Batch operations use Rayon (a Rust data-parallelism library) for parallel processing:

```python
# Converts files in parallel
result = converter.convert_batch(documents)
```

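Rayon's parallelism happens inside the Rust core. For comparison, the same fan-out-and-collect pattern in pure Python can be sketched with the standard library's concurrent.futures instead of Rayon; convert_one and convert_batch here are hypothetical stand-ins, not the library's API.

```python
from concurrent.futures import ThreadPoolExecutor


def convert_one(document: str) -> str:
    # Hypothetical per-document work; stands in for a real conversion step.
    return document.upper()


def convert_batch(documents: list[str], max_workers: int = 4) -> list[str]:
    # Executor.map runs convert_one on a pool of worker threads and
    # yields results in the same order as the input.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(convert_one, documents))


print(convert_batch(["stops.txt", "trips.txt"]))  # ['STOPS.TXT', 'TRIPS.TXT']
```

In CPython, threads mainly help when the work releases the GIL (I/O or native code); CPU-bound pure-Python work usually needs ProcessPoolExecutor instead. Rayon's native threads avoid that constraint, which is one reason to keep batch work in Rust.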
Memory Usage

Memory use depends mainly on the loading mode (approximate figures):

| Feed size | Eager load | Lazy load |
|---|---|---|
| 10 MB | ~100 MB | ~1 MB |
| 50 MB | ~500 MB | ~5 MB |
| 100 MB | ~1 GB | ~10 MB |

With lazy loading, only the tables you have accessed are held in memory.

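To check figures like these against your own workload, the standard library's tracemalloc can report the peak allocation during an operation. A minimal sketch with a hypothetical peak_bytes helper; note that tracemalloc only tracks allocations made through Python's allocators, so memory a native extension (such as a Rust core) allocates directly may not be counted, and an OS-level measure such as peak resident set size is more complete in that case.

```python
import tracemalloc
from typing import Any, Callable


def peak_bytes(fn: Callable[[], Any]) -> tuple[Any, int]:
    """Run fn() and return (result, peak bytes traced while it ran)."""
    tracemalloc.start()
    try:
        result = fn()
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()  # also discards the collected traces
    return result, peak


# Example: building 100,000 small row dicts in memory.
rows, peak = peak_bytes(lambda: [{"stop_id": f"s{i}"} for i in range(100_000)])
print(f"{len(rows)} rows, peak ~{peak / 1_000_000:.1f} MB")
```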
Tips for Large Feeds

Use Lazy Loading

```python
# Good - instant load
feed = LazyGtfsFeed.from_path("large_feed/")

# Access only what you need
routes = feed.routes  # Parses routes.txt only
```

Filter Early

```python
# Good - filter before iteration
f = GtfsFilter(feed)
route_trips = f.trips_for_route("route_1")

# Less efficient - iterates all trips
route_trips = [t for t in feed.trips if t.route_id == "route_1"]
```

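The list comprehension scans every trip on each call. One way a filter can avoid that is to group trips by route_id once and reuse the grouping for every query. A minimal sketch using (trip_id, route_id) tuples; build_route_index and the tuple layout are assumptions for illustration, not GtfsFilter's API.

```python
from collections import defaultdict


def build_route_index(trips: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group trip_ids by route_id in a single pass (hypothetical helper)."""
    index: defaultdict[str, list[str]] = defaultdict(list)
    for trip_id, route_id in trips:
        index[route_id].append(trip_id)
    return dict(index)  # plain dict, so lookups never auto-create entries


trips = [("t1", "route_1"), ("t2", "route_2"), ("t3", "route_1")]
route_index = build_route_index(trips)  # O(n), done once

# Each lookup is now an average O(1) dict access instead of an O(n) scan.
print(route_index.get("route_1", []))  # ['t1', 't3']
print(route_index.get("route_9", []))  # []
```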
Stream Large Results

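For very large result sets, consider processing the data in chunks. Accessing feed.stop_times still parses the whole file, so chunking limits the size of each batch handed to downstream code rather than the cost of parsing: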

```python
# Process stop_times in chunks
# (process() stands for your own per-chunk function)
chunk_size = 10000
stop_times = feed.stop_times  # parses the full file

for i in range(0, len(stop_times), chunk_size):
    chunk = stop_times[i:i + chunk_size]
    process(chunk)
```

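The slicing loop needs a sequence that supports len() and slicing. A generator built on itertools.islice works with any iterable, including one that yields rows lazily, and holds at most one chunk at a time. The chunked name is hypothetical; on Python 3.12+, itertools.batched offers similar behavior (yielding tuples instead of lists).

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:  # iterator exhausted
            return
        yield chunk


# Works on a generator as well as on a list.
sizes = [len(c) for c in chunked((n for n in range(25_000)), 10_000)]
print(sizes)  # [10000, 10000, 5000]
```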
Running Benchmarks

The repository includes benchmarks:

```bash
cd benchmarks
python run_benchmarks.py
```

Results are saved to benchmarks/BENCH.md.