OneBus* – Data reliability on public transportation mobility systems
The issue of data accuracy has resulted in liability concerns regarding the Service Level Agreements. Specifically, there have been problems with the reliability of bus schedules displayed on the devices at the bus stops, including missing updates on estimated arrival times and failure to reflect changes to bus services such as route changes and cancellations.
Additionally, schedule data is unavailable for certain bus stops via the public API, and there is a lack of information on bus load displayed at the bus stop device. Furthermore, the event publisher fails to provide notifications about bus positioning, and inaccurate forecasts have been made regarding the transit system load.
Our Challenge
Our challenge has been managing an increasingly complex tech stack that includes a multi-cloud system spanning Azure, AWS, and GCP, multiple data structures, and diverse data ingestion technologies such as ETL tools, Azure Functions, and MQTT services.
Moreover, the data engineering team was downstream from the analytical and operational teams, and there was a cloud-to-device management system in place, along with a public web API, a web portal, and a mobile app. Real-time event notifications were handled by an event publisher, and the system supported reporting and analytics. However, the pipeline execution scheduler created strong data dependencies, and the team was forced to rely on auto-scaling to mitigate performance issues.
The client chose our solution, the FLUENT Data Observability Platform, because it makes it possible to monitor the health of operations by adding metadata to all ingestion pipelines.
The platform can track important metrics such as the duration of GTFS daily updates, delays, retries, and ETL execution status, as well as GTFS-RT system connectivity. It can also monitor dataset availability and schema changes for GTFS and GTFS-RT, and track the number of records for GTFS daily updates and GTFS-RT message updates.
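As an illustration only, the dataset checks described above (schema changes and record counts for a GTFS daily update) could be sketched as follows. This is not FLUENT's implementation: the function name check_feed_file, the expected column set, and the 20% change threshold are assumptions made for the example.

```python
import csv
import io

# Assumed subset of stop_times.txt columns that downstream consumers rely on.
EXPECTED_STOP_TIMES_COLUMNS = {
    "trip_id", "arrival_time", "departure_time", "stop_id", "stop_sequence",
}


def check_feed_file(csv_text, expected_columns, previous_count, max_change_ratio=0.2):
    """Return (record_count, issues) for one GTFS file given as CSV text.

    Hypothetical helper: flags missing or unexpected columns and a record
    count that moved more than max_change_ratio relative to the previous run.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = set(reader.fieldnames or [])
    count = sum(1 for _ in reader)

    issues = []
    missing = expected_columns - columns
    if missing:
        issues.append(f"missing columns: {sorted(missing)}")
    unexpected = columns - expected_columns
    if unexpected:
        issues.append(f"unexpected columns: {sorted(unexpected)}")
    if previous_count > 0:
        change = abs(count - previous_count) / previous_count
        if change > max_change_ratio:
            issues.append(f"record count changed from {previous_count} to {count}")
    return count, issues
```

A scheduler could run such a check after each daily update and attach the returned count and issues to the pipeline run as metadata.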
FLUENT is equipped with data profiling and anomaly detection capabilities, which enable it to detect abnormal schedule changes, such as changes to arrival times at a bus stop or to the bus lines serving a stop, as well as spikes in estimated passenger numbers. Additionally, the platform implements data validation by enforcing business rules, such as GTFS-RT validation based on GTFS trip unique identifiers and epoch time validation for the next bus arrival.
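The two business rules mentioned above can be sketched in a few lines. This is a minimal example under stated assumptions, not the platform's actual rule engine: StopTimeUpdate, validate_updates, and the time tolerances are hypothetical names and values chosen for illustration. It assumes each real-time update carries a trip_id that should exist in the static GTFS feed and an arrival time expressed in POSIX (epoch) seconds.

```python
from dataclasses import dataclass


@dataclass
class StopTimeUpdate:
    """Simplified, hypothetical view of one GTFS-RT arrival prediction."""
    trip_id: str
    stop_id: str
    arrival_epoch: int  # POSIX seconds


def validate_updates(updates, known_trip_ids, now_epoch,
                     max_past_s=120, max_future_s=6 * 3600):
    """Return a list of (update, reason) pairs for updates that break a rule.

    Rule 1: trip_id must be one of the trip identifiers in the GTFS feed.
    Rule 2: the arrival epoch must fall in a plausible window around "now"
            (the window sizes are assumptions, not values from the source).
    """
    failures = []
    for update in updates:
        if update.trip_id not in known_trip_ids:
            failures.append((update, "unknown trip_id"))
        elif update.arrival_epoch < now_epoch - max_past_s:
            failures.append((update, "arrival in the past"))
        elif update.arrival_epoch > now_epoch + max_future_s:
            failures.append((update, "arrival too far in the future"))
    return failures
```

Returning the reasons alongside the rejected records, rather than silently dropping them, keeps the failures available for alerting and for later inspection.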
FLUENT Data Observability Platform
The Outcome
Adopting the FLUENT Data Observability Platform from Vortex AI has been a very positive experience for OneBus. OneBus has set up single sign-on access to the monitoring platform, allowing anyone with a company email address to use it. This has helped democratize access to data tooling, which has led to increased transparency and a stronger focus on data quality.
With monitoring, alerting, and lineage in place, the engineering team was able to proactively communicate data downtime to data consumer teams and mitigate the impact on the system. Additionally, by implementing a system where reports and analysis are not available until the data is fixed, OneBus has ensured that data quality remains a top priority.
Investing in data observability has helped OneBus accomplish its larger mission of keeping passengers informed and up to date on the status of public transportation services. The ETA calculation based on the GTFS schedule has helped ensure that passengers can plan their trips accurately.
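For readers unfamiliar with schedule-based ETAs, the idea can be sketched as below. This is an assumption-laden illustration, not OneBus's production logic: the function name eta_minutes and the optional delay parameter are hypothetical, and the sketch uses naive local datetimes measured from midnight of the service day, which ignores daylight-saving transitions. GTFS arrival times are written as HH:MM:SS and may exceed 24:00:00 for trips that run past midnight.

```python
from datetime import date, datetime, time, timedelta


def eta_minutes(service_date, gtfs_arrival_time, now, delay_s=0):
    """Whole minutes until the bus is due, never negative.

    service_date: the GTFS service day the trip belongs to.
    gtfs_arrival_time: "HH:MM:SS" from stop_times.txt; hours may be >= 24.
    now: naive local datetime (simplification: no time zone or DST handling).
    delay_s: optional real-time delay in seconds (assumed input).
    """
    hours, minutes, seconds = (int(part) for part in gtfs_arrival_time.split(":"))
    scheduled = datetime.combine(service_date, time.min) + timedelta(
        hours=hours, minutes=minutes, seconds=seconds
    )
    predicted = scheduled + timedelta(seconds=delay_s)
    remaining_s = (predicted - now).total_seconds()
    return max(0, int(remaining_s // 60))
```

For example, a stop time of 25:10:00 on the 2024-03-01 service day corresponds to 01:10 on 2 March, so a passenger checking at 01:00 would see an ETA of 10 minutes.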
* OneBus is a fictitious name used to protect the confidentiality of our client.