23 Jul, 2024

Observability while scaling with Chargefox

Chargefox

Chargefox is Australia’s largest and fastest-growing electric vehicle (EV) charging network. It was formed in 2017 to help meet the need for better EV charging infrastructure, innovative technology to support it, and software to make it available to everyone. Chargefox is on a mission to make charging simple, affordable and fast for everyone, because simpler charging means more EVs on the road, and that’s a very good thing.

Chargefox makes hosting and managing chargers easy for companies and destinations, and partners with fleet managers, vehicle manufacturers and hire-car providers to keep drivers charged. Chargefox also works with local councils and governments all over Australia to provide EV charging solutions and fleet management services.

Building the platform is central to achieving the organisation's strategic goal of becoming fully integrated into the infotainment and navigation systems of leading vehicle manufacturers, providing a seamless experience for ever more EV drivers.

The Challenge

Chargefox is scaling at an astronomical rate to support the rollout of electric vehicle charging infrastructure across Australia and New Zealand. As a result, the organisation outgrew its previously suitable monitoring, logging and alerting solutions.

As the organisation scaled, Chargefox CTO Adrian Cretu-Barbul wanted to improve visibility across the platform so the team could respond efficiently and effectively to opportunities and incidents, deploy new functionality as rapidly as possible and deliver an optimal customer experience.

Logs, metrics and events were tracked by different tools, making them tough for engineers to consume together. Consolidating them was an opportunity to increase productivity and decrease both MTTD (Mean Time To Detect) and MTTR (Mean Time To Recover). The changes would also reduce process friction for engineers, customers and stakeholders.

Simple changes to Chargefox’s deployment process improved the collection and retention of metrics, giving the team access to a historical record of platform performance. This shift made it simpler for the team to consume metrics and streamlined preparation for on-call responsibility. It also makes it easier to discover the source of issues when they occur, allowing Chargefox to resolve situations smoothly and swiftly.

The Solution

Midnyte City worked with the CTO to understand the opportunities and prioritise the most valuable work. This came down to four key outcomes:

Single pane of glass for monitoring and troubleshooting
Reviewed and enhanced the resource tagging strategy, allowing metrics and query logs to be examined in a single view to compare current and historical performance.

Created high-level dashboards for the platform, giving the engineering team a single place to start any investigation. These dashboards help the engineering team readily identify trends and patterns.

Application Logging enhancement
Uplifted the application logging library to use more structured logs, enriching the context in the logs and establishing a log attribute directory to standardise logging structure across the organisation. This made discovering insights easier and helped build more meaningful dashboards and alerts.
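To illustrate what structured logs backed by a log attribute directory can look like, here is a minimal sketch using Python's standard logging module. The case study does not describe Chargefox's language or logging library, so everything here is an assumption for illustration: the attribute names (station_id, session_id, connector_id, duration_ms), the StructuredFormatter class and the log_event helper are all hypothetical.

```python
import json
import logging

# Hypothetical log attribute directory: the agreed attribute names and types
# that every service uses, so dashboards and alerts can rely on them.
LOG_ATTRIBUTES = {
    "station_id": str,
    "session_id": str,
    "connector_id": int,
    "duration_ms": int,
}


class StructuredFormatter(logging.Formatter):
    """Render each record as one JSON object instead of free text."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Copy only attributes that are listed in the directory.
        for name in LOG_ATTRIBUTES:
            if hasattr(record, name):
                payload[name] = getattr(record, name)
        return json.dumps(payload, sort_keys=True)


def validate_attributes(attributes):
    """Reject attribute names or types that are not in the directory."""
    for name, value in attributes.items():
        expected = LOG_ATTRIBUTES.get(name)
        if expected is None:
            raise KeyError(f"unknown log attribute: {name}")
        if not isinstance(value, expected):
            raise TypeError(f"{name} must be of type {expected.__name__}")
    return attributes


def log_event(logger, message, **attributes):
    """Log a message with validated, directory-approved context."""
    logger.info(message, extra=validate_attributes(attributes))


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(StructuredFormatter())
    logger = logging.getLogger("charging")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    log_event(logger, "session started", station_id="station-42", connector_id=1)
```

Validating against a shared directory is one way to keep attribute names consistent between teams, which is what makes the logs queryable in dashboards and alerts.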

Alerting enhancement
Off the back of the enriched logs, we created alerts on traffic pattern anomalies and on issues that impacted a charge station's ability to service customer demand. We built a collection of playbooks tied to individual alerts to guide an on-call engineer on where to start an investigation. This also preserves the context of why each alert exists.
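The idea of an alert that carries its own playbook and rationale can be sketched as follows. This is an illustrative example only: the case study does not say how anomalies were detected or which alerting tool was used, so the z-score check, the AlertRule fields and the playbook path below are all assumptions.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class AlertRule:
    name: str
    threshold: float  # z-score magnitude at which the alert fires
    playbook: str     # hypothetical location of the on-call playbook
    rationale: str    # why the alert exists, kept next to the rule


def z_score(history, current):
    """How many standard deviations `current` sits from the recent mean."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        if current == mu:
            return 0.0
        return float("inf") if current > mu else float("-inf")
    return (current - mu) / sigma


def evaluate(rule, history, current):
    """Return an alert payload if traffic is anomalous, otherwise None."""
    score = z_score(history, current)
    if abs(score) < rule.threshold:
        return None
    return {
        "alert": rule.name,
        "z_score": score,
        "playbook": rule.playbook,
        "rationale": rule.rationale,
    }


if __name__ == "__main__":
    rule = AlertRule(
        name="station-traffic-anomaly",
        threshold=3.0,
        playbook="playbooks/station-traffic-anomaly.md",
        rationale="A sudden drop in messages may mean stations cannot serve drivers.",
    )
    recent_counts = [100, 102, 98, 101, 99]  # made-up messages per minute
    print(evaluate(rule, recent_counts, current=10))
```

Keeping the playbook reference and rationale in the rule itself means the alert payload always tells the on-call engineer where to start and why the alert matters.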

Scaling metrics tuning
Added further auto scaling metrics to the system to enhance the scaling response to system load, and created a dashboard to observe and validate the performance of the new scaling metrics.
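To show why adding a second scaling metric can change the scaling response, here is a minimal sketch. It assumes a proportional rule (desired capacity = current capacity x observed value / target value, rounded up), which is the rule used by, for example, the Kubernetes Horizontal Pod Autoscaler. The case study does not state which platform or metrics Chargefox uses, so the metric names, targets and limits below are hypothetical.

```python
import math


def desired_capacity(current, observed, targets, minimum=1, maximum=20):
    """Scale to satisfy the most demanding metric, within fixed limits.

    current:  number of instances running now
    observed: metric name -> current value
    targets:  metric name -> value the system should hold each metric at
    """
    proposals = [
        math.ceil(current * observed[name] / targets[name])
        for name in targets
    ]
    return max(minimum, min(maximum, max(proposals)))


if __name__ == "__main__":
    # Made-up numbers: CPU looks healthy while the message queue backs up.
    observed = {"cpu_percent": 50, "queue_depth_per_instance": 300}

    cpu_only = {"cpu_percent": 60}
    with_queue = {"cpu_percent": 60, "queue_depth_per_instance": 100}

    print(desired_capacity(4, observed, cpu_only))    # stays at 4 instances
    print(desired_capacity(4, observed, with_queue))  # scales out to 12
```

In this made-up scenario a CPU-only rule sees no reason to scale, while the queue-based metric triggers a scale-out, which is the kind of behaviour a validation dashboard would help confirm.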

The Results

Faster and easier troubleshooting
With the rich context in logs and the additional supporting dashboards, the team is now able to spot problems and identify their causes faster, resulting in quicker incident resolution. In some cases this prevented incidents from becoming customer-impacting and thus avoided customer complaints. Chargefox is also able to leverage deeper insights when working with other partners in the ecosystem on fault finding.

Increased confidence in deployment
With the operational insight, alerts and high-level dashboards, engineers are now able to observe system performance and behaviour more effectively and proactively. This informs decision making, with real-time feedback from deployment data resulting in increased system uptime and fewer customer-impacting events.

Handle traffic spikes at scale
The upgraded auto scaling metrics have improved the platform's response to traffic spikes and brought a corresponding reduction in system message processing latency. The team is now better equipped, with a dashboard and documentation, to watch and tune the scaling metrics in the future.

Fostering an observability mindset across the team
Sharing dashboards, regular pairing, showcases, lunch and learns and quality documentation allowed the wider Chargefox team to be a part of the observability work. This approach maximises knowledge transfer and skills sharing, and fosters an observability mindset across the team, so the value of the observability engagement endures long after Midnyte City finished the work and exited the building.

Testimonial

"Midnyte City has been an invaluable strategic partner, showcasing exceptional expertise with a highly skilled team that efficiently streamlines our operations. Their proactive approach consistently delivers robust solutions, enhancing our efficiency and reliability. Their deep understanding and commitment to continuous improvement in our infrastructure sets them apart."

Adrian Cretu-Barbul
Chief Technology Officer, Chargefox

Contact us

If you would like to speak to someone about similar challenges in your team or organisation, reach out below to schedule a time.
