8 Nov, 2024
Qsic was formed in 2012 to provide the best commercial music streaming platform for businesses. Over 12 years, the platform has evolved into a multinational partnership with huge brands across hospitality and retail.
Qsic are leaders in AI-driven audio solutions, dedicated to enhancing the retail experience and fostering stronger connections between brands and audiences. Their mission is to drive incremental revenue in-store and to amplify the impact of audio in shaping memorable brand experiences. Through innovation and excellence, they aim to revolutionise the way businesses engage with their customers through audio.
Qsic’s innovative technology and compelling go-to-market strategy have secured a stunning influx of high-profile local and international clients, including 7-11 in the US. This rapid customer expansion would be bolstered by an uplift in platform observability in partnership with Midnyte City. The observability uplift would give Qsic increased visibility of platform resilience and health; simpler reporting for customer SLAs; optimised monitoring, logging and alerting; and rapid diagnosis and resolution of incidents.
David Goodlad, CTO, engaged Midnyte City to help drive the change and uplift, taking some of the workload off the already busy engineering team.
The engagement included:
An investigation and review of current logging, monitoring and alerting across the platform
Recommendations on appropriate tooling to serve the organisation going forward
Ways to reduce response times and more rapidly resolve incidents
Approaches for improving proactive intervention
This observability uplift is part of Qsic’s commitment to Continuous Improvement, enabling the engineering team to better see and understand platform performance, behaviour and opportunities. This leads to:
Increased productivity
Evolution of ways of working
Greater confidence in deployments
Reduced downtime and context switching
Faster feedback on system performance
Additional time for innovation and feature development through time saved on remediation
The Midnyte City consultant began by gathering information on the current state of observability and worked with the CTO to create an observability roadmap. After understanding the current state, the decision was made to migrate to a new observability tool rather than build upon existing tools, to better cater to Qsic’s current requirements and future needs.
Single pane of glass
The first priority was creating a single pane of glass, so engineers do not have to jump between different tools to gather the information they need to investigate issues. We instrumented the different technology stacks used at Qsic to send telemetry data to one place, where all the relevant dashboards and insights could live and be appropriately accessible and tailorable. This would allow monitoring of different elements of the business, with the extent of any impact viewed in one place.
Enhanced visibility
Once the foundation for the new observability tool was in place, we enriched the context in the data sent to it, using structured logs where possible and leveraging a log parser to broaden context in the unstructured logs. This linked traces from different parts of the system to form a better view of how data and requests flowed inside the Qsic platform, and prepared the team to better identify and handle unknown failure modes should they arise.
To assist the engineering teams with problem identification and resolution, various dashboards were built to give the team a high-level view of platform health and detailed views into different segments of the platform.
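As an illustration only, the two techniques described here (structured logs carrying trace context, and a parser that lifts fields out of unstructured lines) could be sketched as follows. This is not Qsic's implementation: the field names `trace_id` and `service`, the legacy line format and the function `parse_legacy_line` are all assumptions made for the example, and only the Python standard library is used.

```python
import json
import logging
import re


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Hypothetical context fields, attached by callers via `extra=`.
            "trace_id": getattr(record, "trace_id", None),
            "service": getattr(record, "service", None),
        }
        return json.dumps(payload)


# Hypothetical legacy format: "<LEVEL> [<service>] trace=<id> <message>"
LEGACY_LINE = re.compile(
    r"^(?P<level>[A-Z]+) \[(?P<service>[\w-]+)\] "
    r"trace=(?P<trace_id>\w+) (?P<message>.*)$"
)


def parse_legacy_line(line):
    """Lift fields out of an unstructured line; return None if it does not match."""
    match = LEGACY_LINE.match(line)
    return match.groupdict() if match else None
```

A service would attach the formatter to a handler and log with `logger.info("playlist synced", extra={"trace_id": "abc123", "service": "scheduler"})`. Because both paths end up with the same `trace_id` field, entries from different parts of a system can be joined on it.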
Better alerts and playbook collection
With the enriched data in the observability tools, alerts were created with context, and the engineering team was able to tune these with live events and feedback to reduce alert fatigue.
A collection of playbooks was built with links to alerts to give on-call engineers guidance on where to start any investigation, as well as context on why the alert is in place. This increases an engineer's confidence in going on call and reduces the time to identify and troubleshoot issues when/if they arise.
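One way to picture an alert that carries its own context is a rule definition that bundles the threshold, a playbook link and the reason the alert exists. The sketch below is hypothetical: `AlertRule`, `should_fire`, the example URL and the "sustained breach" rule are illustrative assumptions, not the configuration of any particular observability tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    name: str
    threshold: float        # fire when the metric is above this value...
    sustained_samples: int  # ...for this many consecutive samples (>= 1)
    playbook_url: str       # where the on-call engineer should start
    rationale: str          # why the alert is in place


def should_fire(rule, samples):
    """Return True only if the latest `sustained_samples` values all breach."""
    recent = samples[-rule.sustained_samples:]
    return (
        len(recent) == rule.sustained_samples
        and all(value > rule.threshold for value in recent)
    )


# Hypothetical rule: error rate above 5% for three samples in a row.
rule = AlertRule(
    name="high_error_rate",
    threshold=0.05,
    sustained_samples=3,
    playbook_url="https://example.invalid/playbooks/high-error-rate",
    rationale="Sustained errors mean stores may not be receiving audio.",
)
```

Requiring a sustained breach rather than a single spike is one common way to tune out noise; raising or lowering `sustained_samples` after reviewing live events is the kind of adjustment that reduces alert fatigue.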
Service Level Objective (SLO) adoption
To help create a culture of observability at Qsic, Midnyte City ran workshops to help the team understand the benefit of moving towards SLO-based alerting and how to leverage SLOs to make decisions on when to release features and when to focus on reliability.
The workshops helped the team discover potential SLOs and build out documentation and templates, enabling the team to continue the SLO discovery and the conversation needed to drive buy-in from the business.
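The release-versus-reliability decision is usually framed in terms of an error budget: the amount of failure an SLO permits over a window. A minimal sketch of that arithmetic follows; the function names, the 99.9% target and the 10% freeze threshold are assumptions chosen for illustration, not figures from the engagement.

```python
def error_budget_remaining(slo_target, total_events, bad_events):
    """Fraction of the error budget still unspent; negative once overspent."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    allowed_bad = (1.0 - slo_target) * total_events
    return 1.0 - bad_events / allowed_bad


def release_guidance(remaining, freeze_below=0.1):
    """Suggest where to spend effort given the remaining budget fraction."""
    if remaining < freeze_below:
        return "focus on reliability"
    return "ship features"


# Hypothetical window: 99.9% target, 1,000,000 requests, 250 failures.
remaining = error_budget_remaining(0.999, 1_000_000, 250)
guidance = release_guidance(remaining)
```

With a 99.9% target, 1,000,000 requests allow roughly 1,000 failures, so 250 failures leave about three quarters of the budget and the guidance is to keep shipping. Alerting on how fast the budget is being consumed, rather than on individual symptoms, is the core idea behind SLO-based alerting.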
Reduced Mean Time To Detect (MTTD)
A meaningful reduction in the time from detection to alert to response across a wide group of key systems means faster identification of outages and issues. Coupled with quicker access to the appropriate response, this has made it simpler and easier for the engineering team to support the platform.
Easier to discover insight
Greater visibility, easier access to that visibility and richer context have led to the identification of efficiency opportunities and clearer ways of working. The single pane of glass view has made onboarding and collaboration substantially quicker and simpler. The engineering team is able to leverage these insights to discover issues that were not previously known.
Raised observability maturity
Via documentation and workshops, the collaboration between Midnyte City and Qsic created a lasting cultural impact, moving the engineering team towards observability-driven development. The team developed a habit of thinking about measuring success with observability much earlier in the development lifecycle.
Continuous improvement
A backlog of work has been curated around the observability uplift and opportunities for future product development, including more advanced design and product principles made possible by improved visibility. The engineering team is now able to carry out further improvements long after the Midnyte City engagement has concluded.