20 Feb, 2023
Founded in Melbourne, EstimateOne is a tender management platform connecting builders and subcontractors in the commercial construction industry across Australia, New Zealand and the UK. Placing 3rd in AFR’s Best Places to Work list in 2022 has established them as one of the preeminent local technology organisations, thanks to a progressive vision and dynamic culture.
EstimateOne’s innovative cloud-based platform brings builders, suppliers and subcontractors together to facilitate collaboration on commercial construction projects. It enables:
Tradespeople to find and manage upcoming work opportunities specific to their trade
Builders to distribute and oversee construction drawings and documents
Suppliers to search for their products and easily follow projects where they have been specified
The platform is specifically designed to reduce risk and increase productivity across the industry. Last year alone, over 94,000 organisations used the EstimateOne platform to tender over 14,000 projects worth a total of $137 billion.
The nature of the platform makes Data and Analytics a significant strategic competitive advantage for EstimateOne. The Midnyte City crew were very excited to be part of the team addressing secure and scalable Data Management by building out a Data Mesh.
The Data Mesh aims to support better decision making by generating insights that can be served on the platform or outside it. It is supported by a flexible infrastructure that enables rapid new market entry and can be adapted to accommodate local data sovereignty and privacy constraints.
Governance was a key component of the work. It is critical to meet compliance obligations to ensure operational continuity and safeguard data from theft, loss or corruption.
Traditional Data Lakes rely upon a team that sits in between the producers of data and the consumers of data. This team services requests from the consumers, collaborating with the producers and doing the work of sourcing/providing system data into the lake.
The goal of a Data Mesh is less about the technology involved and more about data ownership and the collaboration between the producer and consumer. The Mesh approach pushes data stewardship and data custodianship back to the producing teams, who understand the data in their domain and are best placed to perform these roles. A Mesh approach removes the need for a data team sitting between producers and consumers servicing requests. This was determined to be the more sustainable, scalable, secure, robust and flexible solution for EstimateOne.
Organisational Data is often spread across multiple systems and data stores, such as:
Tables from relational databases including database instances and often heterogeneous DB engines
Application events
Telemetry data from streaming sources
Text files
EstimateOne wanted consumers like the Data and Analytics team and the Data Science group to be able to write queries to quickly and securely interrogate the entire organisational Data set. The first task was getting all this Data into one place where it is securely accessible to applicable users.
The resulting Data Mesh targets various heterogeneous data sources. It is made up of three key patterns: Producer, Governance and Consumer.
Producer is about getting the data into a presentable form. Raw data is massaged in a data pipeline to perform functions like casting of data types and de-identification of PII. There are several S3 buckets for data storage, in this instance ingestion, curation and presentation.
Governance is about securely publishing the available data to a central data catalogue. This central catalogue is available to all consumers to peruse and discover available databases and entities.
Consumer - Once the data catalogue has been interrogated, the consumer teams can request permissions from the respective producers to be granted access to the underlying data. At this point they can query the data with AWS tools like Athena and QuickSight.
Using AWS DMS, the team achieved a quick win by getting system-aligned data out of existing RDS database instances and into the Data Mesh. The initial approach was to populate the Data Mesh with daily snapshots, with a view to turning on Change Data Capture (CDC) at a later date. The use case for the consumer in this instance did not rely on near-real-time data.
For clients with a need for near-real-time data availability in the Data Mesh, a service such as Amazon EMR can be used to “stitch together” full load data and CDC.
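The idea of stitching full load data and CDC together can be sketched in a few lines. This is a minimal illustration in plain Python rather than an EMR job, and it is not the implementation used at EstimateOne. It assumes each CDC row carries an `Op` column with the values `I`, `U` or `D` (insert, update, delete) and that rows share a primary key column, here hypothetically named `id`; the sample rows are invented for illustration.

```python
from typing import Dict, Iterable, List


def stitch(full_load: Iterable[dict], cdc: Iterable[dict], key: str = "id") -> List[dict]:
    """Apply CDC rows, in the order given, on top of a full-load snapshot."""
    # Index the snapshot by primary key so changes can be applied in place.
    current: Dict[object, dict] = {row[key]: dict(row) for row in full_load}
    for change in cdc:
        op = change["Op"]
        row = {k: v for k, v in change.items() if k != "Op"}
        if op in ("I", "U"):
            # Inserts and updates both replace the latest known row for the key.
            current[row[key]] = row
        elif op == "D":
            current.pop(row[key], None)
        else:
            raise ValueError(f"unknown Op: {op!r}")
    return sorted(current.values(), key=lambda r: r[key])


# Hypothetical sample data: a daily snapshot followed by three changes.
snapshot = [{"id": 1, "status": "open"}, {"id": 2, "status": "open"}]
changes = [
    {"Op": "U", "id": 1, "status": "awarded"},
    {"Op": "D", "id": 2},
    {"Op": "I", "id": 3, "status": "open"},
]
stitched = stitch(snapshot, changes)
```

The same merge logic scales out on a distributed engine by grouping on the key and keeping the latest change per key; the in-memory dictionary here only stands in for that step.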
Database full-load and ongoing changes captured by DMS
DMS tasks push raw data to ingestion bucket
On completion of DMS task, data pipeline triggered and crawler runs
Crawler populates Glue data catalog
Crawler triggers curation job
Curation job reads ingested data and casts column datatypes
Curation job pushes curated data to curation bucket
Curation job populates Glue data catalog
Curation job triggers presentation job
Presentation job reads curated data and de-identifies PII data
Presentation job pushes de-identified data to presentation bucket
Presentation job populates Glue data catalog
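The curation and presentation steps above can be illustrated with a small, self-contained sketch. It is not the actual Glue job code: the column names, the type map and the use of a salted SHA-256 hash for de-identification are all assumptions made for illustration, and the source does not state which de-identification technique was used.

```python
import hashlib
from datetime import date

# Hypothetical schema: target type for each ingested (string) column.
COLUMN_TYPES = {
    "project_id": int,
    "budget": float,
    "closes_on": date.fromisoformat,
}
# Hypothetical list of columns treated as PII.
PII_COLUMNS = {"contact_email"}


def curate(row: dict) -> dict:
    """Curation step: cast each raw string value to its declared type."""
    return {col: COLUMN_TYPES.get(col, str)(value) for col, value in row.items()}


def de_identify(row: dict, salt: str) -> dict:
    """Presentation step: replace PII values with a salted SHA-256 digest."""
    out = dict(row)
    for col in PII_COLUMNS:
        if col in out:
            raw = (salt + str(out[col])).encode("utf-8")
            out[col] = hashlib.sha256(raw).hexdigest()
    return out


ingested = {
    "project_id": "42",
    "budget": "1250000.50",
    "closes_on": "2023-02-20",
    "contact_email": "jane@example.com",
}
curated = curate(ingested)
presented = de_identify(curated, salt="example-salt")
```

A salted hash keeps values joinable across tables while hiding the original; whether that level of pseudonymisation is sufficient depends on the applicable privacy requirements.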
EstimateOne’s Data Platform Strategy aims to shift thinking around data availability from a data at rest architecture to a data in motion architecture by publishing microservice domain events as they happen to a common eventing streaming platform. Potential domain consumers then subscribe to these streams and process these events as they happen.
EstimateOne emits domain events in JSON format, which are then passed into the data mesh where the data is then processed and securely stored for presentation. The resulting presented data can then be easily queried by consumers that have been granted access to the data.
Application publishes events to an SNS topic
Event is captured and delivered to a Raw ingestion bucket and stored unmodified
S3 object creation triggers event processing
Raw events are delivered via firehose ready to be processed
Event is transformed/flattened using Lambda Custom Transformations and delivered to a presentation bucket
SNS notification is generated on object creation to trigger final phase
On event, Presentation bucket is crawled by AWS Glue, and Data Catalog is updated if required
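The transform/flatten step can be sketched as a Firehose data-transformation Lambda handler. This is an illustrative sketch, not EstimateOne's code: it follows the general Firehose transformation contract (base64-encoded `data` per record, returned with a `recordId` and a `result`), while the event shape and the underscore-joined column naming are assumptions.

```python
import base64
import json


def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into one level, joining key names with underscores."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


def handler(event, context):
    """Decode each record, flatten its JSON payload and re-encode it."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            # Newline-delimited JSON keeps one event per line in the bucket.
            line = json.dumps(flatten(payload)) + "\n"
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(line.encode("utf-8")).decode("ascii"),
            })
        except (ValueError, AttributeError):
            # Malformed or non-object payloads are returned unchanged and marked as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}


# Hypothetical domain event, encoded the way the handler receives it.
sample = {"type": "project.created", "project": {"id": 7, "region": "AU"}}
sample_event = {"records": [
    {"recordId": "1", "data": base64.b64encode(json.dumps(sample).encode()).decode()},
    {"recordId": "2", "data": base64.b64encode(b"not json").decode()},
]}
result = handler(sample_event, None)
```

Flattening nested events into top-level columns makes the crawled table easier to query, at the cost of longer column names.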
AWS Lake Formation was used to give EstimateOne teams access to the data they need while enforcing granular access controls. It lets data producers retain ownership of their data while simplifying cross-account and cross-team access.
Data Producer publishes to the central catalog via AWS Lake Formation
Data Consumer identifies required data sources and requests access via the producer
Data is shared to the consumer domain using Lake Formation permissions
User can query the data using native AWS services such as Athena or supported third-party tooling
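The sharing step can be pictured as a producer granting table-level permissions to a consumer account. The sketch below builds the request for the boto3 Lake Formation `grant_permissions` call; the account IDs, database name and table name are placeholders, and the choice of `SELECT` and `DESCRIBE` is an assumption rather than the permission set EstimateOne used.

```python
def build_grant(consumer_account_id: str, producer_account_id: str,
                database: str, table: str) -> dict:
    """Build keyword arguments for a table-level Lake Formation grant."""
    return {
        # The principal receiving access; here, a whole consumer AWS account.
        "Principal": {"DataLakePrincipalIdentifier": consumer_account_id},
        # The table being shared, identified within the producer's catalog.
        "Resource": {
            "Table": {
                "CatalogId": producer_account_id,
                "DatabaseName": database,
                "Name": table,
            }
        },
        # Read-only access: query the rows and view the table metadata.
        "Permissions": ["SELECT", "DESCRIBE"],
    }


def grant(request: dict) -> None:
    """Send the grant using the producer account's credentials."""
    import boto3  # imported lazily so the sketch loads without AWS dependencies

    boto3.client("lakeformation").grant_permissions(**request)


# Placeholder identifiers, for illustration only.
request = build_grant(
    consumer_account_id="222222222222",
    producer_account_id="111111111111",
    database="presentation",
    table="projects",
)
```

Keeping request construction separate from the API call makes the grant easy to review or log before it is applied.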
The resulting solution is a modern, scalable, leading-edge Mesh infrastructure for a global enterprise. The Mesh is a modern approach to Data Ownership, with no central data team handling requirements back and forth between producers and consumers. Data Management is lean as the Data remains the property of the producer.
The Data Mesh generates valuable insights for key stakeholder groups including the Data Science and Data and Analytics teams. Crucially, this flexible infrastructure is a significant competitive advantage for EstimateOne, enabling rapid new market entry, with the Mesh readily adapted to accommodate local data sovereignty and privacy constraints.
The Mesh has improved Data security and governance right across the organisation, enabling EstimateOne to meet compliance obligations, ensure operational continuity and safeguard Data from theft, loss or corruption. This addresses an issue of grave concern to many Boards and Executive teams.
“I highly recommend the Midnyte team for any Data Mesh implementation project. Their expertise and dedication were crucial to the success of our Data Mesh implementation at EstimateOne.
From the outset, the Midnyte team took the time to fully understand our business needs and goals, and developed a comprehensive plan to address them. They provided valuable guidance and support throughout the process, ensuring that we were on track to meet our deadlines and targets.
The team's deep knowledge of Data Mesh best practices and their ability to adapt to our unique business context made them a valuable asset to our project. Their ability to effectively communicate and collaborate with our team made the process smooth and efficient.
Overall, the Midnyte team's contribution to our Data Mesh implementation was invaluable, and we are extremely satisfied with the end result. We highly recommend them for any Data Mesh project.
Above all else, they were a pleasure to work with and I look forward to working with them again in the future.”
David Parlevliet
Engineering Manager of the Infrastructure Team at EstimateOne
If you would like to speak to someone about similar challenges in your team or organisation, reach out below to schedule a time.