MediaCom is a global media agency based in the US though this project was ordered through their UK division and later utilised by Google to implement as part of their marketing analytics tools.
We were building a visualisation project that had the potential to use over 50 different data sources per page. Now we could have built the querying out per data source and loaded the data individually, but that would mean a lot more code maintenance and hours spent adding new “Querying Engines” if and when they’re required.
This is where the idea of a Single Point Multi Data Source Query Engine (Query Engine for short) was born. The plan was to make it so that another dev would just need to configure the data sources that they want within the database and then they could be used on the front end by using its unique name.
This approach would mean that adding a new data source is as easy as adding a single class to the system, making it more testable and far more simple to maintain. It also added the ability to “Cross-Pollinate” data sources because of its modular nature. If you wanted to build a system where a calculation needed to be done from data that is stored in Postgres, that then filtered Big Query and combined the 2 for visualisation, it was possible
The approach could have been a complex line of if statements or a system that builds the instance of the class from a string. But I found that since we were using Django, the Django models should suffice in what we wanted to achieve. So I went forth and discovered the depths of the model system and how it works. This is where I found out that basic inheritance would be the best and most efficient way to build the system.
So here’s what we needed specifically to meet the requirements, we need a single place that we could put in “Query Names” that would then spit out objects that could then get the data for us from any data source. So this means that we are going to need a model that holds the basics of the query, such as the name and a link to the data source. Then we needed a way to link that model to another one that has the required information and logic to connect to the data source.
Starting with the models, using the single “Query Model” we can link the specific “Engine Models” to it using basic Django relation. This meant that when a person searched for a name they simply had to call the name of the engine they wanted that was related to the Query (eg. Query.Postgres). But that was a bit less intuitive and still meant that there would be an if statement somewhere. So to get around that, store the name of “Engine Model” being referenced in the “Query Model” and use that to get the right instance.
Finally because of this structure retrieving queries is incredibly simple. I built a single function called get_queries that takes a list of names as an argument, and then filters the Query model for those names and returns them as the queryset.
The Consequence
Because of using this kind of simplistic architecture, there were no problems when it came to maintaining and upgrading different engines or data sources. If one of the engines wasn’t working it was a lot easier to fix because it wasn’t mixed in with any others.
Though this wasn't the first project that I got to really stretch my architectural muscle, it was the first where I got to show off my leadership skills in a far more complex and technical environment.
Postgresql
Django
BigQuery
OpenSearch