
Data Engineering: the Boiler Room of Artificial Intelligence

By Ben Johnston

The current state of Artificial Intelligence (AI) and Machine Learning (ML) technologies certainly resembles something of a golden period of development. Modern AI technologies are capable of extraordinary achievements, such as simultaneously locating and identifying hundreds of different findings within a chest x-ray[1], enabling self-driving cars to navigate complex situations, and automatically completing the sentences of blog posts. While somewhat cliché, many parallels can be drawn between the impact of modern AI on industry and society and that of the steam engines of the industrial revolution.

The power of coal and steam unlocked several different technologies, including lower-cost mass manufacturing that brought previously inaccessible products and services to those who could not otherwise afford them. At the time, steam-powered railways were looked upon as a marvel of modern technology, carrying people and goods over great distances at speeds never before seen.

Today, AI technologies are looked upon with the same sense of marvel and wonder and have possibly even more potential to unlock access to a variety of products and services. At harrison.ai, this is one aspect of AI that drives us. We strive to combine our clinical and engineering expertise to create products that deliver the same quality of care as a team of experienced clinicians at a scale that enables access for everyone across the globe.

Interestingly, the analogy with the industrial revolution does not stop there. While both steam engines and AI models were and are looked upon as technological wonders, both are built upon critical, though often less romantic, pieces of infrastructure. Behind each steam engine was a boiler, operated by soot-covered firemen whose role was to keep the fire continually fed with coal to maintain heat and power.

Similarly, behind each AI engineering team is a team of Data Engineers, whose role is to ensure high-quality datasets are curated, catalogued, and made available for model training and testing. While not always seen as the most glamorous work (an assertion we certainly disagree with!), the role of the data engineer is critical, rewarding and certainly challenging. Without large, high-quality datasets it simply would not be possible to produce capable and performant AI models; conversely, adding new data to a dataset can improve the performance of an AI model without changing the architecture of the model itself!

To ensure this supply of data, Data Engineers are trained in and employ a combination of sound software engineering and DevOps / infrastructure design principles. At harrison.ai we frequently work with datasets on the order of petabytes (PB) in size, which in itself brings several interesting and rewarding challenges from an engineering, infrastructure and information theory perspective. At this scale, managing and processing the data in a timely yet cost-effective fashion is critical: minor changes or processing errors can have a significant overall impact on either the availability of data or the cost of processing it. There have been many occasions where, while trying to increase processing speed, we have broken the services of the large public cloud providers. Those emergency calls to stop processing are badges we wear with honour!
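As a purely illustrative sketch of that trade-off (the object keys, worker counts and retry policy below are hypothetical, not a description of harrison.ai's actual pipelines), one common pattern is a bounded worker pool with exponential backoff: the pool size caps the request rate seen by the storage service, while the backoff absorbs transient throttling instead of failing the whole run.

```python
import concurrent.futures
import random
import time


def process_object(key: str) -> str:
    """Hypothetical worker: in practice this might download and transform a
    single object from cloud storage; here it just simulates variable-latency work."""
    time.sleep(random.uniform(0.01, 0.05))
    return f"processed {key}"


def process_with_backoff(key: str, max_attempts: int = 5) -> str:
    """Retry a single object with exponential backoff (plus jitter) so that
    transient throttling errors do not abort the entire batch."""
    for attempt in range(max_attempts):
        try:
            return process_object(key)
        except Exception:
            time.sleep((2 ** attempt) + random.random())
    raise RuntimeError(f"giving up on {key} after {max_attempts} attempts")


def process_dataset(keys: list[str], max_workers: int = 32) -> list[str]:
    """Process a batch of object keys with a bounded thread pool.
    max_workers is the knob that trades throughput against the request rate
    hitting the storage service (and, ultimately, the bill)."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_with_backoff, keys))


if __name__ == "__main__":
    fake_keys = [f"images/study-{i:06d}.dcm" for i in range(100)]
    results = process_dataset(fake_keys)
    print(f"{len(results)} objects processed")
```

At petabyte scale the same idea applies, just with distributed workers rather than threads: the interesting engineering is in choosing the knobs so that a run finishes on time without an unpleasant surprise on the invoice.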

The harrison.ai data engineering team is frequently looking for software engineers, DevOps engineers and solutions architects to join us. If you are interested in a role at the firebox of modern AI technologies, do not hesitate to get in touch!

 Ben Johnston is the Head of Data Engineering at Harrison.ai

 [1] https://annalise.ai/solutions/annalise-cxr/