At Data Minded, we are data engineers first and foremost. But in reality, we do a lot of consulting. What is consulting really? I recently gave an internal talk about Consultancy at Data Minded and I thought it would make sense to share my thoughts publicly. I’m curious about your feedback. Leave it in the comments!

Why does the world need consulting in the first place?

Consultants get a lot of criticism for charging a lot of money and not really making a big difference. And that’s not a complete lie in some cases. :-)

But then, why does the world need consultants? Well, it’s actually really simple.

The world of data is moving and shaking again. Ever since Hadoop came around, people have been offloading workloads from their data warehouses to the new and shiny data lakes. And it didn’t take long before Spark, which was open-sourced in 2010, became the standard processing engine on data lakes.

Now we see a reverse trend, back to the data warehouse. And with that trend, DBT has risen as almost the de facto standard for doing transformations on modern cloud-native data warehouses. …

Last year I blogged about how I got my AWS Pro Architect certificate, which you can read about here. Now it was time to get the Data Analytics Specialty. I won’t repeat everything I wrote in the AWS Pro blog post; that would be redundant. Here I’d like to focus on the things I learned for this Data Analytics Specialty.

All cloud exams are alike

I’ve done quite a few certifications on AWS and Azure, and I am now studying for a GCP certification. There is a pattern. That’s also why, if you are able to pass one, it’s much…

Now that the Privacy Shield has been invalidated, there are legal disputes over whether European companies and government agencies can still store sensitive data on the big three cloud providers (AWS, Azure and GCP). They are owned by American companies, and as such, there are no guarantees that the US government won’t force these companies to give it access to that data.

Based on this, a recommendation was published by the Flemish Oversight Committee, basically advising against using AWS for storing certain sensitive education data. It’s in Dutch. …

“It works on my machine”. That’s great. But now how do you make sure it runs in production, repeatedly and reliably? Here we share our lessons learned from deploying many analytics solutions at clients.

By the way, what do we mean by “Analytics workloads”? It’s any workload where you send data to an algorithm and you create some kind of insight. That can be an ML algorithm, a data cleaning job, data integration, NLP, … Any piece of the data pipeline really.

Deployments are the most important thing you do

A typical development lifecycle looks like this:

Your organisation wants to dive head-first into data and AI but you don’t really know where to start? Data & AI is on the radar of most C-level leaders. It’s often seen as a differentiator from the competition, as an enabler for more customer engagement, as a tool for cost reduction. And there is no lack of industry analysts and strategic reports that highlight the impact that data already has on the corporate world. But how to get started? We’ve done quite a few tours of duty in data analytics. …

Is it worth it to spend time and money collecting advanced certifications? In this blog I share my opinion and lessons learned. TL;DR: Yes, you will become quite knowledgeable about the details of AWS, and yes it will look good on your CV. No, it won’t make you a great architect. If you want to know how I studied, scroll all the way down.

What is the fuss about?

One does not simply pass the AWS Solution Architect Pro exam. You can’t just watch some videos for 2 days and ace the test. You actually have to study. In the magical land of certifications, the…

Raw clickstream data is a valuable data source in almost any analytics project. But it’s not always easy to capture. Free tools like Google Analytics often don’t expose raw clicks. And enterprise tools like Google Analytics Premium or Adobe Analytics can be expensive, and in large companies often require a procurement process.

We discovered a nice feature on Azure where you can get these raw clickstreams in near real time, at an affordable price. It is part of a service called Application Insights, which can be used for much more than just website click data. But that is out of scope for this blog…

This is another one of those “how to” blogs that can hopefully help people get up-and-running quickly because it took me a while to figure it out. Here is the full code if you want to try this yourself: Obviously, this is all just demo code, with a lot of hardcoded strings and shortcuts. This is not production-grade. So take what you can use, and adapt it to your context.

Situation: Give me cheap Spark data pipelines!

You want to run Spark jobs on Azure. But cost is a concern. So you’re looking into Azure Batch because you can use low-priority VMs, which are not available…

Imagine you start a new job in the kitchen of a fancy restaurant. You just graduated from your culinary studies and you’re eager to learn all about working in the fine dining industry.

But, oh boy, it’s not like you imagined. Everybody in the kitchen is under constant pressure to deliver as many dishes as possible to customers. And while you and your team manage to keep up at the beginning of the evening, soon more and more orders keep piling in, and you find yourself preparing food for four tables at the same time. Even worse, whenever you…

Kris Peeters

Data geek at heart. Founder and CEO of Data Minded.

