Data Architecture: Measure Twice, Cut Once

Architecture is an underrated skill in the Data Engineering profession. With the rise of Agile came a lot of bad practices.

I once heard someone declare, "Waterfall is inefficient." The person obviously was not speaking from personal experience but was repeating something they did not know firsthand.

"Waterfall" is a methodology that, as far as I know, became popular as a formal construct around the 1990s or 2000s. When I was a young developer I discovered "Waterfall," and really it just means the following: define what you want (requirements), then do a design and get sign-off that it matches the requirements. Then you do development, and developers do their unit testing. It doesn't matter whether that testing is formally documented, except to the extent you need documentation to manage the development team.

Next, you do a thorough system test. If it's an initial implementation, you perform extensive tests on your data. These should be counts and similar checks that the architects define, each one tying back to a requirement so that it "proves" the requirement has been fulfilled.
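To make "counts that tie back to requirements" a little more concrete, here is a minimal sketch. It assumes each requirement can be checked by comparing a count from the source against a count from the new database. The table names (`source_customers`, `mdb_customers`), the requirement IDs, and the `CountCheck`/`run_checks` helpers are all hypothetical, and an in-memory SQLite database stands in for the real systems.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class CountCheck:
    """One architect-defined count check, tied back to a requirement (hypothetical structure)."""
    requirement_id: str  # hypothetical requirement identifier
    description: str
    source_sql: str      # must return a single count from the source system
    target_sql: str      # must return a single count from the new database


def run_checks(conn: sqlite3.Connection, checks: list[CountCheck]) -> list[tuple[str, int, int, bool]]:
    """Run every check and return (requirement_id, source_count, target_count, passed)."""
    results = []
    for check in checks:
        source_count = conn.execute(check.source_sql).fetchone()[0]
        target_count = conn.execute(check.target_sql).fetchone()[0]
        results.append((check.requirement_id, source_count, target_count, source_count == target_count))
    return results


def build_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for a source feed and the loaded marketing database (hypothetical tables)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE source_customers (id INTEGER PRIMARY KEY, email TEXT, opted_in INTEGER);
        CREATE TABLE mdb_customers    (id INTEGER PRIMARY KEY, email TEXT, opted_in INTEGER);
        INSERT INTO source_customers VALUES (1, 'a@example.com', 1), (2, 'b@example.com', 0), (3, 'c@example.com', 1);
        -- Customer 2 was dropped during the load; the checks should reveal it.
        INSERT INTO mdb_customers VALUES (1, 'a@example.com', 1), (3, 'c@example.com', 1);
    """)
    return conn


CHECKS = [
    CountCheck("REQ-1", "Every source customer is loaded",
               "SELECT COUNT(*) FROM source_customers",
               "SELECT COUNT(*) FROM mdb_customers"),
    CountCheck("REQ-2", "Every opted-in customer is loaded",
               "SELECT COUNT(*) FROM source_customers WHERE opted_in = 1",
               "SELECT COUNT(*) FROM mdb_customers WHERE opted_in = 1"),
]

for req_id, src, tgt, passed in run_checks(build_demo_db(), CHECKS):
    print(f"{req_id}: source={src} target={tgt} {'PASS' if passed else 'FAIL'}")
# REQ-1: source=3 target=2 FAIL
# REQ-2: source=2 target=2 PASS
```

REQ-2 passes even though a customer is missing; only REQ-1 catches the dropped row. That is one reason to have the architects define several overlapping checks per requirement rather than relying on a single total.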

Finally, you have a round of UAT (User Acceptance Testing). In this phase, the customer or requestor does their own testing, similar to the system integration or design testing. Once UAT is complete and the users accept the outcome, any further modifications are understood as change requests, not defects.

This helps the development team's morale. Otherwise, every late request gets counted as a defect against the developers, a situation that is often demoralizing.

In modern times, folks have moved to "Agile." Agile is a project management methodology that rose to prominence, as I remember it, in the 2000s. Agile more or less says the opposite: build now, see if you like it, then change it.

The concept has some merit. As a computer science student and now a long-time programmer, I have always known that building things first helps you visualize what you are doing. But there is a limit. That approach mostly works when you are the only one working on something, or when you have a small team that is aligned in purpose.

Agile also came to prominence at a time when web development was blowing up, and in that environment it makes more sense. Because web development is so visual, builders get a lot of information from seeing the build as it happens. Changes there are relatively small, and to a degree they are the same changes that would have been made during a "Design" phase anyway. So it can make sense to use your development process as your design process, especially if all the "Requirements" are visual in nature.

In a database build, on the other hand, this method is riddled with problems. In my opinion, Agile is more expensive unless you also do "Agile" for Requirements, Design, Build, Test, and UAT. Some people do exactly that: they do Agile on the surface, but under the hood they are really "Agiling" a Waterfall process.

Adherents of Agile are a bit out there; they treat it like a religion. In fact, part of the "official" Agile writing says that if your project is messed up for some reason, it's not the fault of Agile but of your "CI/CD," which is hilarious. CI/CD is also a concept that did not exist at one time. It stands for "continuous integration, continuous deployment." Again, this makes sense in a web environment, because changes are usually elements being added to the screen, perhaps including some that fetch data.

But when you are building a massive marketing campaign database, you don't *want* continuous deployment. Usually you want to make "blocks" of changes, or, especially with an initial implementation, deploy most of the solution at once as a "Phase 1," that kind of thing. Literally throwing in *every* change as it happens would just break things. When you make significant changes in a marketing database (MDB) environment, those changes are often intertwined, so at minimum you need to deploy them in batches.
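One way to reason about "intertwined" changes is as a graph: if two changes depend on each other, they belong in the same deployment batch. The sketch below assumes you have already listed which changes are linked. The change names are hypothetical, and the union-find grouping is simply one illustrative technique, not a prescribed release process.

```python
from collections import defaultdict


def group_into_batches(changes: list[str], links: list[tuple[str, str]]) -> list[list[str]]:
    """Group changes so any two linked (intertwined) changes land in the same batch.

    Uses union-find: each batch is one connected component of the link graph.
    """
    parent = {change: change for change in changes}

    def find(change: str) -> str:
        # Walk up to the root, halving the path as we go to keep lookups cheap.
        while parent[change] != change:
            parent[change] = parent[parent[change]]
            change = parent[change]
        return change

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups into one batch

    batches: dict[str, list[str]] = defaultdict(list)
    for change in changes:  # keeps the original listing order inside each batch
        batches[find(change)].append(change)
    return list(batches.values())


# Hypothetical changes to a marketing database.
CHANGES = [
    "add_campaign_id_column",
    "backfill_campaign_id",
    "rebuild_response_view",
    "rename_email_opt_flag",
    "update_suppression_job",
    "add_region_index",
]
LINKS = [
    ("add_campaign_id_column", "backfill_campaign_id"),
    ("backfill_campaign_id", "rebuild_response_view"),
    ("rename_email_opt_flag", "update_suppression_job"),
]

for number, batch in enumerate(group_into_batches(CHANGES, LINKS), start=1):
    print(f"Batch {number}: {', '.join(batch)}")
# Batch 1: add_campaign_id_column, backfill_campaign_id, rebuild_response_view
# Batch 2: rename_email_opt_flag, update_suppression_job
# Batch 3: add_region_index
```

Each batch is a connected group of changes. Deploying only part of a batch leaves the database in exactly the half-applied state that pushing every individual change as it happens tends to create.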

Agile people hate that. I don't know why, but I suspect it's because they don't know how to conceptualize what they want and put that concept into a design document that developers can use to write code or build infrastructure. In short, everyone wants to be a "manager" and pass the details to someone else. The result is that the people doing the actual work are, at some level, guessing about what they should be doing.

Without a hierarchy, it's hard to have a head, and without a head, the body acts like it has no brain.

Designing your data system is an effective way to weed out your initial "bad" ideas and to make changes based on feedback you might not get if you go straight to development. With a forward-looking design, you minimize rework, you make the work clear for the developers, and you leave yourself with far more documentation than typically exists in an Agile environment.

Author: Marcus

Post Date: 2024-09-12
