The most important thing to understand when you start working with DataOps is that it is far more than just DevOps for data. It is a mental and technical shift in how data systems should be managed, and we strongly believe that it will shape the future of the data world.
We are sure that you have heard about DevOps and DataOps. We IT people all love buzzwords, don't we?
Since the term "DevOps" has been with us for quite a few years, it is rather well understood (even though it is often unfairly reduced to CI/CD). DataOps, however, is the new kid on the block, and we are sure that the peak of its fame is still a few years away. What's more, the two names sound similar. We guess that's why DataOps is often considered to be just DevOps for Data Analytics. But is that actually true?
Let's try to find the similarities and the differences.
Agile methodology - both approaches originate in, and take a lot from, the Agile philosophy. Adapting to continuous change is the key here.
Working and thinking in iterations - our experience shows that it is much harder to think and work in iterations than people tend to assume. The problem is even more serious in Data Analytics - Data Developers (Data Engineers, Data Scientists, Data Analysts and others) find it difficult to split their work into small iterations.
Collaboration! Remember that the DevOps philosophy was created to tear down the wall of confusion between DEVelopment teams and OPerationS teams. The same applies to DataOps - collaboration is one of the success factors in building high-performing Data Pipelines.
Feedback! Simply put, DevOps and DataOps are about working in iterations and continuous improvement. This can only be achieved when proper feedback is collected. Feedback from the end users of Data is definitely a key performance indicator for organizations that have adopted DataOps.
As we can see, all the "soft" values are shared between DevOps and DataOps. This should come as no surprise, since both methodologies take their values from Agile and Lean Manufacturing.
The fundamental difference between DevOps and DataOps is the main focus area: DevOps focuses on the flow of Code through a DevOps Pipeline, whereas DataOps focuses on the flow of Data through a Data Pipeline. Of course, DataOps also handles testing and CI/CD activities for the code that produces the data. Ultimately, Data Developers write Code to produce Datasets, but the Data itself is the primary product of their work.
Testing - a bit more complicated in DataOps, since we need to test both the code and the data. What's more, quite frequently fully automated tests (e.g. statistical process control - SPC) are not enough. In those cases, manual validation of the data is needed. In DevOps, by contrast, tests can in most cases be fully automated (e.g. unit tests for a web application).
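To make the idea of an automated data test more concrete, here is a minimal sketch of an SPC-style check in Python. It is only an illustration under our own assumptions: the metric (daily row counts), the sample numbers, the three-sigma limits and the function names control_limits and is_in_control are all hypothetical and do not come from any particular tool.

```python
from statistics import mean, stdev


def control_limits(history, sigmas=3.0):
    """Return (lower, upper) control limits derived from past observations."""
    center = mean(history)
    spread = stdev(history)  # sample standard deviation
    return center - sigmas * spread, center + sigmas * spread


def is_in_control(value, history, sigmas=3.0):
    """True if the new observation falls inside the control limits."""
    lower, upper = control_limits(history, sigmas)
    return lower <= value <= upper


# Hypothetical metric: number of rows loaded by a pipeline on previous days.
daily_row_counts = [1000, 1020, 980, 1010, 990, 1005, 995]

print(is_in_control(1003, daily_row_counts))  # a typical day
print(is_in_control(400, daily_row_counts))   # a suspicious drop in volume
```

A check like this can run automatically after every load. When a value falls outside the limits, that is usually the moment when a human should take a look at the data.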
Orchestration - something that does not exist at all in DevOps, but is extremely important in DataOps, especially these days, when we manage more and more workloads on our production systems while requests from end users for new data models arrive more and more frequently.
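At its core, orchestrating a Data Pipeline means running tasks in an order that respects their dependencies. The sketch below shows only that core idea, using the graphlib module from the Python standard library (Python 3.9+). The task names, the dependencies and the run_pipeline helper are hypothetical; a real orchestrator also takes care of scheduling, retries, monitoring and much more.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run each task after the tasks it depends on and return the run order."""
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name]()
    return order


log = []

# Hypothetical tasks: each one only records that it has run.
tasks = {
    "extract": lambda: log.append("extract"),
    "transform": lambda: log.append("transform"),
    "test_data": lambda: log.append("test_data"),
    "publish": lambda: log.append("publish"),
}

# Each task is mapped to the set of tasks that must finish before it starts.
dependencies = {
    "transform": {"extract"},
    "test_data": {"transform"},
    "publish": {"test_data"},
}

print(run_pipeline(tasks, dependencies))
```

Note that in this sketch the data test sits between the transformation and the publication step, so untested data never reaches the end users.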
The DataOps concept definitely originates from DevOps; however, it faces somewhat different problems and uses somewhat different tools to solve them. This should come as no surprise - we, Data Developers, face different challenges than our colleagues, the Software Developers. By the way, thank you, Software Developers, for coming up with something as brilliant as "DevOps". Without you, our data world would have remained full of tedious tasks for a long time, but thanks to DataOps the future looks bright!
Thanks for reading!
I hope you enjoyed it. If you'd like to talk more about it, please reach out to me via email: adrian@faro.team