Changing CRDT operations under a Cloud

By Kailas Venkitasubramanian

June 9, 2023

The promise and peril of a large contract

Many of the earlier challenges in managing technical operations at the data trust stemmed from two gaps: a poor understanding of the scope and effort involved in a given piece of work, and no barometer for measuring productivity (or the lack of it). As a result, everyone knew that a given piece of work took a month to complete, and everyone agreed that this delay was not acceptable, but no one could pinpoint where the bottlenecks were or why they existed.

Instead, in response to this problem, a large dart was thrown at it: broadly modernize CRDT's entire data infrastructure and take it to the ‘Cloud’. This seemed like a safe path, one that did not require anyone to identify the core issues but instead refaced everything so that a fresh start could be made. It was analogous to a brute-force restart of a computer that freezes because of one or two malfunctioning applications.

Talks about this big-splash infrastructure development were well underway when I took up this role. They held promise for helping my team operate better at scale, but I saw several blind spots. For one, I did not have the luxury of waiting for this solution to arrive. I had to make things functional now, which meant identifying and overcoming the issues while the infrastructure talks were still brewing.

Moving along with the wind while walking a new path

The dominant premise of this infrastructure investment was that the root of our problems was largely technical. I read it as largely managerial. Most of my early work with CRDT was about managing and changing how work was done: setting realistic benchmarks for getting from point A to point B, and dismantling legacy assumptions about what could be done with our resources. In other words, I was normalizing productivity: asking why a given task would take a week, identifying where the friction lay, advising new approaches, and testing new methods.

Thanks to our brilliant graduate students, these new benchmarks started being met within a few weeks, and by the spring of 2022 we had significantly improved our technical operations. Yet overall service times were still lengthy, and the delays now stemmed mostly from clumsy governance processes, the lack of well-defined service delivery models, and scant data documentation. It concerned me that the large investment was targeting a relatively minor cause of the core problem instead of the elephant in the room. In fact, a year later, we would quantify the factors that stretched service timelines and, amusingly, rediscover this concern.

Still, it was hard to shift the strong momentum and energy behind this initiative when I arrived, and modernizing the infrastructure as a whole was beneficial to our work regardless of my reservations. To put it mildly, my early attempts to redirect our energy failed: I wanted to surgically solve specific problems and build incrementally instead of spending big dollars on a revolution, especially when we knew that the key drivers of delay were largely untouched.

While those talks continued and we decided to contract the vendor to develop a web portal for data requests as a first step, my team pushed forward to bolster our capacity and processes.

Demystifying our work

Our work at CRDT is simple. It is talked about in esoteric terms at meetings and said to be complicated and lengthy. It’s not. We bring data in, merge it with other data, and send the blended data out. But to reach the simplicity I’m claiming here, we still had to do the hard work of professionally streamlining every step along the way. More importantly, we had to demystify the process for all stakeholders so that they could make rational investment assessments and engage meaningfully in our growth.

As we started building databases and writing code to ingest and link data, I began working on a skeleton of what we now call the technical operations manual: a single reference document that details all key analytical processes involved in CRDT’s work. The manual needed to be a self-contained, living document that our staff could build, update, and shape over time. It would serve both as an onboarding and continuity resource and as an internal knowledge base about our data and operations. Building a data dictionary and documentation was an essential sub-process of this manual, as we sought to draw in metadata for each CRDT dataset as part of the compilation.
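To make the idea of drawing in dataset metadata more concrete, here is a minimal sketch of how a first-pass data dictionary entry could be generated automatically from a CSV extract. This is not CRDT’s actual tooling: the function names, the dataset name `hypothetical_intake`, and its columns are all hypothetical, and the type inference is deliberately crude. The human-readable descriptions would still have to be written by staff.

```python
import csv
import io


def infer_type(values):
    """Guess a coarse type label for a column from its non-empty string values."""
    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return "empty"
    try:
        for v in non_empty:
            int(v)
        return "integer"
    except ValueError:
        pass
    try:
        for v in non_empty:
            float(v)
        return "float"
    except ValueError:
        return "text"


def build_data_dictionary(dataset_name, csv_text):
    """Return one metadata entry per column of a (hypothetical) CSV extract."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for column in reader.fieldnames or []:
        # Short rows yield None for missing fields; treat those as empty strings.
        values = [row[column] or "" for row in rows]
        entries.append({
            "dataset": dataset_name,
            "column": column,
            "type": infer_type(values),
            "missing": sum(1 for v in values if v.strip() == ""),
            "distinct": len({v for v in values if v.strip() != ""}),
            "description": "",  # left blank for staff to document by hand
        })
    return entries


# Hypothetical sample extract; not real CRDT data.
sample = (
    "client_id,age,county\n"
    "101,34,Mecklenburg\n"
    "102,,Mecklenburg\n"
    "103,29,Cabarrus\n"
)

for entry in build_data_dictionary("hypothetical_intake", sample):
    print(entry)
```

Generating the mechanical fields (type, missingness, distinct counts) automatically leaves staff free to focus on the parts only they can supply, such as what each field means and where it comes from.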

In the next post, I’ll follow up with details on the creation of this resource.