Reproducible Research Framework at the Charlotte Urban Institute: Why It Matters Now

By Kailas Venkitasubramanian, in: reproducibility, data science, community research, urban analytics, open science

July 8, 2022

In recent years, conversations about reproducibility have moved from academic journals into policy circles, foundations, and government agencies. What was once framed as a “replication crisis” in psychology has broadened into a wider concern about the credibility, transparency, and cumulative nature of scientific work across disciplines (Open Science Collaboration, 2015).

For those of us engaged in quantitative community research—especially in dynamic regional contexts like the Charlotte metropolitan area—reproducibility is not merely a philosophical concern. It is an operational one.

At the UNC Charlotte Urban Institute, reproducibility is not just about rerunning code for a journal submission. It is about building durable, trustworthy analytical infrastructure for communities, local governments, and nonprofit partners.

This series outlines how we are thinking about—and institutionalizing—a reproducible research framework at the Institute.

The broader scientific context

The modern reproducibility movement emphasizes a handful of practical principles:

  • Transparent data and code sharing and clearer disclosure of analytic decisions (National Academies, 2019)
  • Pre-registration (and, more generally, separating “confirmatory” from “exploratory” analysis) (Nosek et al., 2018)
  • Disciplined computational workflows: version control, project organization, and repeatable execution (Wilson et al., 2017)
  • FAIR data principles — Findable, Accessible, Interoperable, Reusable (Wilkinson et al., 2016)

In computational research, reproducibility increasingly means more than “attach the dataset.” It usually requires:

  1. Scripted workflows (not manual steps hidden in spreadsheets)
  2. Versioned data and code
  3. Captured computational environments (or at least a stable, documented setup)
  4. Clear governance and documentation so other people can actually use the work
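Item 3 can be as simple as writing down what a run actually used. Below is a minimal sketch in Python (our production stack is largely R-based, but the idea carries over to `renv` lockfiles and similar tools):

```python
import json
import platform
from importlib.metadata import PackageNotFoundError, version

def snapshot_environment(packages):
    """Record interpreter and package versions so a run can be re-created later."""
    snap = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": {},
    }
    for pkg in packages:
        try:
            snap["packages"][pkg] = version(pkg)
        except PackageNotFoundError:
            snap["packages"][pkg] = None  # surface missing dependencies loudly
    return snap

snap = snapshot_environment(["pip"])
print(json.dumps(snap, indent=2))
```

Committing a snapshot like this alongside results is "good enough" environment capture when full containerization is overkill.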

For applied urban research—where we combine federal open data, administrative records, survey instruments, and geospatial layers—complexity multiplies fast.

Why reproducibility is different in community research

Community research is not laboratory science. We operate within:

  • Evolving administrative datasets
  • Data use agreements
  • Privacy constraints
  • Political and policy sensitivities
  • Changing geographic boundaries
  • Iterative community feedback

Unlike static experimental data, our inputs shift monthly or even daily. For example:

  • HMIS (Homeless Management Information System) data can update continuously.
  • ACS (American Community Survey) estimates are released annually and can revise weighting and methodology.
  • Local administrative systems change vendors, definitions, and field structures.

Reproducibility in this context is not about “freezing the world.” It’s about designing systems that track change systematically and transparently so the region can understand what changed, why it changed, and how the numbers were produced.
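One way to track change systematically is to checksum each incoming release against the last one, so a silently revised extract is detected rather than assumed away. A sketch, with hypothetical file names:

```python
import hashlib
import json
import tempfile
from datetime import date
from pathlib import Path

def record_release(path, manifest_path):
    """Append a dated checksum for a data file; flag whether its bytes changed
    since the last recorded release."""
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    mp = Path(manifest_path)
    manifest = json.loads(mp.read_text()) if mp.exists() else []
    changed = not manifest or manifest[-1]["sha256"] != digest
    manifest.append({
        "file": Path(path).name,
        "date": date.today().isoformat(),
        "sha256": digest,
        "changed": changed,
    })
    mp.write_text(json.dumps(manifest, indent=2))
    return changed

with tempfile.TemporaryDirectory() as d:
    data = Path(d) / "acs_extract.csv"  # hypothetical input
    manifest = Path(d) / "manifest.json"
    data.write_text("tract,estimate\n123,5.0\n")
    first = record_release(data, manifest)   # new input: flagged as changed
    second = record_release(data, manifest)  # identical bytes: unchanged
    data.write_text("tract,estimate\n123,5.1\n")  # a quietly revised release
    third = record_release(data, manifest)
```

The manifest then answers "why did last year's number change?" with a dated record instead of a guess.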

That is a fundamentally different challenge than the typical “supplemental materials” approach.

From individual analysts to institutional memory

One of the most overlooked risks in applied research centers is tacit knowledge loss.

When analytic workflows live in:

  • A single RStudio project
  • An analyst’s local machine
  • An undocumented data cleaning script

…institutional memory becomes fragile.

A key shift we’re making at the Urban Institute is moving from “analyst-centered” reproducibility to “infrastructure-centered” reproducibility. Concretely, that means treating research operations more like software engineering:

  • Centralized, version-controlled repositories
  • Documented ETL pipelines
  • Shared data dictionaries and indicator definitions
  • Standardized folder structures
  • Automated report rendering

This aligns with the spirit of “Good Enough Practices in Scientific Computing”: small, concrete habits that make work easier to audit, extend, and hand off (Wilson et al., 2017).

Reproducibility as regional infrastructure

For the Charlotte region, reproducibility is not just internal housekeeping. It is a public good.

When local governments ask:

  • “How was this indicator constructed?”
  • “Why did last year’s number change?”
  • “Can we replicate this for our county?”

…we should be able to answer confidently, with documentation and executable code.

In initiatives like the Charlotte Regional Data Trust and the Quality of Life Explorer, reproducibility is what enables:

  • Consistent indicators across counties
  • Transparent methodological updates
  • Reduced duplication across agencies
  • Faster iteration for policymaking

This is especially important in areas like housing affordability, eviction analysis, and health equity—where analytic decisions can influence funding, program design, and policy priorities.

Reproducibility strengthens legitimacy. And legitimacy matters in community research.

The emerging framework at the Urban Institute

Our evolving framework rests on five pillars.

1) Structured project architecture

Every project follows a standard directory and documentation structure—so any staff member can open a repo and understand where things go.
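A standard layout can be enforced with a few lines of code. The layout below is a hypothetical example, not the Institute's actual template:

```python
import tempfile
from pathlib import Path

# A hypothetical standard layout; an institute's actual template may differ.
PROJECT_LAYOUT = [
    "data/raw",       # untouched source extracts
    "data/clean",     # scripted cleaning output
    "data/analysis",  # analysis-ready tables
    "scripts",        # numbered pipeline steps
    "docs",           # data dictionary, decision log
    "reports",        # rendered output
]

def scaffold(root):
    """Create the standard directories so every repo looks the same."""
    for sub in PROJECT_LAYOUT:
        folder = Path(root, sub)
        folder.mkdir(parents=True, exist_ok=True)
        # A README in each folder states what belongs there.
        (folder / "README.md").touch()

with tempfile.TemporaryDirectory() as d:
    scaffold(d)
    made = sorted(p.relative_to(d).as_posix() for p in Path(d).rglob("README.md"))
```

Running the scaffolder at project creation means "where do things go?" has one answer everywhere.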

2) Script-first analysis

No manual spreadsheet transformations for production work. If a step matters, it must be scripted and reviewable.
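As an illustration, here is what a scripted step looks like compared to a hand-edited spreadsheet cell: the recode rule is explicit, reviewable, and testable. The 30%/50% cutoffs follow common rent-burden conventions, and the records are made up:

```python
import csv
import io

# Hypothetical cutoffs following common rent-burden conventions (30% / 50%).
RENT_BURDEN_BINS = [
    (0.30, "not burdened"),
    (0.50, "burdened"),
    (float("inf"), "severely burdened"),
]

def label_rent_burden(share):
    """Half-open bins: [0, .30), [.30, .50), [.50, inf)."""
    for cutoff, label in RENT_BURDEN_BINS:
        if share < cutoff:
            return label

# Made-up records standing in for a cleaned survey extract.
raw = "household_id,rent_share\nA,0.22\nB,0.41\nC,0.63\n"
rows = list(csv.DictReader(io.StringIO(raw)))
for r in rows:
    r["burden"] = label_rent_burden(float(r["rent_share"]))
```

If the cutoffs ever change, the diff shows exactly what changed and when, which no spreadsheet can guarantee.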

3) Versioned data layers

We separate—and document—raw, cleaned, and analysis-ready layers. That makes it possible to trace results backward.
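Tracing results backward is easiest when every derived file carries a provenance record naming its input. A sketch with hypothetical file and script names:

```python
import hashlib
import json
import tempfile
from pathlib import Path

def write_layer(text, out_path, source_path, step):
    """Write a derived data layer plus a sidecar provenance record, so any
    analysis-ready file can be traced back to its raw input."""
    out = Path(out_path)
    out.write_text(text)
    prov = {
        "output": out.name,
        "source": Path(source_path).name,
        "source_sha256": hashlib.sha256(Path(source_path).read_bytes()).hexdigest(),
        "step": step,
    }
    (out.parent / (out.name + ".prov.json")).write_text(json.dumps(prov, indent=2))
    return prov

with tempfile.TemporaryDirectory() as d:
    raw = Path(d) / "evictions_raw.csv"  # hypothetical names throughout
    raw.write_text("case_id,filed\n1,2021-06-01\n")
    prov = write_layer("case_id,filed_month\n1,2021-06\n",
                       Path(d) / "evictions_clean.csv",
                       raw, "01_clean_evictions.py")
    sidecar_exists = (Path(d) / "evictions_clean.csv.prov.json").exists()
```

The sidecar lives next to the data, so moving the file never separates it from its history.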

4) Automated reporting

R Markdown / Quarto workflows ensure tables, maps, and figures are produced directly from code, reducing copy/paste errors and making “refreshing” results straightforward.
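Our reporting runs through R Markdown / Quarto; the Python sketch below only illustrates the underlying idea, that published tables are generated from data rather than pasted in (the figures are placeholders, not real estimates):

```python
def to_markdown_table(rows, headers):
    """Render a results table straight from data: regenerate it, never paste it."""
    lines = ["| " + " | ".join(headers) + " |",
             "| " + " | ".join("---" for _ in headers) + " |"]
    for row in rows:
        lines.append("| " + " | ".join(str(v) for v in row) + " |")
    return "\n".join(lines)

# Placeholder values, not real estimates.
table = to_markdown_table(
    [("County A", 28.4), ("County B", 19.1)],
    ["County", "Cost-burdened renters (%)"],
)
print(table)
```

When the inputs refresh, the table refreshes with one re-render, and copy/paste errors disappear by construction.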

5) Governance and documentation

Metadata, decision logs, and indicator definitions are stored alongside the code. This is where a lot of reproducibility actually lives.
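A decision log needs no special tooling; a structured record kept in the repository is enough. A hypothetical entry format (the indicator and rationale are illustrative):

```python
import json
from datetime import date

def log_decision(log, indicator, decision, rationale):
    """Append one dated entry; the log lives in the repo next to the code."""
    entry = {
        "date": date.today().isoformat(),
        "indicator": indicator,
        "decision": decision,
        "rationale": rationale,
    }
    log.append(entry)
    return entry

decision_log = []  # in practice, loaded from a file such as docs/decision_log.json
log_decision(
    decision_log,
    "eviction_rate",
    "Denominator switched to renter-occupied units",
    "Aligns the indicator with partner counties' definition",
)
print(json.dumps(decision_log, indent=2))
```

Because entries are dated and versioned with the code, "why did the definition change?" has a citable answer.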

This isn’t about perfection. It’s about minimizing ambiguity.

A cultural shift (not just tools)

Ultimately, reproducibility is less about tools and more about culture. It requires:

  • Patience in documentation
  • Shared norms across teams
  • Leadership commitment
  • Institutional incentives

At the Urban Institute, our goal is not simply to publish research outputs. It is to build a regional analytic ecosystem that is transparent, scalable, and resilient.

Reproducibility is how we future-proof that ecosystem.