
The Data Is the Model: Jose M. Plehn’s Vision for Verifiable AI and Civil Society

Published October 28, 2025

Navneet

Artificial intelligence continues to astonish the world with its expanding, awe-inspiring capabilities. But beneath the spectacular headlines lies a quieter, more enduring revolution: the struggle for verifiable data. For Jose M. Plehn, Ph.D., an academic turned data entrepreneur, the real future of AI lies in data veracity. Dr. Plehn is the founder of BrightQuery, a data analytics service provider. He is also the founder of OpenData.org and a board member of the AI Alliance. Plehn and many of his peers in the data field believe that the future of AI will be defined not by how many parameters models contain, but by the trustworthiness of the data that feeds them.¹

Plehn’s argument sounds deceptively simple: “AI will only be as honest as its sources,” he says. His company, BrightQuery, founded in 2019, converts legal, regulatory, and tax filings from over 100,000 jurisdictions worldwide into structured, machine-readable datasets.² The result is a vast economic map linking hundreds of millions of entities (companies, individuals, and locations), all grounded in verified public records. It is the kind of factual infrastructure that large language models (LLMs) increasingly depend on to reduce hallucinations and bias.

The concept Plehn champions most forcefully is provenance: knowing where data comes from, how it was structured, and under what authority it was published.³ Without it, even the most sophisticated AI systems operate in epistemic fog. His view echoes emerging global standards such as the FAIR data principles (Findable, Accessible, Interoperable, Reusable)⁴ and the Data Provenance Standards promoted by the Data & Trust Alliance and OASIS Open.⁵ ⁶
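As a rough illustration of those three questions (where the data comes from, how it was structured, and under what authority it was published), a provenance record could be modeled as a small data structure. This is only a sketch: the class name `ProvenanceRecord`, its field names, and the example values are hypothetical and do not reflect BrightQuery's schema or the Data & Trust Alliance standard.

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical record answering the three provenance questions."""
    source: str       # where the data comes from
    structuring: str  # how it was structured
    authority: str    # under what authority it was published

    def missing_fields(self) -> List[str]:
        # A record with any blank field cannot fully answer the questions.
        return [name for name, value in asdict(self).items() if not value.strip()]


# Placeholder values, invented for illustration only.
record = ProvenanceRecord(
    source="https://example.gov/filings/123",
    structuring="parsed from a PDF filing into JSON",
    authority="Example State Registrar",
)
incomplete = ProvenanceRecord(source="", structuring="manual entry", authority="unknown office")
```

Here `record.missing_fields()` returns an empty list, while `incomplete.missing_fields()` flags the blank `source`, which is the kind of gap a provenance check is meant to surface.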

“Every claim made by a model should be traceable to a verifiable record,” Plehn has said in interviews.⁷ BrightQuery’s platform and its work with the National Secure Data Service (NSDS), a federal project, aim to make government data secure and accessible across agencies.⁸

The Rise of Factual AI

In 2024, as debates over open vs. closed AI intensified, the Open Source Initiative clarified “open AI” to stress transparent datasets and provenance metadata.⁹ Plehn’s perspective matches this shift closely. Within the AI Alliance, he has helped shape the Open Trusted Data Initiative (OTDI), which brings together IBM, Meta, Hugging Face, and other partners to build datasets whose lineage can be audited and whose licenses are unambiguous.¹⁰

This movement toward factual AI, meaning systems that can cite, verify, and justify their answers, is reshaping how industry and government approach data governance. It’s a step beyond the current fixation on “responsible AI,” because it makes responsibility measurable. “If you can’t trace the data,” Plehn argues, “you can’t trust the result.”

From Proprietary to Public Benefit

In 2025, Plehn took his transparency philosophy further by launching OpenData.org (https://opendata.org/), a nonprofit initiative that makes parts of BrightQuery’s global entity graph freely accessible.¹¹ What could have remained a lucrative proprietary asset has instead become a public commons, allowing journalists, researchers, and policymakers to use the same factual data as regulators and large institutions. During the launch, Plehn stated that “open data is just as vital to democracy as the right to free speech.”

OpenData.org builds on a growing recognition that factual data itself constitutes civic infrastructure. The idea is that trustworthy AI cannot exist in isolation. It must rest on shared, inspectable evidence. The organization’s open datasets already support collaborative projects with the United Nations Global Network of Data Officers and Statisticians, which promotes data capacity-building across governments.¹³

Bridging Academia, Policy, and Practice

Plehn’s résumé bridges academia, entrepreneurship, and public service. His Ph.D. research focused on computational economics, and he later taught courses on quantitative modeling at MIT, UC Berkeley, and UCLA, before turning to applied data systems. That hybrid background allows him to move fluidly between the technical, ethical, and regulatory dimensions of AI governance. In Washington, Dr. Plehn collaborates with the Data Foundation, a nonpartisan group advancing evidence-based policymaking.¹⁴

In a 2025 panel hosted by the Data Foundation, Plehn warned that “AI without verified data is like science without peer review.” That statement captures his pragmatic ethos: that the health of our digital society depends not on algorithmic miracles but on accountable record-keeping.

The Next Frontier: Accountability You Can Compute

If BrightQuery’s mission succeeds, provenance may become a first-class property of data pipelines. Every dataset, every model, and every AI-generated claim could carry a verifiable chain of custody. This idea aligns with IBM’s and Meta’s efforts under the AI Alliance to standardize data cards, transparency metrics, and lifecycle audits.¹⁵ ¹⁶ It also dovetails with ongoing work at MIT’s Data + AI Lab, which studies explainability and factual verification in machine learning.¹⁷
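One common way to make a chain of custody checkable by machine is a hash chain, in which each step records the hash of the step before it. The sketch below is a minimal illustration of that general technique, not a description of how BrightQuery or the AI Alliance implement provenance; the function names, the entry fields, and the three example steps are all assumptions made for the example.

```python
import hashlib
import json
from typing import Dict, List


def entry_hash(body: Dict[str, str]) -> str:
    # Hash a canonical JSON encoding so the same content always hashes the same.
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_step(chain: List[Dict[str, str]], actor: str, action: str) -> None:
    # Each new step commits to the hash of the previous step.
    prev_hash = chain[-1]["hash"] if chain else ""
    body = {"actor": actor, "action": action, "prev_hash": prev_hash}
    chain.append({**body, "hash": entry_hash(body)})


def verify_chain(chain: List[Dict[str, str]]) -> bool:
    # Recompute every hash and check every link back to the first step.
    prev_hash = ""
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev_hash or entry["hash"] != entry_hash(body):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical custody steps, invented for illustration.
chain: List[Dict[str, str]] = []
append_step(chain, "registrar", "published filing")
append_step(chain, "pipeline", "normalized filing to JSON")
append_step(chain, "model", "cited record in an answer")
```

With this structure, `verify_chain(chain)` succeeds on the untouched chain and fails if any earlier step is edited afterward, because the stored hashes no longer match the content. A real system would also need signatures or a trusted log to show who wrote each step, which this sketch leaves out.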

Critics might say this vision is idealistic. Regulatory filings can lag by months, and comprehensive global coverage remains a monumental task. Yet, Plehn’s approach is incremental and systemic; he’s betting that provenance, once made convenient and valuable, becomes the default expectation rather than the exception.

Why It All Matters

As the AI industry matures, the debate is shifting from scale to substance. The winners of the next decade may not be those who train the largest models, but those who can prove their models’ claims. Plehn’s work at BrightQuery and OpenData.org demonstrates how transparency can scale: verified data can power both profit and public trust.¹⁸

For all the rhetoric about “trustworthy AI,” few have built organizations around the mundane, essential plumbing that trust requires. Jose M. Plehn has—and in doing so, he may be laying the groundwork for a new social contract between data, algorithms, and democracy.
