Report

Playbook For Opening Federal Government Data

How Executive & Legislative Leadership Can Help



Authors

Lacey Strahm
Plaintext Group
Researcher
Victoria Houed
Plaintext Group
Associate

Individual Endorsers


All individual endorsers participated in their personal capacity. This report was prepared independently from any political or governmental entity. While the report generally reflects the observations, insights and recommendations of the endorsers, it is not the case that every endorser will agree with everything expressed herein.

Executive Summary

Enabling government data to be freely shared and accessed can expedite research and innovation in high-value disciplines, create opportunities for economic development, increase citizen participation in government, and inform decision-making in both public and private sectors.
Each day government data remains inaccessible, the public, researchers, and policymakers lose an opportunity to leverage data as a strategic asset to improve social outcomes.
Though federal agencies and policymakers alike support the idea of safely opening their data both to other agencies and to the research community, a substantial fraction of the U.S. federal government’s safely shareable data is not being shared.
The Biden Administration should designate open government data as a 2021 Cross-Agency Priority (CAP) Goal in the President's Management Agenda (PMA). This goal should revitalize the 2018 CAP Goal, Leveraging Data as a Strategic Asset, to improve upon the 2020 U.S. Federal Data Strategy (FDS) and to emphasize that open data is a priority for the U.S. Government.
The U.S. Chief Technology Officer (CTO) should direct a Deputy CTO to focus solely on fulfilling this 2021 CAP Goal. This Deputy CTO should be a joint appointment with the Office of Management and Budget (OMB).
Unless open data is elevated to a top priority in the President's Agenda, the U.S. risks falling behind internationally. Many nations have surged ahead in building smart, prosperous AI-driven societies while the U.S. has failed to unlock its untapped data. If the Biden Administration wants the U.S. to prevail as an international superpower and a global beacon of democracy, it must revitalize its waning open data efforts.

Challenge and Opportunity

The COVID-19 pandemic took 592,776 American lives as of June 2, 2021.<fn-sp>1<fn-sp> This grim number would have been higher if not for the continuous stream of data on infection, mortality, and spread released by the Department of Health and Human Services (HHS) during the height of the pandemic.<fn-sp>2<fn-sp> This government data, made freely available and easily accessible, empowered data scientists to produce public-focused models, analyses, and predictive analytics that accelerated scientific and public health insights, shortening the time it took for COVID-19 information to save American lives. Opening this data was essential to the U.S. pandemic response, and in retrospect, every day this stream of data was delayed, American lives were lost. As a country, we cannot afford to wait for the next crisis to ensure that our data is ready to be used to improve decision-making.

Open data, as defined by the Government Accountability Office (GAO), is data in a standardized format that is free to use, modify, and share.<fn-sp>3<fn-sp> Open Government Data, as defined by the Foundations for Evidence-Based Policymaking Act, are public data assets created by, collected by, under the control or direction of, or maintained by a federal agency that are machine-readable and available in standardized, non-proprietary formats through a comprehensive data inventory.<fn-sp>4<fn-sp> Open government data generates many public benefits, including increasing citizen participation in government,<fn-sp>5<fn-sp> spurring research and innovation,<fn-sp>6<fn-sp> creating opportunities for economic development,<fn-sp-extra-space>7<fn-sp-extra-space> and informing decision-making in both the private and public sectors.<fn-sp>8<fn-sp> Government-facilitated open data has supported critical decision-making in communities torn apart by natural disasters<fn-sp-extra-space>9<fn-sp-extra-space> as well as, most recently, in communities responding to the COVID-19 pandemic.<fn-sp>10<fn-sp>

Policymakers have shown consistent support over the years for open government data.<fn-sp>11<fn-sp> Figure 1 gives a brief timeline of some of the recent statutes and guidance documents, including major successes like the creation of data.gov and the appointment of a Chief Data Officer (CDO) in every agency. These advancements were direct results of the Executive Branch and Congress's designation of open data as a priority for the U.S. government.

Despite these positive strides, more work remains. As part of the FDS 2020 Action Plan, agencies were asked to make data governance materials publicly available by January 31, 2020.<fn-sp>12<fn-sp> Over half of the agencies did not post them.<fn-sp>13<fn-sp> Of the agencies that did post, 2,258 datasets remain non-public as of Q4 2020.<fn-sp>14<fn-sp> This means a sizable amount of government data that is legal to share with trusted non-government researchers is not being shared.<fn-sp>15<fn-sp> To get this non-public government data into the hands of researchers, government personnel need to address the various challenges that prevent agencies from opening their data.<fn-sp>16<fn-sp> Policymakers and practitioners alike agree on the nature of these challenges but struggle to put effective solutions into practice. A whole-of-government approach to open data will require interagency and interdisciplinary stakeholders to help design a process that works not only for the individual agency but, more importantly, for the U.S. government as a whole.

Leadership in the U.S. government must act to open government data because Artificial Intelligence (AI), a technology powered by data, will be key to the nation's success in the 21st century.<fn-sp>17<fn-sp> The insights that can be derived from federal data could give national actors new information for data-driven decisions that drive American progress and competitiveness across multiple industries. Each day government data remains inaccessible to researchers, American entities fall further behind internationally, and scientific insights that no one can yet anticipate are deferred, leaving people worse off unnecessarily.
Due to the government’s position as the largest and most important holder of data, our ability to build a smart, successful AI-driven society is dependent on the capacity to open our data as soon as possible.
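Agency data inventories have commonly been published as machine-readable data.json catalogs following the Project Open Data Metadata Schema, in which each dataset carries an accessLevel of "public", "restricted public", or "non-public". As an illustration only, and not as the method behind the non-public figure cited above, the Python sketch below tallies those labels. It assumes the catalog has already been downloaded as text, and the function name `count_access_levels` and the sample entries are hypothetical.

```python
import json
from collections import Counter


def count_access_levels(catalog_text: str) -> Counter:
    """Tally the accessLevel values of every dataset in a data.json catalog.

    Assumes the catalog follows the Project Open Data Metadata Schema v1.1,
    where datasets sit in a top-level "dataset" array. Entries missing the
    field are counted as "unspecified".
    """
    catalog = json.loads(catalog_text)
    return Counter(
        entry.get("accessLevel", "unspecified")
        for entry in catalog.get("dataset", [])
    )


# Hypothetical four-entry catalog, for illustration only (not real agency data).
sample_catalog = json.dumps({
    "@type": "dcat:Catalog",
    "dataset": [
        {"title": "Example A", "accessLevel": "public"},
        {"title": "Example B", "accessLevel": "non-public"},
        {"title": "Example C", "accessLevel": "restricted public"},
        {"title": "Example D", "accessLevel": "non-public"},
    ],
})

counts = count_access_levels(sample_catalog)
print(counts["non-public"])  # 2
```

Applied to a real agency catalog, the "non-public" and "restricted public" counts would give a rough view of inventoried data that is not yet openly available.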

Plan of Action

The Biden Administration, with the support of the Deputy Director for Management (DDM) at the OMB, should explicitly emphasize that open government data is a top Administration priority. They can do this by assigning open government data as a 2021 CAP Goal in the PMA. In accordance with President Biden’s desire to refresh and reinvigorate our national science and technology strategy,<fn-sp>18<fn-sp> the 2021 CAP Goal should revitalize the 2018 PMA CAP Goal: Leveraging Data as a Strategic Asset, to improve upon the 2020 FDS.

Currently, no government official has both the mandate to champion the 2021 CAP Goal and the authority needed to execute it. Upon filling the vacant U.S. CTO seat in the Office of Science and Technology Policy (OSTP), the U.S. CTO should direct a Deputy CTO to focus solely on fulfilling the 2021 CAP Goal. The Deputy CTO should be a joint appointment with OMB.

Congressional and Executive leadership alike can support the Deputy CTO in fulfilling the 2021 CAP Goal by emphasizing that open data is a priority for the U.S. at all levels of government. Executive leadership can prioritize opening data that is in demand by national actors to restore America’s global standing. Legislative leadership can support the innovation economy and create new jobs by opening existing federal government data and mandating the creation of new data.

As part of the strategy for fulfilling this CAP Goal, the Deputy CTO should address the following challenges that prevent many federal agencies from opening their data. The challenges outlined below have been sourced and synthesized from conversations conducted by the Plaintext Group with employees from various federal agencies.<fn-sp>19<fn-sp>

Budget Challenge

There are no explicit statutory appropriations to support and fund the work of an agency's CDO<fn-sp-extra-space>20<fn-sp-extra-space> or the additional technical staff needed to open high-value datasets. The statutes and guidance documents mentioned in Figure 1 signal support from policymakers, but many are unfunded mandates, leaving agencies to find the funding themselves.

Legislative Solution

  • Congress can allocate funding for open data in the annual appropriations legislation.

Executive Solution

  • The Technology Modernization Board can prioritize open data projects and encourage agencies to apply to the Technology Modernization Fund (TMF) to receive incremental funding and technical expertise.

Workforce Challenge

The public sector talent pipeline is in crisis: the need for talented public servants has skyrocketed,<fn-sp-extra-space>21<fn-sp-extra-space> and the government's personnel systems are not currently designed to build, support, and promote a data workforce.<fn-sp>22<fn-sp> Many data teams lack technical expertise, full-time staff,<fn-sp>23<fn-sp> and continued training.

Legislative Solution

  • Congress can direct the creation of a data science occupational series to support CDOs as they hire talent with technical skill sets.

Executive Solution

  • The Office of Personnel Management (OPM) can create a data science occupational series,<fn-sp>24<fn-sp> establish job classifications for data roles, enhance existing roles, and explore training and certifications to ensure that data practitioners are continuously improving their skills.<fn-sp>25<fn-sp>
  • Federal agencies can use flexible hiring authorities<fn-sp-extra-space>26<fn-sp-extra-space> to recruit experts in data engineering for temporary “tours of duty” to help identify which agency datasets will support high-impact use cases, and how to build pipelines to assemble this data.<fn-sp>27<fn-sp>

Guidance Challenge

The Open, Public, Electronic and Necessary Government Data Act of 2018 (OPEN Government Data Act) required the OMB to issue guidance by July of 2019 for agencies to implement comprehensive data inventories, but this guidance has yet to be released.<fn-sp>28<fn-sp> This failure has limited agencies’ progress in implementing their requirements under the act.<fn-sp>29<fn-sp>

Legislative Solution

  • Congress can encourage<fn-sp-extra-space>30<fn-sp-extra-space> the OMB to release the Administration’s plan for Phase II Implementation Guidance for agencies to implement comprehensive data inventories.

Executive Solution

  • OMB can issue the statutorily required Phase II Implementation Guidance for agencies. Releasing this guidance will allow agencies to implement comprehensive data inventories and provide agencies with direction as they work to meet their requirements under existing open data statutes. This will also reaffirm that open data is a priority of the Administration.<fn-sp>31<fn-sp>
  • The OMB Director can establish a new position, Assistant Director for Information Policy, to oversee, manage, and coordinate relevant activities across OMB’s divisions and offices, and serve as the OMB’s liaison for Executive Office of the President (EOP) appointees outside of OMB looking to work with the OMB on data.<fn-sp>32<fn-sp>

Incentive Challenge

Existing mandates come with myriad action items, each tied to milestones and target timelines. Agencies prioritize their data assets according to their missions but lack a concrete incentive structure, and the corresponding motivation, to post, update, and maintain those assets.

Executive Solution

  • The Administration, with the help of the Chair of the CDO Council, can create incentives for participation and compliance with data-sharing efforts.<fn-sp>33<fn-sp> These incentives can engage the entire federal government behind common data priorities.

Community Challenge

The absence of interagency collaboration opportunities and public-private partnerships limits agencies' ability to envision use cases for federal open data beyond its current uses.<fn-sp>34<fn-sp> Agencies may not know the value of their data to other stakeholders because they may not regularly share their data needs or inventories outside their own agency.

Executive Solution

  • Leveraging prior work undertaken through the Networking and Information Technology Research and Development (NITRD) program’s Big Data Interagency Working Group,<fn-sp>35<fn-sp> OSTP can hold an innovation sprint to build a roadmap to establish an open knowledge network in a phased manner.<fn-sp>36<fn-sp>
  • The General Services Administration (GSA) can host an annual hackathon in partnership with willing federal agencies centered around open data projects.

Quality Challenge

The quality of data in the possession of federal agencies varies widely.<fn-sp>37<fn-sp> Quality in this context refers to the sophistication of the data's format and structure. Some data is clean and structured as machine-readable information, accessible in databases prepared for AI applications. Other data exists only as PDF scans of handwritten notes stored in folders on individual desktops. Many data assets include inaccurately labeled, incomplete, or missing data. Unstructured and messy data is effectively unusable until it is cleaned.<fn-sp>38<fn-sp>

Executive Solution

  • Leveraging the work of the National Institute of Standards and Technology (NIST),<fn-sp>39<fn-sp> the Chair of the CDO Council can issue a common policy and set of best practices to support the release of AI-ready government data to the public, and can work with industry and academia to adopt compatible policies and best practices for reciprocal sharing and documentation.<fn-sp>40<fn-sp>

Privacy Challenge

Many federal data assets contain personally identifiable information (PII)<fn-sp>41<fn-sp> and other sensitive data. Because opening this data risks disclosing PII,<fn-sp>42<fn-sp> privacy-enhancing techniques are necessary. However, many agencies lack the expertise or guidance to implement effective privacy-enhancing techniques.

Legislative Solution

  • Congress can pass legislation establishing a National Secure Data Service (NSDS)<fn-sp>43<fn-sp> to facilitate data access for evidence building<fn-sp-extra-space>44<fn-sp-extra-space> while ensuring transparency and privacy. The NSDS should model best practices for secure record linkage and drive the implementation of innovative privacy-enhancing technologies.<fn-sp>45<fn-sp>

Executive Solution

  • The National AI Initiative can coordinate National Science Foundation (NSF) funded privacy researchers to undertake rotational assignments at federal agencies<fn-sp-extra-space>46<fn-sp-extra-space> and work closely with agency personnel and data stewards to responsibly unlock access to more of the government’s data.<fn-sp>47<fn-sp>

Security Challenge

Trust in government infrastructure is low. As recent high-profile hacks<fn-sp-extra-space>48<fn-sp-extra-space> have highlighted, government technology infrastructure is outdated and in need of major upgrades.<fn-sp>49<fn-sp> Agencies' risk management strategies take this reality into account and often determine that the cybersecurity risk of opening government data outweighs the reward.

Executive Solution

  • The Cybersecurity and Infrastructure Security Agency (CISA), supported by the Federal Risk and Authorization Management Program (FedRAMP), can establish guidance on open data best practices and offer resources<fn-sp-extra-space>50<fn-sp-extra-space> for agencies as they open their data.
  • The National AI Initiative can coordinate NSF-funded cybersecurity researchers or CISA cybersecurity professionals to undertake rotational assignments at federal agencies<fn-sp-extra-space>51<fn-sp-extra-space> and work closely with agency personnel and data stewards to responsibly unlock access to more of the government’s data.<fn-sp>52<fn-sp>

Specificity Challenge

Agencies are more likely to be able to respond to requests for expanding access to data if government and private sector experts can identify specific datasets and the high-impact use cases that would be enabled if this data were made available.<fn-sp>53<fn-sp>

Executive Solution

  • The Administration can encourage researchers, practitioners, and other stakeholders to identify high-priority datasets using an ideation competition, with prizes provided on a rolling basis.<fn-sp>54<fn-sp>
  • GSA, with the help of the CDO Council, could crowdsource and host an open register of questions, projects, and data pairs from researchers, practitioners, and other stakeholders.

Frequently Asked Questions

1. Why is a joint appointment for the Deputy CTO necessary?

OMB holds statutory authority under many of the current open data statutes and guidance documents but is primarily government-facing. OSTP lacks statutory authority but has more freedom to draw on external technologists for implementation expertise. A joint appointment would therefore place the Deputy CTO in the best position to coordinate and execute the U.S. FDS successfully.

2. Are the challenges listed above in order of significance?

No. Many of the challenges listed above are interconnected and this is not an exhaustive list of challenges. Addressing one challenge effectively may entail solutions from several challenges.

3. What are notable outcomes and achievements from enabling open access to government data?

  • Health data from HHS is used by Aidin to improve the patient-placement process in choosing a post-acute care provider.<fn-sp>55<fn-sp>
  • Food data from the Department of Agriculture (USDA) is used by researchers to address a range of research and planning questions related to food demand relationships.<fn-sp>56<fn-sp>
  • Environmental data from the National Centers for Environmental Information (NCEI) is used by ranchers to make timely, critical decisions that can directly affect the success of their operations.<fn-sp>57<fn-sp>
  • Data on infection, mortality, and spread released by HHS during the height of the pandemic empowered data scientists to produce public-focused models, analyses, and predictive analytics which accelerated scientific and public health insights, shortening the time it took for COVID-19 information to save American lives.<fn-sp>58<fn-sp>

Read more about the Day One Project <rte-link> here<rte-link>.