The following post is a slightly abbreviated version of Responding to Open Data Concerns from the Johns Hopkins University Center for Government Excellence (GovEx). The post is accompanied by a video that examines several of the concerns. Please contact the author, Eric Reese, at email@example.com if you have questions about this resource.
Cities starting down the path of opening data often have concerns about releasing data to the public. Here are some common concerns, together with responses.
1. “Someone will reengineer the data to get personal information”
Personal data should be protected, and we must have strong governance practices to ensure that we are releasing data that should be released. One of several strategies to manage the risk is to aggregate data to ensure that private information is protected. Many cities, for example, aggregate crime data to the nearest intersection or block level when it is released (for example, Seattle uses the Hundred Block Level). This allows critical information on public safety to be available to the public without identifying individuals.
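The block-level aggregation described above can be sketched in a few lines of code. This is a minimal illustration, not any city's actual implementation; the function name and address format are assumptions for the example.

```python
import re

def to_hundred_block(address: str) -> str:
    """Round a street number down to its hundred block.

    Hypothetical helper: masks '4521 Pine St' as '4500 Block of Pine St'
    so a published record points to a block, not an individual address.
    """
    match = re.match(r"(\d+)\s+(.*)", address)
    if not match:
        return address  # no leading street number to mask; leave as-is
    number, street = match.groups()
    block = (int(number) // 100) * 100
    return f"{block} Block of {street}"

print(to_hundred_block("4521 Pine St"))  # -> 4500 Block of Pine St
```

Applied to every record before release, a step like this preserves the public-safety value of the data while removing the precise location.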
2. “Our data will expose what we’re doing poorly”
Hard-working city leaders and staff should never be satisfied with poor performance. Releasing this information can encourage self-correcting behavior, help the public understand our challenges, and generate stronger partnerships with stakeholders. We can put the data in context by telling the story around the data—a strategy that can also help make the case for additional resources or alternative solutions.
Many cities release datasets together with explanations and important contextual information. This allows them to show what they are doing to address challenges and improve performance while promoting transparency and opening lines of communication with the public. Publishing data can also create opportunities to work with members of the public to correct errors and build trust.
3. “The data will be misinterpreted”
Some local government staff may fear that media or residents will take their data and miss key elements, telling the wrong story. But opening data to the public provides an opportunity to shape the story, since users will draw their own conclusions with or without the released data.
Providing context along with data can help us tell our story and address items that are commonly misinterpreted. San Francisco, for example, provides contextual information for one of its biggest and highest-profile challenges: housing. The city's Housing Hub offers policies, reports, and resources that help put the data it releases in the proper context.
4. “We don’t have time to prioritize opening data while doing our jobs”
While it is true that an open data program takes resources, it also promotes efficiencies, most often from a reduction in freedom of information requests and improved data sharing across city departments. Automating the routine publication of data can significantly reduce staff burdens in the long term. Prioritizing an investment in open data can help staff focus on their jobs instead of responding to requests for data. For example, the City of Hartford publishes towing data automatically every hour so that residents can check whether their car has been towed, reducing the burden on staff to respond to requests.
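An automated publication job like Hartford's usually amounts to a small export script run on a schedule. The sketch below is illustrative only: the field names are invented, not Hartford's actual schema, and a real deployment would have a scheduler (an hourly cron job, for instance) call this and upload the result to the portal.

```python
import csv
import io

def export_tow_records(records):
    """Serialize the latest tow records to CSV for an open data portal.

    Hypothetical sketch: field names are illustrative. A scheduler would
    call this hourly and push the output to the portal, so staff never
    have to answer "was my car towed?" requests by hand.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["plate", "tow_time", "lot"])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

sample = [{"plate": "ABC123", "tow_time": "2024-01-01T08:15", "lot": "Main St Lot"}]
print(export_tow_records(sample))
```

The point of the example is the shape of the workflow, not the specifics: once the export is scripted, publication becomes a background task rather than a staff interruption.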
5. “We might get sued if we release protected information”
6. “The cost to keep this program going in the future will be too high”
Some jurisdictions fear that they will spend a lot on software, staff, training, and so on while getting little in return. But it may be more costly to continue with business as usual. Delivering open data encourages cities to view their digital information as a strategic asset. Furthermore, publishing raw data allows third parties to deliver that information to the public (through apps, visualizations, and the like), often relieving us of that responsibility. For example, the Chicago Transit Authority provides a directory of third-party applications that make its information more accessible to the public, reducing the need for the Authority to do so directly.
7. “Using an open data portal creates a cybersecurity risk to our internal IT systems”
Generally, open data portals, and the data they hold, are completely separate from your internal IT systems. There are very infrequent exceptions to this, and in those cases, the Johns Hopkins University Center for Government Excellence (GovEx) can provide best practices to ensure the greatest possible security of both your technology infrastructure and your data. There are no known examples of cybersecurity attacks where government data has been inappropriately obtained through open data portals.
In technical terms, the data and the systems it's housed in are typically decoupled through automated ETL (extract-transform-load) processes or human intervention; there isn't usually a connection between a customer accessing information on an open data portal and the computers where the data is maintained. During ETL processes, data is flattened, filtered, merged with information from other databases, and otherwise manipulated, which obfuscates or masks the source data system(s). Finally, an effective open data program has processes to review data before it's published to prevent the accidental release of data not appropriate for public consumption.
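The decoupling step described above can be illustrated with a minimal transform. This is a sketch under assumed field names, not a real city's pipeline: the key idea is that only an approved list of public fields ever leaves the internal system, and the portal receives a flat copy with no link back to the source database.

```python
def etl_for_portal(rows, public_fields):
    """Minimal ETL sketch: keep only approved public fields.

    Illustrative only. Flattening internal records into plain dicts
    means the portal never touches the source database, and anything
    not on the approved list (names, internal IDs, and so on) is
    dropped before publication.
    """
    return [{k: row[k] for k in public_fields if k in row} for row in rows]

internal = [
    {"case_id": 1, "category": "theft", "block": "4500 Block of Pine St",
     "victim_name": "J. Doe"},  # sensitive fields, never published
]
public = etl_for_portal(internal, public_fields=["category", "block"])
print(public)  # -> [{'category': 'theft', 'block': '4500 Block of Pine St'}]
```

A pre-publication review of the approved field list is where the human check described above fits in.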
Having a clear process to publish data sharply reduces cybersecurity risks. Chattanooga, for example, created a workflow to ensure that data moves through the proper channels before being released. Once data is identified for publishing, it's reviewed by the city attorney's office, the office of performance management and open data, the relevant department's open data coordinators, and the department of information technology. The city has also set clear protocols for decoupling data from the city's data systems before releasing data to promote security.