🌃 Open Data Literacy Internship: NLP on Public Records

The final analysis that I produced is available on GitHub, hosted in an interactive Jupyter notebook with Binder. You can try it here.

🤔 Problem Space

For eight weeks in summer 2018, I worked for the City of Seattle and the State of Washington on an open data-related project. The scope of the internship was to support data literacy for people in the public sector.

My partner and I collaborated on creating a governance document for an open data-related paraprofessional organization. In parallel, I spent time analyzing trends in public records requests to support proactive disclosure of datasets.

The project had two parts: data analysis and a policy document. I was in charge of the data analysis.

Throughout the project, we also wrote about our experience. You can view some reflections here and here.

🛠 Process

We began by interviewing a number of records professionals, whom we recruited through snowball sampling. We spoke with each person for about an hour, then performed thematic analysis on our notes and coded transcripts, which you can view in this repo. Our goal was to better understand the needs and expectations of data professionals who might be involved in the organization.

Once we identified the themes, my partner and I collaboratively drafted a charter for the open data-related paraprofessional organization, outlining the expectations for participants.

For the data analysis, I began with three public records request datasets from different municipalities across Western Washington. I researched current techniques in natural language processing and clustering algorithms, then built a data processing pipeline and a visualization toolkit.
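To illustrate the general NLP-plus-clustering idea (not the pipeline in the notebook, which may use different libraries and techniques such as scikit-learn), here is a minimal pure-Python sketch. It turns short request descriptions into TF-IDF vectors and groups them with spherical k-means (k-means using cosine similarity). The sample request texts, the stopword list, and every function name below are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real pipeline would use a fuller list plus
# domain words (e.g. "records") that appear in nearly every request.
STOPWORDS = {"a", "an", "and", "any", "all", "for", "from", "of", "on", "the", "to"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def normalize(vec):
    """Scale a sparse vector (dict term -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def tfidf_vectors(docs):
    """Return one unit-length TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        # Term frequency times a smoothed inverse document frequency.
        vec = {t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1)
               for t, c in counts.items()}
        vectors.append(normalize(vec))
    return vectors


def cosine(u, v):
    """Dot product of sparse vectors; equals cosine similarity for unit vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def mean_direction(vectors):
    """Sum the member vectors and rescale to unit length (the spherical centroid)."""
    summed = Counter()
    for vec in vectors:
        summed.update(vec)  # Counter.update adds values key by key
    return normalize(summed)


def spherical_kmeans(vectors, k, max_iter=20):
    """Cluster unit vectors by cosine similarity; returns (labels, centroids)."""
    # Deterministic farthest-point initialization: start from the first document,
    # then add the document least similar to every centroid chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        idx = min(range(len(vectors)),
                  key=lambda i: max(cosine(vectors[i], c) for c in centroids))
        centroids.append(vectors[idx])
    labels = None
    for _ in range(max_iter):
        # Assign each document to its most similar centroid.
        new_labels = [max(range(k), key=lambda j: cosine(vec, centroids[j]))
                      for vec in vectors]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [vec for vec, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = mean_direction(members)
    return labels, centroids


def top_terms(centroid, n=3):
    """Highest-weighted terms in a centroid, ties broken alphabetically."""
    ranked = sorted(centroid.items(), key=lambda item: (-item[1], item[0]))
    return [t for t, _ in ranked[:n]]


# Hypothetical request descriptions, for illustration only.
requests = [
    "Body camera footage from a police incident downtown",
    "Police report and body camera video of a traffic stop",
    "Building permit applications for a house on Main Street",
    "Permit inspection notes for a building renovation",
]
labels, centroids = spherical_kmeans(tfidf_vectors(requests), k=2)
print(labels)  # [0, 0, 1, 1]
for j, centroid in enumerate(centroids):
    print(j, top_terms(centroid))
```

Printing each cluster's top-weighted terms is one simple way to label clusters for a reader; on real request logs, the number of clusters and the stopword list would need tuning against the data.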

The end goal was for Open Data Champions (people who work with open data) to use the platform to ingest, process, and better understand their data.

🎉 Outcomes

We traveled to Olympia to meet with some of the partners who helped facilitate the internship. We also presented our work for discussion, which you can see here.