The goal is to archive the raw version of data that seeds our data processing pipeline for each state. These raw files will vary in granularity and type:
The datasource.py for each state provides the link between raw data and year-by-year breakdowns of that data, regardless of its original form. For example, when the source is a single large database dump, we pre-process the dump and store yearly files on GitHub in addition to the raw dump. If we outsource to Mechanical Turk, we chunk the processing year by year in a similar way, so that datasource.py can interface with the result.
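As a rough illustration of the pre-processing step described above, the sketch below groups the rows of a raw dump by year so that each group could be written out as its own yearly file. It is not the project's actual code: the function name split_rows_by_year, the election_date column, and its YYYY-MM-DD format are all assumptions made for the example.

```python
import csv
from collections import defaultdict
from io import StringIO


def split_rows_by_year(rows, date_field="election_date"):
    """Group row dicts by the four-digit year at the start of a date string.

    Assumes (hypothetically) that each row has a `date_field` value
    formatted as YYYY-MM-DD.
    """
    by_year = defaultdict(list)
    for row in rows:
        year = row[date_field][:4]
        by_year[year].append(row)
    return dict(by_year)


# A tiny stand-in for a raw database dump exported as CSV (made-up data).
SAMPLE_DUMP = (
    "election_date,office,votes\n"
    "2004-11-02,president,100\n"
    "2007-12-11,house,50\n"
    "2004-03-02,senate,70\n"
)

rows = list(csv.DictReader(StringIO(SAMPLE_DUMP)))
rows_by_year = split_rows_by_year(rows)
```

Each value in rows_by_year could then be written to a separate per-year file and stored alongside the untouched raw dump.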
Our filename standardization conventions for the “common” case should be distilled from GitHub for reference.
The basic format for a generated filename is as follows:
date__state__{party}__{special}__election_type__{jurisdiction}__{office}__{office_district}__{reporting_level}.format
So, for example:
20121106__md__general__queen_annes__precinct.csv
This file contains precinct-level results from the Nov. 6, 2012 Maryland general election in Queen Anne’s County. A primary election might look like this:
20120403__md__republican__primary__prince_georges__precinct.csv
Not every state has individual files for a jurisdiction, however. Ohio has precinct-level results for an entire election:
20041102__oh__general__precincts.xls
Or a single file for a special general election for a single office:
20071211__oh__special__general__house__5.csv
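The naming pattern and examples above can be sketched as a small helper. This is an illustration only, not the project's implementation: the names slugify and build_filename and their parameters are hypothetical. It assumes that every component present is joined with a double underscore, that optional components are simply omitted, and that names are lowercased with apostrophes dropped and spaces turned into single underscores, as the examples suggest.

```python
import re
from datetime import date


def slugify(value):
    """Lowercase a name, drop apostrophes, and join words with underscores."""
    value = str(value).lower().replace("'", "").replace("\u2019", "")
    return re.sub(r"[^a-z0-9]+", "_", value).strip("_")


def build_filename(election_date, state, election_type, file_format,
                   party=None, special=False, jurisdiction=None,
                   office=None, office_district=None, reporting_level=None):
    """Assemble a standardized filename from its components.

    Optional components are skipped when they are not supplied.
    """
    parts = [election_date.strftime("%Y%m%d"), slugify(state)]
    if party:
        parts.append(slugify(party))
    if special:
        parts.append("special")
    parts.append(slugify(election_type))
    # Remaining optional components, in the order given by the pattern.
    for component in (jurisdiction, office, office_district, reporting_level):
        if component:
            parts.append(slugify(component))
    return "__".join(parts) + "." + file_format


maryland_general = build_filename(
    date(2012, 11, 6), "md", "general", "csv",
    jurisdiction="Queen Anne's", reporting_level="precinct",
)
ohio_special = build_filename(
    date(2007, 12, 11), "oh", "general", "csv",
    special=True, office="house", office_district="5",
)
```

With these inputs, maryland_general and ohio_special reproduce the Maryland general and Ohio special election filenames shown above.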
OpenElections focuses on federal and state offices, including state legislators. When pre-processing data or doing data entry, use the following office names:
Other office titles that are specific to a state should be consistent within that state; if the state already has data, check it and reuse the titles it contains.