The goal is to archive the raw data that seeds our data processing pipeline for each state. These raw files will vary in granularity and type.
The datasource.py for each state provides the link between the raw data and year-by-year breakdowns of that data, regardless of its original form. For a single large database dump, for example, we might pre-process the dump and store yearly files on GitHub in addition to the raw dump. If we outsource to Mechanical Turk, we split the processing into similar year-by-year chunks that datasource.py can interface with.
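The pre-processing step for a single large dump could look something like the following minimal sketch. It assumes the dump is a CSV file with an `election_date` column whose values start with a four-digit year; the function name `split_dump_by_year`, the column name, and the `<year>.csv` output names are hypothetical and not taken from the project's code.

```python
import csv
from collections import defaultdict
from pathlib import Path


def split_dump_by_year(dump_path, out_dir, date_column="election_date"):
    """Split one large CSV dump into one CSV file per election year.

    Assumes (hypothetically) that every row has a date column whose
    value begins with a four-digit year, e.g. "2012-11-06".
    Returns a dict mapping each year string to the path written.
    """
    rows_by_year = defaultdict(list)
    with open(dump_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames or []
        for row in reader:
            year = row[date_column][:4]
            rows_by_year[year].append(row)

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    written = {}
    for year, rows in sorted(rows_by_year.items()):
        path = out / f"{year}.csv"
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        written[year] = path
    return written
```

The raw dump is left untouched, so it can still be archived alongside the yearly files.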
Our filename standardization conventions for the “common” case should be distilled from GitHub for reference.
The basic format for a generated filename is as follows:
So, for example:
This file contains precinct-level results from the Nov. 6, 2012 Maryland general election in Queen Anne’s County. A primary election might look like this:
Not every state has individual files for a jurisdiction, however. Ohio has precinct-level results for an entire election:
Or a single file for a special general election for a single office:
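As a rough illustration of how the cases above could map onto one naming scheme, here is a minimal Python sketch. The double-underscore separator, the order of the fields, and the `build_filename` and `slugify` helpers are all assumptions made for this example; defer to the project's documented format wherever it differs.

```python
import re
from datetime import date

SEPARATOR = "__"  # assumed separator, not confirmed by this document


def slugify(text):
    """Lowercase, drop apostrophes, and turn other punctuation or spaces into underscores."""
    text = text.lower().replace("'", "").replace("\u2019", "")
    return re.sub(r"[^a-z0-9]+", "_", text).strip("_")


def build_filename(election_date, state, race_type, reporting_level,
                   jurisdiction=None, office=None, special=False,
                   extension="csv"):
    """Assemble a standardized filename from its parts (hypothetical field order).

    Optional parts (jurisdiction, office, special) are simply left out
    when they do not apply, e.g. for a statewide file.
    """
    parts = [election_date.strftime("%Y%m%d"), state.lower()]
    if special:
        parts.append("special")
    parts.append(slugify(race_type))
    if jurisdiction:
        parts.append(slugify(jurisdiction))
    if office:
        parts.append(slugify(office))
    parts.append(slugify(reporting_level))
    return SEPARATOR.join(parts) + "." + extension


# Precinct-level results for one county in the Nov. 6, 2012 Maryland general election.
print(build_filename(date(2012, 11, 6), "MD", "general", "precinct",
                     jurisdiction="Queen Anne's"))
# 20121106__md__general__queen_annes__precinct.csv
```

Omitting `jurisdiction` would give a single statewide file, and passing `special=True` with an `office` would give a file scoped to one special election.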
OpenElections focuses on federal and state offices, including state legislators. When pre-processing data or doing data entry, use the following office names:
Other office titles that are specific to a state should be consistent within that state; if the state already has data, check it for the titles in use.
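A consistency check of this kind could be sketched as follows. It assumes each result row is a dictionary with an `office` key, and that you supply both the standard office names and the titles already used in the state's existing data; the function name `find_unfamiliar_offices` and the row layout are hypothetical.

```python
def find_unfamiliar_offices(new_rows, standard_offices, existing_offices):
    """Return office titles in new_rows that match neither list.

    Comparison ignores case and surrounding whitespace. Each unfamiliar
    title is reported once, in the order it first appears.
    """
    known = {name.strip().lower() for name in standard_offices}
    known |= {name.strip().lower() for name in existing_offices}

    unfamiliar = []
    seen = set()
    for row in new_rows:
        title = row["office"].strip()
        key = title.lower()
        if key not in known and key not in seen:
            seen.add(key)
            unfamiliar.append(title)
    return unfamiliar
```

Any title it returns is either a new state-specific office or a variant spelling that should be reconciled with the state's earlier files.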