Ballot Data Collection: Frequently Asked Questions

Where does this information come from?

The most basic questions that ballot information answers are “Who is on my ballot?” (candidates) and “What are they running for?” (contests). This information is created and maintained by state and local election officials. The office responsible for creating the official information differs depending on the state and the level of the contest. For the most part, information about federal, statewide, and state legislative contests, as well as any offices whose jurisdictions cross county lines (e.g. judges, state school boards, public utility boards), is maintained by the state election authority (commonly the Secretary of State).

More local candidate and contest information is usually maintained by county election authorities, though in some states this responsibility lies with municipal authorities. All told, there are 7,987 offices responsible for running elections in America. This obviously creates a challenge, as the local data is much more decentralized than its higher-level counterparts. Some states have made efforts to aggregate this local information, with varying degrees of success. Even when successful, states may not receive all information from local sources in a timely manner, or may not receive any late changes made at the local level. For these reasons, the local election authorities (often county clerks or their analogues) will be the most accurate and timely sources of information for these local contests.

How is this information published by elections officials?

Information about candidates and contests is published in nearly every format imaginable, with little to no consistency within most states, let alone across states. Some states offer candidate files that can be downloaded as CSV or Excel documents, but those files usually do not include local contests and are by far the exception rather than the rule. PDFs are the most common format, though these files list candidates and contests in different ways, and each file requires significant, file-specific effort to turn into usable data. Some candidate and contest information can be found on individual localities’ websites, but there is no standardized format, and extracting the information still requires significant effort.

The above all assumes that information about candidates and contests lives online, which is not the case everywhere. Nearly one-third of counties do not have elections websites, so collecting candidate information from these jurisdictions requires some combination of phone calls, faxes, and physical mail.

When is this information available?

Information about candidates and contests that will be on a ballot is available in different forms and at different levels of finality at different times in the election cycle. Officially, elections offices must publish a notice of what contests will be on the ballot no later than 100 days before a regularly scheduled election. Candidate filing deadlines vary between states, counties, and even offices within counties, but most often fall between one and three months before the date of the election. Though this is the moment when a person legally becomes a candidate, office-seekers often begin campaigning before the filing period, in which case information about these proto-candidates may be obtained from required campaign finance filings.


If an election is partisan, there may be a separate filing deadline for independent or third-party candidates. In the case of a general election following a partisan primary, there is often still an opportunity for independent candidates to file in the period between the primary and the general election. Additionally, candidates who win a primary may sometimes withdraw or be replaced for various reasons before the general election; laws differ between jurisdictions on when these changes must be made. Any changes or additional filings will be finalized no later than 45 days before the election, barring extraordinary circumstances.
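
As a rough illustration of the two fixed points on this timeline, the sketch below counts back from an election date to the 100-day contest notice deadline and the 45-day finalization mark described above. The function name, the dictionary keys, and the example election date are assumptions made for illustration; actual deadlines depend on the jurisdiction.

```python
from datetime import date, timedelta


def key_dates(election_day: date) -> dict:
    """Return the two timeline markers described in the text for one election.

    The key names are hypothetical labels, not official terms.
    """
    return {
        # Notice of contests is due no later than 100 days before the election.
        "contest_notice_deadline": election_day - timedelta(days=100),
        # Changes and additional filings are final no later than 45 days before.
        "ballot_finalized_by": election_day - timedelta(days=45),
    }


# Illustrative election date only.
for label, day in key_dates(date(2024, 11, 5)).items():
    print(f"{label}: {day.isoformat()}")
```

Candidate filing deadlines are left out on purpose: they vary by state, county, and office, so they cannot be derived from the election date alone.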

How does this timeline affect civic engagement organizations?

While collecting candidate and contest information once it is finalized is useful for those who simply wish to display what voters will see in the voting booth, different organizations have different aims. Since campaigns are active even before the initial candidate filing period, organizations focused on campaign finance will need to begin tracking extremely early in the election cycle. Organizations that seek to provide contextual and political information about candidates will need to track news, speeches, and other information about candidates long before ballots are finalized. Mobilization-focused organizations will similarly need candidate information earlier than finalized information can be collected, especially since some states have voter registration deadlines that fall soon after the 45-day mark. This leads organizations to create non-standardized datasets that suit their own needs, but which contain information that would be useful to all organizations in the space.

How is candidate information related to individual voters?

When a voter receives a ballot, the contests they see are determined by which electoral districts contain their registered address. While federal and state legislative districts are maintained at the state level, lower-level districts are usually maintained by the local jurisdictions. Electoral districts for different office types (congressional, state legislature, county/city boards, school districts, etc.) seldom bear any relationship to each other. Even the most basic of electoral geographies, the precinct, will not necessarily be wholly contained within any single electoral district. “Precinct splits” are common, and a single polling place often needs to have dozens of different ballot styles on hand to serve its voters. A lack of defined electoral districts is a significant roadblock to extending local coverage of candidate information.

Even assuming electoral district definitions can be obtained, matching an individual voter to their districts is still not an easy task. There are two main ways of going about this matching: geospatially and textually. The geospatial approach relies on obtaining geographic definitions of districts (“shapefiles”), then plotting a voter’s registration address to see which boundaries it falls within. This approach has the advantage of being efficient and useful from a technical standpoint, and would theoretically be able to handle any given voter’s address even if they are not yet registered. However, shapefiles for lower-level districts are often unavailable or non-existent, since election officials predominantly use a text-based system for their official records. Additionally, since electoral districts have extremely precise boundaries (cutting down and across streets, and in some cases even through apartment buildings), any mis-mapping creates false matches at the edges of districts.
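
The geospatial approach can be sketched as a point-in-polygon test. The example below is a minimal illustration, not a production method: it uses a plain ray-casting check on simple polygons rather than real shapefiles, and the district names, coordinates, and function names are all hypothetical. In practice the voter's address would first need to be geocoded to a coordinate, which is assumed to have already happened here.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray casting: count the polygon edges crossed by a ray running right from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def districts_for_point(point: Point, districts: Dict[str, Polygon]) -> List[str]:
    """Return the names of every district whose boundary contains the point."""
    return sorted(name for name, poly in districts.items() if point_in_polygon(point, poly))


# Hypothetical, overlapping districts for different office types.
DISTRICTS: Dict[str, Polygon] = {
    "School District A": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "City Council Ward 2": [(2, 0), (6, 0), (6, 4), (2, 4)],
    "Water Board East": [(5, 0), (9, 0), (9, 4), (5, 4)],
}

print(districts_for_point((3.99, 1.0), DISTRICTS))
print(districts_for_point((4.01, 1.0), DISTRICTS))
```

The two lookups differ by only 0.02 units, yet the second point falls outside School District A. This is the edge problem described above: a small geocoding error near a boundary is enough to change the set of matched districts.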

The textual approach associates districts with either individual addresses or “street segments.” A street segment defines a specific set of addresses on a given street, e.g. odd-numbered houses from 1 to 31 Main St., and associates this segment with a set of jurisdictions. In this case, a voter’s address is parsed and matched to a street segment, and the relevant information for that segment’s jurisdictions is returned. If street segments are not available, a set of jurisdictions for an individual voter can be found by looking up their address in a voter file and returning the jurisdiction fields from the matching record. This approach is predominantly used by election officials, and so is possible in almost all jurisdictions regardless of their technical acumen. It has the advantage of being extremely precise in its matching. It does, however, require resources for address parsing, and may not find a non-registered voter’s jurisdiction if their address does not appear in the voter file.
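
A street-segment lookup might look something like the sketch below. The `StreetSegment` record layout, the parity values, the district names, and the deliberately naive address parser are all assumptions made for illustration; real street-segment files and address parsing are considerably messier (unit numbers, directional prefixes, abbreviations, and so on).

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class StreetSegment:
    street: str                 # normalized street name, e.g. "MAIN ST"
    low: int                    # lowest house number in the segment
    high: int                   # highest house number in the segment
    parity: str                 # "odd", "even", or "both"
    districts: Tuple[str, ...]  # jurisdictions associated with the segment

    def contains(self, house_number: int, street: str) -> bool:
        if street != self.street or not (self.low <= house_number <= self.high):
            return False
        if self.parity == "odd":
            return house_number % 2 == 1
        if self.parity == "even":
            return house_number % 2 == 0
        return True


def parse_address(address: str) -> Optional[Tuple[int, str]]:
    """Naive parser: a leading house number followed by the street name."""
    match = re.match(r"\s*(\d+)\s+(.+?)\s*$", address)
    if match is None:
        return None
    street = re.sub(r"[.,]", "", match.group(2)).upper()
    street = re.sub(r"\s+", " ", street)
    return int(match.group(1)), street


def districts_for_address(address: str, segments: List[StreetSegment]) -> Optional[Tuple[str, ...]]:
    """Return the districts of the first matching segment, or None if nothing matches."""
    parsed = parse_address(address)
    if parsed is None:
        return None
    number, street = parsed
    for segment in segments:
        if segment.contains(number, street):
            return segment.districts
    return None


# Hypothetical segments: the two sides of Main St. fall in different wards.
SEGMENTS = [
    StreetSegment("MAIN ST", 1, 31, "odd", ("Ward 1", "School District A")),
    StreetSegment("MAIN ST", 2, 32, "even", ("Ward 2", "School District A")),
]

print(districts_for_address("17 Main St.", SEGMENTS))
print(districts_for_address("18 Main St", SEGMENTS))
print(districts_for_address("45 Main St", SEGMENTS))
```

The last lookup returns `None` because no segment covers that house number, which mirrors the limitation noted above: an address missing from the official records cannot be matched by the textual approach.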