Tim Thoughts data/rstats/policy

People are asking, what can I do?

Before you rush into it, ask yourself: what is it you care about?

Election Day

Good morning. In less than an hour, volunteers from here will join others from around the United States. And you will be launching the largest GOTV effort in the history of mankind.

Mankind -- that word should have new meaning for all of us today. We can't be consumed by our petty differences anymore. We will be united in our common interests.

Perhaps it's fate that today is the 8th of November, and you will once again be fighting for our freedom, not from tyranny, oppression, or persecution -- well actually, you will be.

We're fighting for our right to live, to exist.

And should we win the day, the 8th of November will no longer be known as an American election, but as the day when the world declared in one voice:

"We will not go quietly into the night! We will not vanish without a fight! We're going to live on! We're going to survive!"

Today, we celebrate our Election Day!

Tampa Update

things I've drastically improved my knowledge of:
git
shiny
rselenium
votebuilder
sql
hamilton lyrics
snapchat
coffee consumption

A Million Things I Haven't Done

So much for writing about the election this year. I just moved to Tampa to join Hillary Clinton and the Florida Democrats, where I'll be working on the data and analytics team for the duration of the election. Volunteering as a fellow during the California primary for HRC will always be a highlight of my life, and I'm really thankful for all the great people I met.

But the hardest part comes next. I'll be working with some really smart people on all sorts of projects; my primary focus will be aiding the organizers and ground game to maximize their impact here in Florida. I'm excited to use my growing data skills in the largest (and according to recent polls, the most contested) battleground state with some amazingly gifted people. Posting R code and graphs will have to wait for a while. Wish me luck, I'm excited to learn all the things and do all the things.

When times get hard, and they will, I have to remind myself of why I started all this. One of my first posts here a year ago was a little short story joke I wrote about the Don, but it's no longer funny. I take my politics very seriously, and I'm eager to do whatever it takes to put Hillary into office. This isn't the post where I tell you why I support her (maybe at a later time), but I'll briefly say that I am proud to stand with my fellow Democrats behind such a qualified candidate.

My boss gave me a survey to complete before I arrived: a list of questions about my professional identity and personality. I've been thinking a lot about what I want to do with my life, and it was refreshing to rethink some of my goals about what I want out of this campaign for myself. It's really important to put my pride and ego aside and concentrate on my daily tasks, but I get that I'm supposed to grow as a person. I'll list some of those goals here now to remind myself.

Professional goals: Master the VAN/Votebuilder, better understand the landscape of Democratic Party data infrastructure, learn how to do more data science things like random forest algorithms, practice SQL/Tableau, absorb all of Daniel Kreiss's new book

Personal goals: Eat healthy, practice self-care, send postcards home, call mom/dad more, memorize the Hamilton soundtrack

Unlikely goals: Post more updates on Snapchat/Instagram, 2K MMR on DOTA, try more EDM music, sleep, date hahahahahaha okay let's just stop here.

There's so much left to learn, and I am so eager for this challenge. I won't come home 'til the job is finished, and I hope to be a better version of myself by then. I hope this election will change me for the better, and I look forward to the grind. See you in November!

Data Perfectionism is the Enemy

Back in graduate school, my development economics professor assigned us a story by Jorge Luis Borges. In "Funes the Memorious", the protagonist meets a teenage prodigy who has developed perfect memory due to a horse-riding accident. For example, the young boy could recall an entire day's worth of memories, and spend an entire day reliving his thoughts from the previous day.

As we discussed the story, a student remarked, "Wow, what a cool ability!" But I recognized the lesson right away. The boy's ability was a blessing yet also a curse: his perfectionist ability to get every single detail correct prevented him from thinking abstractly or broadly. Hence the relevance to a class on economics, as our professor was stressing the necessity of sacrificing a tiny bit of detail for broader policy validity. Or something like that, I assumed that was his point. Sorry Jeff.

I've been working on aggregating all the primary results by congressional district, and it's been increasingly frustrating to see the disparity between state reporting methods. Tuesday's results in Pennsylvania are a fine example, for they were only reported by county.

I messaged the Green Papers, and they pointed me towards an AP press release for the Democrats that was much more helpful. To attain their estimated delegate count, they told me their method was "to go from county to CD -- say, 30% of a county is in CDa, 60% in CDb and 10% in CDc: we take 30% of the county vote and apply it to CDa, 60% of the county vote and apply it to CDb, and 10% of the county vote and apply it to CDc. We did that for each county. We have found that the results much more usually end up pretty close to what the final delegate numbers per CD turn out to be".

It should be good enough for me.
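Here's a rough sketch of that county-to-CD split, written in Python rather than R. Everything named here is made up for illustration: the function name, "Example County", the candidates and the vote totals. The Green Papers didn't tell me what the percentages are based on, so the sketch just takes the shares as given inputs and only borrows the 30/60/10 split from their example.

```python
from collections import defaultdict


def allocate_county_votes(county_votes, county_cd_shares):
    """Spread each county's candidate totals across districts by share.

    county_votes:     {county: {candidate: votes}}
    county_cd_shares: {county: {district: fraction of the county}}
    Returns           {district: {candidate: estimated votes}}
    """
    cd_votes = defaultdict(lambda: defaultdict(float))
    for county, votes in county_votes.items():
        shares = county_cd_shares[county]
        # The fractions for one county should account for the whole county.
        if abs(sum(shares.values()) - 1.0) > 1e-9:
            raise ValueError(f"shares for {county} do not sum to 1")
        for district, share in shares.items():
            for candidate, n in votes.items():
                cd_votes[district][candidate] += n * share
    return {district: dict(v) for district, v in cd_votes.items()}


# Hypothetical county and totals; only the 30/60/10 split comes from the quote.
county_votes = {"Example County": {"Candidate A": 1000, "Candidate B": 500}}
county_cd_shares = {"Example County": {"CDa": 0.3, "CDb": 0.6, "CDc": 0.1}}

estimates = allocate_county_votes(county_votes, county_cd_shares)
for district in sorted(estimates):
    print(district, estimates[district])
```

The results are estimates, not counts, so fractional votes are left as floats instead of being rounded away.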

Sigh. Okay, so there are 67 counties and 18 congressional districts in Pennsylvania. Some counties are entirely located in one district, some are split across more than one. But I now know that Precinct 1271 of Whitehall Dist 1 of Allegheny County in Pennsylvania (literally the smallest, most atomic unit of political geography in the United States) is actually split between 2 Congressional Districts.

I know this because I'm going to each of the 67 county websites, downloading their data directly, and filtering it into R. I figured I can obtain which district a particular precinct belongs to if I check which Congressional race they're voting in. That's when I realized 1271 of Allegheny was voting for candidates and delegates in both the 14th and 18th districts. When I checked 1272-1288, etc. to see which district the rest of Whitehall (Dist 2-16) falls in, they're all in the 18th. You can double-check my results by Control + F "1271 Whitehall Dist 1" at this website.
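The check itself is simple enough to sketch. This is Python, not the R I mentioned, and the row layout is an assumption: I'm pretending each line of county results has already been boiled down to a (precinct, congressional district on the ballot) pair. The function names are hypothetical too. Only the 1271 and 1272 facts come from what I found above.

```python
from collections import defaultdict


def districts_by_precinct(rows):
    """Collect every congressional district that shows up in a precinct's results."""
    found = defaultdict(set)
    for precinct, district in rows:
        found[precinct].add(district)
    return dict(found)


def split_precincts(found):
    """Return only the precincts that report races for more than one district."""
    return {p: sorted(d) for p, d in found.items() if len(d) > 1}


# Assumed layout: one (precinct, district) pair per congressional race reported.
rows = [
    ("1271", 14),
    ("1271", 18),
    ("1272", 18),
]

found = districts_by_precinct(rows)
print(split_precincts(found))
```

Any precinct that comes back from `split_precincts` is one worth a phone call.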

So that ONE particular precinct has a few people living in the 14th. Just to confirm, I made a phone call to Allegheny's elections department this morning, and they told me that because of some redistricting that occurred during even-numbered years (when Congressional representatives are elected), a precinct such as 1271 may be split. It's literally the only one in the county.

After doing more online digging, I finally found an updated spreadsheet matching precincts to districts at the state website. Considering the lesson of Funes and grad school, I will just recode 1271 as part of the 18th district and move on. Going for perfect granularity when you're dealing with election data is a Sisyphean task that will be an endless timesuck.
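For the record, the recode amounts to a one-line override. A minimal sketch in Python, assuming a precinct-to-districts lookup like the one described above. The function name and the override table are hypothetical, not something from the state spreadsheet.

```python
def assign_single_district(found, overrides):
    """Give every precinct exactly one district, using manual overrides for splits."""
    assigned = {}
    for precinct, districts in found.items():
        if precinct in overrides:
            # A deliberate "good enough" decision, recorded in one place.
            assigned[precinct] = overrides[precinct]
        elif len(districts) == 1:
            assigned[precinct] = next(iter(districts))
        else:
            raise ValueError(
                f"precinct {precinct} is split across {sorted(districts)}; add an override"
            )
    return assigned


found = {"1271": {14, 18}, "1272": {18}}
overrides = {"1271": 18}  # recode the one split precinct into the 18th

assigned = assign_single_district(found, overrides)
print(assigned)
```

Raising an error on any split precinct without an override keeps the compromise explicit: nothing gets silently lumped into a district.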

I just thought I should write a disclaimer, because while I theoretically accept and understand what I just wrote, it's still frustrating to know that there's incomplete data out there. Just learn to live with it, Tim.