Exploring data with Principal Component Analysis

Cool, but…

We recently published an analysis of pandemic unemployment and drinking where we used a data analysis technique called Principal Component Analysis (PCA) to visualize several variables in one plot. …

An analysis of elite women’s all-around gymnastics scores

Simone Biles vs. Everyone

The ’96 Bulls or the 2017 Warriors. Tom Brady vs. Joe Montana. LeBron or MJ. Naming the “Greatest of All Time” is a sure source of speculation and controversy.

In data science terms, we can’t make “all-else-equal” GOAT comparisons between athletes. We can’t control for factors like the team around…

Schedule Python and SQL scripts to keep your dataset clean and up-to-date in a Postgres database

Want to try it yourself? First, sign up for bit.io to get instant access to a free Postgres database. Then clone the GitHub repo and give it a try!

The problem

Public and private data sources are plentiful but also problematic:

  1. Source data may get updated frequently but require substantial preparation before…

Here’s how U.S. Google searches for 350+ common hobbies diverged from pre-pandemic expectations

Things have changed

Social distancing and sourdough starters. Masks and Mario games. Remote work and renovating. The pandemic has changed the way we’ve lived since March 2020, and that includes our hobbies. …

Use GitHub Actions to automatically integrate ETL changes and republish your datasets

An actionable workflow

Pipeline maintenance can be tedious at best and error-prone at worst. Once a pipeline is in service, it requires ongoing…

Maintain data quality and catch bugs before your stakeholders

Are we done yet?

In Part 1: The ETL Pattern and Part 2: Automating ETL, we completed a minimal, yet usable, data pipeline for ETL.

Log IoT data to a cloud database using Python on a Raspberry Pi

Measure what matters to you

Years ago, I got back into C++ because I couldn’t sleep at night. I lived in a century-old apartment building in Seattle where my unit had the thermostat that controlled the boiler for the entire building. …

Analyzing particulate matter trends in three major Pacific metros

This analysis was conducted with the help of bit.io, a standards-compliant cloud Postgres database. bit.io is the fastest way to get your data into a private, hosted Postgres database. Follow bit.io on Twitter at @bitdotioinc.

Wildfire woes

Wildfires are getting worse by nearly every metric. Fires in the West…

Schedule Python and SQL scripts to keep your dataset clean and up-to-date in a Postgres database

Where we left off

In Making a Simple Data Pipeline Part 1: The ETL Pattern, we explained that the Extract, Transform, Load (ETL) process is…

Comparing Rio Olympic decathletes to their specialist counterparts

Specialists vs. generalists

Olympic medals can be decided by fractions of seconds. Each event requires a specific blend of genetics, training, and luck to have a shot at the podium. To reach the upper echelons of a sport, athletes typically specialize, often down to the event level. …

Andrew Doss

Data Scientist @ bit.io
