Hello, my name is Christina Holland, and I’m an (aspiring) data scientist.
I’m just now four weeks into an immersive data science “boot camp” with General Assembly (https://generalassemb.ly). It’s a scary time of transition, but most definitely a time of growth, and of discovering new ways to approach solving some fascinatingly complex problems, with some amazing instructors and classmates.
I’m no stranger to science, or to data. Getting my PhD in physical oceanography certainly required me to wrap my head around some complicated sets of data and make sense of them. What I’m finding is that the approach in the world of data science is similar … but also pretty different. I hope you’ll bear with me for a short aside.
How does data science fit with a physics background?
TL;DR: Ocean physics models start with the physical dynamics and simulate ocean data to compare to the real thing, using deductive logic. Data science models use a set of training inputs and outcomes and inductive logic to find the pattern and create a model that can predict the outcome for new input data.
It’s a question of how you build answers to the questions in front of you — through inductive logic, or deductive logic.
We’re all of us, as humans, masters at inductive logic. That’s pattern recognition. If I tell you a set of numbers starts with:
1, 4, 9, 16, 25, …
then most everyone will correctly say the next number is 36. This ability to recognize patterns and use those patterns to predict future events is essential for our survival as individuals and as a species, developed from the earliest hunter/gatherers recognizing where to find the best food.
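As a minimal sketch of that kind of pattern-finding in Python: the hypothetical helper `next_term` below keeps taking differences between neighboring numbers until they stop changing, then assumes that constant difference continues. That assumption is an illustrative choice, not the only way to extend a sequence.

```python
def next_term(seq):
    """Guess the next term of a sequence by repeated differencing."""
    rows = [list(seq)]
    # Build a difference table until a row is constant (or has one value left).
    while len(rows[-1]) > 1 and len(set(rows[-1])) > 1:
        prev = rows[-1]
        rows.append([b - a for a, b in zip(prev, prev[1:])])
    # Assume the final row repeats, then add the last entry of every row
    # back up to get the next value of the original sequence.
    return sum(row[-1] for row in rows)


print(next_term([1, 4, 9, 16, 25]))  # 25 + 9 + 2 = 36
```

Nothing in the code knows about squares; it only extends the pattern it was shown, which is the inductive step.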
My early career training is as an ocean scientist. In college I majored in physics, and in the 11th hour changed my math minor into a second major. I started scientific research my sophomore year, spending most of a semester aboard a research ship mapping the density and magnetics of the seafloor between Easter Island and Christchurch, New Zealand (with a small but very fun stopover in Tonga). After college I went on to graduate school in oceanography, and focused my attention on problems related to climate dynamics.
In the world of physics, they don’t really stress the inductive logic.
It’s all about the deductive logic. Not “what is the pattern?” but rather “WHY is the pattern?”
And this really comes into play when you start talking about modeling. In the last year of my PhD, I was working with some large datasets and complex modeling, but it wasn’t much like what I’m learning now. Building upon the work of many previous oceanographers, and in collaboration with others, we basically put equations (representing what we thought were the essential elements of ocean and atmosphere physics) into a 3-dimensional grid of the Pacific, with 20 vertical layers in the ocean and an atmospheric mixed layer. Then we gave the model initial and boundary conditions and let it run forward in time for several “years”.
This model simulated the climate dynamics in the Pacific, and only afterwards did we compare it to the real ocean data. At that point we’d ask questions like “Does the model ocean have the same patterns I see in the real world ocean?” — fundamentally just a yes or no question. If the answer was no, we’d have to go back and adjust the physics and run it again, until we could say “Yes, this looks like the real ocean in terms of the pattern we’re interested in.”
(If you really want to read more, go here: https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2004JC002466)
That’s a completely different structure than the modeling I’m doing now.
Now, I’m learning to take in big data sets like the (infamous among data science students, I’ll bet) Ames housing data, and predict outcomes (like the price of houses) based on the known data but without any real prior knowledge of the mechanisms that might cause the differences.
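To make that concrete, here is a minimal sketch in Python of fitting a pattern to known inputs and outcomes, then predicting a new outcome. The numbers are made up for illustration (they are not from the Ames data), `fit_line` is a hypothetical helper, and a single straight-line fit is assumed, which is far simpler than a real housing model.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the shared 1/n factor cancels).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up training data: living area (sq ft) and sale price (dollars).
areas = [1000, 1500, 2000, 2500]
prices = [150000, 200000, 250000, 300000]

slope, intercept = fit_line(areas, prices)

# Predict the price of a house the model has never seen.
predicted = slope * 1800 + intercept
print(slope, intercept, predicted)
```

The fit never asks why bigger houses cost more. It only learns that, in the training data, they do.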
How did I get here, and why?
I went into physics and math in the first place because I just love puzzling out complicated problems. Then I chose climate dynamics in particular because I love the ocean, but I also wanted to make a positive impact on the world.
To put it in data science terms, if X = [[my actions]] and y = [people’s general well-being], I’d like for the result of:
to be a positive coefficient.
Somewhere along the way, I realized ocean physics just wasn’t keeping me engaged. I wanted new challenges. But I still wanted that positive impact, so I decided to become a classroom teacher.
I taught high school math for 12 years, in Texas and Georgia, a mix of general education, gifted, and special education. For the first 4 years, I taught geometry and pre-calculus at a project-based academy, where the emphasis was really on finding ways to apply the math concepts to real life situations. My absolute favorite of all my class projects was when I noticed a particular group of students (who didn’t as a rule do much work) were avid skaters. So we used surface area and volume of composite shapes (of the students’ choosing) to design and make a budget for a skatepark. Darned if those skater kids didn’t give me an amazing project.
That pretty clearly illustrated to me that math is a lot more fun when you apply it to varied and interesting projects.
That’s part of what draws me now to data science — I want to apply mathematics to solve a wide variety of complicated problems.
The other piece is that as a classroom teacher, the mathematical problems just weren’t all that interesting anymore, for me personally. I’m ready to learn new things (all the things, please) and to use that new knowledge to figure out solutions to new problems that nobody’s solved yet.