Clare Heinbaugh’s Slice of Data Science Presentation

Clare’s Slice of Data Science presentation on transfer learning and Keras made me think about the complexity of creating and training accurate models. I often take for granted how complex the mechanisms underlying technologies like object recognition are because they’ve become so commonplace in everyday life. Clare’s first example of music categorization demonstrated that tons of factors go into distinguishing one genre from another using a machine learning model. A machine can’t listen to a song and intuitively give it a label like a person can, so automating this process requires many input variables, which in and of themselves require quantifying things I would never have thought possible to quantify. How does one determine a numerical score for danceability? Or energy? This example really made me consider the many seemingly simple things modern technology can do that actually require incredibly complex computations. This presentation also made me think about how machines process information, which adds another dimension of complexity to creating predictive models. Clare pointed out that a computer can’t ‘see’ an image of a tomato. The features of an image have to be converted into matrices or other numerical characteristics that a computer can process before any kind of machine learning for object recognition can even take place. Considering this complexity, I’m astounded there are neural nets able to distinguish a tomato from a strawberry, let alone the condition of one road from that of another.
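To make the idea of an image as numbers concrete, here is a minimal sketch in Python with NumPy. It is my own illustration, not something from Clare’s presentation: the tiny 2x2 "image" and its pixel values are made up, and a real pipeline would load an actual photo instead.

```python
import numpy as np

# A hypothetical 2x2 color image. Each pixel is three numbers
# (red, green, blue), each between 0 and 255. Values are invented.
image = np.array(
    [
        [[255, 0, 0], [200, 30, 30]],
        [[0, 128, 0], [220, 20, 60]],
    ],
    dtype=np.uint8,
)

# To the computer, the "picture" is just this height x width x channels array.
print(image.shape)

# Models usually work with floating-point values, so a common first step
# is to rescale the pixel values to the range 0 to 1.
scaled = image.astype(np.float32) / 255.0

# Flattening turns the image into one long row of numeric features.
features = scaled.reshape(-1)
print(features.shape)
```

Even this toy image becomes twelve separate numbers, which gives a sense of how many values a full-size satellite image would turn into.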

The Roadrunner lab’s work on distinguishing good roads from bad ones is really interesting to me because analyzing satellite data could have a huge number of potential applications. For example, my brother’s data science firm created a neural net for the U.S. military that uses satellite images of nuclear weapon facilities to recognize when other countries are developing new ones. Another potential application for satellite images that came to mind while I was listening to this presentation was tracking deforestation, urban growth, and other environmental changes that could be contributing to climate change.

It was also really cool to hear Clare talk about some of the things we’ve been working on in class. She discussed the benefits of splitting the data into training and testing subsets, including catching overfitting and getting a more honest read on a model’s predictive power, since the model is scored on data it never saw during training. She’s clearly working with a more complex dataset than we are in Data 146, but it’s still cool to see how these foundational techniques can be applied to higher-level analysis.
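The train/test idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the data are synthetic (a noisy straight line), the 80/20 split and the polynomial degrees are arbitrary choices, and none of it comes from Clare’s actual dataset or code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): a straight line plus noise.
x = rng.uniform(-1, 1, size=50)
y = 2 * x + rng.normal(scale=0.3, size=50)

# Shuffle the row indices, then hold out 20% of the rows for testing.
indices = rng.permutation(len(x))
n_test = int(0.2 * len(x))
test_idx, train_idx = indices[:n_test], indices[n_test:]


def mse(coeffs, xs, ys):
    """Mean squared error of a fitted polynomial on the given points."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))


# Fit a simple model and a much more flexible one on the training rows only,
# then score each on both the training rows and the held-out test rows.
results = {}
for degree in (1, 8):
    coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
    results[degree] = (
        mse(coeffs, x[train_idx], y[train_idx]),
        mse(coeffs, x[test_idx], y[test_idx]),
    )
    print(degree, results[degree])
```

The more flexible model always fits the training rows at least as well as the simple one, so the training error alone can’t reveal overfitting. Comparing it with the error on the held-out rows is what shows whether the extra flexibility actually helps on new data.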