Talks

Some data science talks I've given

I made an entire e-commerce platform on Shiny

Cascadia R Conf 2021

E-commerce has many components that must be handled securely: managing a user's shopping cart, checking out and taking payment, and fulfilling orders. I am excited to say that I've successfully created an e-commerce platform entirely in a single Shiny app for my side project: {ggirl}. Using the experimental {brochure} package by Colin Fay, I was able to make a complex Shiny web service that lets R users order physical postcards of ggplots. I integrated Stripe for payments and used webhooks to know when to fulfill orders. I even used {httr} to make API calls to order the physical products from suppliers after the customer payments are received. In this talk I'll go through the architecture I devised and how you can make an e-commerce platform yourself!

Deep learning isn't hard, I promise

Recorded at New York R Conference 2019

Deep learning sounds complicated and difficult, but it’s really not. Thanks to packages like Keras, you can get started with only a few lines of R code. Once you understand the basic concepts, you will be able to use deep learning to make AI-generated humorous content! In this talk I give an introduction to deep learning by showing how you can use it to make a model that generates weird pet names like Shurper, Tunkin Pike, and Jack Odins. If you understand how to make a linear regression in R, you can understand how to create fun deep learning projects.

When data science projects fail

Recorded at PyData Ann Arbor 2019

Everyone loves talking about successes, but data science projects fail all the time. Datasets don’t end up having signals, the work takes far longer than expected, and products end up missing the mark. In this recorded talk I examine the key themes that show up in projects that fail and how data scientists can spot them coming. To highlight these themes I use examples from the many failed data science projects that I have been personally responsible for.

Push straight to prod: API development with R and TensorFlow

By Heather Nolis and Jacqueline Nolis | Recorded at rstudio::conf(2019)

When tasked with creating the first customer-facing machine learning model at T-Mobile, we were faced with a conundrum. We had been told time and time again that to deploy machine learning models in production you had to use Python, but our very best data scientists were fluent in building neural networks in R with Keras and TensorFlow. Determined to avoid double work, we decided to use R in production for our machine learning models. In this talk, we'll walk through how to deploy R models as container-based APIs, the struggles and triumphs we've had using R in production, and how you can design your teams to optimize for this sort of innovation.

We’re hitting R a million times a day so we made a talk about it

By Heather Nolis and Jacqueline Nolis | Recorded at rstudio::conf(2020)

Often reserved for Elite Engineers, production can be a perilous place for R users, but never fear! For the past year, we at T-Mobile have been sludging through production outages, nation-wide product launches, and all of the muck that floods from R models being hit over a million times every day. From “we’re strictly a Java shop” to a DevOps team that proudly states “we support Java, Node, and R,” this talk will cover the technical hiccups, interdisciplinary communication struggles, and an open-source R package, {loadtest}, that’s changed the way our team views performance testing. You too can dazzle your enterprise with the power of R.

You're not paid to model

Recorded as part of the Metis Demystifying Data Science series 2019


What I Learned from Porting my Viral Website from .NET to Shiny

Recorded at NY R Conference 2020

In 2016 I created Tweet Mashup, a website that lets you combine the tweets of two different people. After I spent a year making it in .NET, the site became an immediate sensation when I launched it and was mentioned in places like The Verge. Years later, I was getting more and more frustrated maintaining the F# code and decided to see if I could recreate it in Shiny. Doing so would require having Shiny integrate with the Twitter API in ways that hadn’t been done by anyone before. Could I pull it off? Come to this talk to find out!

Spanking and Spreadsheets: Data-driven Sex Journalism

By Heather Nolis and Jacqueline Nolis | Recorded at csv,conf,v4

When we saw that The Stranger, Seattle’s alternative newspaper, was running a survey on kinks and sexual preferences, we knew we had to get our hands on the data. We convinced them that using machine learning methods on the responses would be a good idea, and then we quickly set out to analyze them. In this talk we will cover how we made sense of the lewd data, the statistical methods we used (and failures we produced), as well as the final results that ended up in our feature article: “There Are Four Kinds of Sex Partners (which one are you).”

Using Dask and Many GPUs to Train a Neural Network with PyTorch

Dask Summit 2021

I took a neural network I had trained on a single CPU to generate pet names and tried retraining it with tons of connected GPUs using Dask, PyTorch, and the package dask-pytorch-ddp. I learned a lot about when it makes sense to use multiple GPUs and what the pitfalls can be. In this talk I discussed what these lessons mean for training with GPUs and Dask.