Dr. Jin Yao Uses Python and Data Analysis to Support Decisions Impacting Johnson County Government Resources

5 Questions With Dr. Jin Yao, a senior data analyst with the government of Johnson County, Kansas, who teaches Python Boot Camp. Dr. Yao uses machine learning and statistical analysis to support criminal justice and human service interventions in regional communities. Decision-makers in Johnson County’s government rely on data analysts like Dr. Jin Yao to understand whether programs and services are effective.

Dr. Yao discusses how she uses Python and data analysis in her community-based work. The beginner-level Python Boot Camp will be held online on Tuesday, March 29, 1–4 pm, and Saturday, April 2, 9 am–12 pm. Registration is still open.

Let’s begin with basics. What is Python? How is Python used?

Python is a general-purpose programming language that can be used in many fields. Many people have written Python packages for a variety of uses. A package is a collection of pre-written code with a specific focus, e.g., making graphs. Anyone can use these packages. There are over 300,000 Python packages with a wide range of functionality – data visualization, statistical analysis, machine learning, web scraping, automation, image processing, graphical user interfaces, multimedia, system administration, web frameworks, and mobile apps.
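To make the idea of a package concrete, here is a minimal sketch. It uses the statistics module that ships with Python as a stand-in for any package; third-party packages are typically installed first (for example with pip) and then imported the same way. The numbers are made up for illustration.

```python
# The built-in statistics module stands in here for any package:
# someone else wrote and tested the code, and we import and call it.
import statistics

# Hypothetical sample values, invented for this example.
daily_counts = [12, 15, 9, 20, 14]

average = statistics.mean(daily_counts)    # arithmetic mean of the values
middle = statistics.median(daily_counts)   # middle value once sorted

print(f"mean = {average}, median = {middle}")
```

Two lines of code produce both results, and the reader does not have to write the underlying calculations.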

Is Python an essential skill for someone considering data science as a career or someone who works with data?  

Python is the tool of choice for data scientists and analytics professionals now and for the foreseeable future. Dr. Yao points to two surveys of note.

According to Burtch Works Executive Recruiting’s 2021 survey, 48 percent of data scientists and analytics professionals surveyed prefer to use Python. The survey’s demographic results show: “While some more traditional teams in industries like financial services or pharmaceuticals may still be using SAS, we continue to see more of these teams converting to Python or allowing professionals to choose their tools.”

Further, O’Reilly’s 2021 Data/AI Salary Survey found that Python was the most popular programming language for data and AI practitioners (61%), followed by SQL (54%). The survey also evaluated programming languages and salaries. O’Reilly found that the “most widely used and popular languages, like Python ($150,000), SQL ($144,000), Java ($155,000), and JavaScript ($146,000), were solidly in the middle of the salary range. Languages like Python and SQL are table stakes: an applicant who can’t use them could easily be penalized [salary-wise].”

Why is Python a popular language to learn and use for data analysis?

Python is open source, which means individuals and companies can use it for free. It is beginner-friendly because its syntax resembles English and was designed to be concise and easy to read.

How can Python be used to clean, merge, and analyze data? 

To accomplish these tasks (cleaning, merging, and analyzing data) in a reproducible, efficient, and automatic way, you must use a software tool. Python is one of the options. For example, every morning at 9:30 am you are provided with two data tables (COVID test results and COVID vaccination records). Each table has between a few hundred and a few thousand rows. You are asked to identify which ones are breakthrough cases by 9:35 am. How do you accomplish this task? You can write a Python program to merge data, do calculations, and schedule the computer to execute the Python program at 9:31 am every day.
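The breakthrough-case scenario might look something like the sketch below, which uses the pandas package. Everything specific in it is an assumption made for illustration: the column names (person_id, test_date, result, fully_vaccinated_date), the sample rows, the function name find_breakthrough_cases, and the rule that a breakthrough case is a positive test at least 14 days after full vaccination. In practice the two tables would be read from the files delivered each morning.

```python
import pandas as pd

# Hypothetical test results; real data would be loaded from a file.
tests = pd.DataFrame({
    "person_id": [1, 2, 3, 4],
    "test_date": pd.to_datetime(
        ["2022-03-01", "2022-03-01", "2022-03-02", "2022-03-02"]
    ),
    "result": ["positive", "negative", "positive", "positive"],
})

# Hypothetical vaccination records; person 4 has no record.
vaccinations = pd.DataFrame({
    "person_id": [1, 2, 3],
    "fully_vaccinated_date": pd.to_datetime(
        ["2021-06-15", "2021-07-01", "2022-02-25"]
    ),
})


def find_breakthrough_cases(tests, vaccinations, min_days=14):
    """Return positive tests taken at least min_days after full vaccination.

    The 14-day default is an assumption for this sketch, not a rule
    stated in the article.
    """
    # Merge: an inner join keeps only people who appear in both tables.
    merged = tests.merge(vaccinations, on="person_id", how="inner")
    # Calculate: whole days between full vaccination and the test.
    days_since = (merged["test_date"] - merged["fully_vaccinated_date"]).dt.days
    is_breakthrough = (merged["result"] == "positive") & (days_since >= min_days)
    return merged.loc[is_breakthrough].reset_index(drop=True)


breakthrough = find_breakthrough_cases(tests, vaccinations)
print(breakthrough[["person_id", "test_date", "fully_vaccinated_date"]])
```

The daily 9:31 am run would be handled outside the program by the operating system's scheduler, such as cron on Linux or Task Scheduler on Windows.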

Tell us how you use data analysis in your work for the government of Johnson County.  

Most of our data have been cleaned and stored in Microsoft SQL databases by my colleagues. I use SQL to select relevant data and import these data into either Python or R for statistical analysis or machine learning. Here are two examples of data analytics work I am involved in.  

First, in collaboration with the University of Chicago, we have developed a machine learning predictive model to generate monthly outreach lists for the county mental health center. The goal is to reduce the jail recidivism rate among those with mental illnesses (recidivism is the tendency to relapse into criminal behavior). This proactive outreach has been in operation since May 2019.

Second, we evaluated the effectiveness of a County program launched in 2017 with the goal of reducing unnecessary dispatches of EMS units. The program provided mental health outreach to those who frequently use emergency medical services (EMS). Real-time, interactive dashboards enabled staff to monitor spatial and temporal patterns of frequent users and provide timely care. The analysis revealed that the program significantly reduced the number of ambulance runs per frequent user and saved hundreds of thousands of dollars in ambulance fees and emergency room costs. Data analysis provided quantitative evidence to support the continuation of the program.
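The workflow Dr. Yao describes, selecting data with SQL and importing it into Python, can be sketched roughly as follows. This is an illustration under stated assumptions, not the county's actual setup: an in-memory SQLite database stands in for the Microsoft SQL database, and the visits table, its columns, and its rows are invented for the example.

```python
import sqlite3

import pandas as pd

# Assumption: SQLite (built into Python) stands in for Microsoft SQL Server.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, made up for illustration only.
conn.execute("CREATE TABLE visits (client_id INTEGER, visit_year INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [(1, 2020), (1, 2020), (2, 2020), (1, 2021), (2, 2021)],
)
conn.commit()

# SQL selects and summarizes the relevant data inside the database.
query = """
    SELECT visit_year, COUNT(*) AS visits
    FROM visits
    GROUP BY visit_year
    ORDER BY visit_year
"""

# pandas runs the query and returns the result as a DataFrame,
# ready for statistical analysis or machine learning in Python.
visits_per_year = pd.read_sql_query(query, conn)
conn.close()

print(visits_per_year)
```

With a real SQL Server database, the connection would typically come from a database driver rather than sqlite3, while the query-then-analyze pattern stays the same.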

Register for Python Boot Camp

Registration is open for Dr. Jin Yao’s Python Boot Camp, a beginner-level class for using Python in data analysis. The boot camp will be held online on Tuesday, March 29, 1–4 pm, and Saturday, April 2, 9 am–12 pm.

“We assume students have zero knowledge in Python,” says Dr. Yao. Students will learn the basic concepts and coding of Python and a few packages commonly used for data analysis. Students need to have a Google account. We will use Google Drive and Google Colab as our learning environment. We won’t spend time on downloading and installing Python on our own computers.