It was October 2001, one month after the tragic terrorist attacks on 9-11-2001. I was sitting in my NASA office when the phone rang. The voice on the other end of the call said, “We would like you to brief the President tomorrow on data mining.” I remember clearly my response: “Do you mean THE President?”
Yes, they did mean the President of the United States. When I asked how they thought of me for this briefing, they said that they realized that they needed expertise in data mining in order to detect and stop the next terrorist attack, so they asked at different federal agencies who was an expert in data mining, and the NASA HQ folks said that I was the NASA expert in data mining! Wow! I never thought of myself in that way. At that time, I was doing some fairly small projects and trying hard to learn about data mining. How did it come to this?
Two significant revelations came to me in that moment: (1) that the little bit that I knew about data mining was already considered to be “expert”; and (2) that the explosion in data volumes and data analytics applications was not just in the sciences, but everywhere, in all domains, including some very significant and world-changing applications.
With degrees in Physics (B.S.) and Astronomy (Ph.D.), I was always doing computation, modeling, and data analysis. My first 18 years of work after graduate school were at NASA, working on large data systems for space science missions and programs. As the data set sizes began to grow larger and larger, I started investigating data mining and machine learning as tools to do greater exploration and discovery within these massive data collections. After 18 years at NASA, I was convinced that the world would need a strong and large workforce of trained data specialists (what we now call data scientists) in order to explore and exploit these massive and growing digital data collections. I joined the faculty at George Mason University as a Professor of Astrophysics and Computational Science, where I spent 12 years teaching students in data science, computational modeling, statistics, data ethics, databases, etc.
It has been almost 20 years since I started my deep research into the tools and algorithms of machine learning and data mining, and then began applying those methods scientifically to real problems – those applications are the analytics. I started with astronomy and space science data, then moved on to earth science and climate data, and then to many other types of data in many domains, including digital marketing, healthcare, finance, and more.
I have always loved scientific discovery, and this world (of data analytics and data science) allows me to do discovery bigger and better all the time. I enjoy the challenge of modeling a system in a way that explains the data and that enables prediction and deeper understanding from the data. The whole process inspires me to do more of it. And there are always learnings: the good and the bad. Both are good teachers. We know about the good. So, let me share an example of the bad.
I guess my first real experience of “bad” in analytics was a satellite image mining project that was intended to detect wildfires from remote sensing images, to build a neural network model to automate the detection, and then to use those insights to build a predictive model of future wildfires. We spent two years on the first parts of the project and never reached the prediction phase. That was because we spent about 90-95% of the project effort in studying, selecting, understanding, preparing, reformatting, and manipulating a massive volume of satellite images.
The analytics (neural network model-building) came very late in the project. That was frustrating (since we never reached the most interesting phase: prediction!), but at least we built some useful (and accurate) classification networks that can be used for wildfire detection. I learned a lot from this experience. One memorable thing that I learned is that the textbooks and “experts” are wrong when they say that 40-60% of the effort on a data analytics project is spent in data prep. The real number is something like 80-95%. Maybe 95% sounds ugly to some people, but to me that is the time when you get up close and personal with your data, getting to know all about it – that is powerfully valuable knowledge-building and should be embraced.
These days my life outside work is dominated by family activities. With three very young grandchildren now, my wife is very busy with them and our children all of the time, and I join the fun as much as my time permits. Of course, data science is not just my work but my passion, so I write blogs, read a lot, and tweet a lot about data science and analytics. Beyond those year-round activities, my wife and I love to visit the Adirondack Mountains and lakes in upstate New York, so we try to spend some time up there every summer (or spring, or autumn).
I love storytelling (especially with data) and quippy quotes (short quotes with an impactful message). So, anyone who does storytelling well is someone who inspires me. One quote in particular that I always shared with my students is this one: “If we knew what we were doing, it would not be called research.” Someone claimed that this was a quote from Einstein, but it probably was not. Anyway, that quote reminds me (and my students) to remain humble, inquisitive, and in “learning mode” all the time as we explore this wonderful world of data. A quippy quote that I love is “the wheels of progress are not turned by cranks!” This reminds me to keep some humor and positive perspective even in difficult situations.
An anecdote that inspires me is about a Nobel Prize-winning physicist who started graduate school with some trepidation – he was given a homework assignment of 10 problems in a very hard mathematical physics class, but he was only able to solve two of the problems. He was unhappy with his performance and unsure about his career choice. But then he found out that no other student could solve any of the problems, which turned out to be ten previously unsolved problems in the history of mathematical physics. He was able to solve two previously unsolved problems! I guess he was in the right place at the right time, though at first that didn’t seem to be the case! That is how I reflect on a lot of the opportunities and experiences throughout my life. I wrote about it here: http://bit.ly/2rcu8M1
I wrote an article – The Definitive Q&A for Aspiring Data Scientists – in which I shared some professional advice. The primary message that I give to aspiring data analytics professionals is to follow your passion first, whether it is business, or industrial, or science, or healthcare, or art, or writing, or teaching, or technology, or sports, or whatever. Everything in the world is becoming digital, with data emanating from sensors everywhere. Therefore, since there will almost certainly be analytics opportunities in any career path, you should do what you love and consequently you will love what you do.
Second, I remind young professionals that the best guarantee for an enduring and successful career is a love of learning – be a life-long learner, and you will always find opportunities and provide value to your organization. New things appear all the time, including new technologies, new ways of doing things, and even new programming languages, so the data scientist must be nimble and in learning mode always.
Third, I emphasize to young analytics professionals the important distinction between “push” and “pull”. This is especially important in my organization, which delivers management consulting services to clients. The important thing is to listen to the client (the stakeholder) and learn what they want (i.e., the “pull”). Do not try to push your latest model, your shiny new software package, or your fancy dashboard on the client – no matter how good it is, the client may reject it simply because you are not listening to their needs. Be self-confident and do let the client know about your shiny product (maybe even let them see it in action), but let them reflect on whether they want it. It then becomes their “need” (i.e., a “pull” from the client). In any case, one of my quotes that I tell my colleagues in such situations is this: “The customer may not always be right, but the customer is always the customer.”
For those who do not know me very well, I am a very enthusiastic person. If you can see that about me, then I know that we have connected. I love to share the joy and excitement of discovery and learning new things. I hope that it’s contagious!
Oh, by the way, I never did brief the President because I was a contract employee at NASA and was legally not permitted to advise the President in that capacity. However, I did get to brief the staff in the Department of Homeland Security Transition Planning Office, within the Executive Office of the President. That was very cool, and memorable!
You can follow me on Twitter @KirkDBorne (https://twitter.com/KirkDBorne)
Meet Kirk Borne: Analytics Visionary, Space Scientist, and Chronic Learner