4 Helpful Tips When Starting in Data Science

How to Avoid Repeating My Mistakes

Cassie Nutter
6 min read · Jun 2, 2021
Photo by Michael Dziedzic on Unsplash

Congratulations if you are just beginning your data science journey! It can be a turbulent but rewarding process. Many people have been where you are now (myself included) and are here to help you succeed.

Below are four pieces of advice from someone who was recently in your shoes. Some are mistakes I made while others are things I found to be extremely important. Let’s check them out!

1. Add a README and .gitignore

If you have just read that and said, “Huh?” then I am glad. That means you have not had the chance to make my first mistake.

You will most likely be using a site called GitHub. GitHub is useful for showcasing your code, saving each version of your code, and collaborating with others. When you make a “repository” on GitHub (think of it like a folder), it will ask whether you would like to add a README file or a .gitignore. You can add these later if you decide not to select them, but I found the process simpler when they are added at the beginning.

Creating a new repository on GitHub

Don’t worry. I can already hear what you are thinking. “What is a README and .gitignore?”

A README will be the first thing someone sees when they go to your GitHub repository. Think of your repository as a website and the README as its homepage.

A .gitignore is a plain-text file that lists the files and folders you want Git to leave out of your repository, so they never get uploaded for others to see. Sometimes you will have confidential information or credentials that are specific to you. Files holding those are great things to list in a .gitignore. In addition, GitHub does not like extremely large datasets on its site. Listing them in the .gitignore can be the solution if you find yourself in that situation.
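
To make that concrete, here is a minimal Python sketch that writes a starter .gitignore. The entries (such as .env and data/raw/) and the helper name write_gitignore are assumptions for illustration, not names from any particular project, so replace them with whatever your own project needs to keep off GitHub.

```python
from pathlib import Path

# Hypothetical entries for illustration; swap in the files your project actually has.
IGNORE_PATTERNS = [
    ".env",                 # file holding secret keys or credentials
    "data/raw/",            # folder of datasets too large for GitHub
    ".ipynb_checkpoints/",  # autosave folders that Jupyter creates
    "__pycache__/",         # Python bytecode caches
]


def write_gitignore(repo_dir, patterns=IGNORE_PATTERNS):
    """Create or extend .gitignore in repo_dir, one pattern per line, and return its path."""
    path = Path(repo_dir) / ".gitignore"
    existing = path.read_text().splitlines() if path.exists() else []
    # Only add patterns that are not already listed, so reruns do not duplicate lines.
    new_lines = [p for p in patterns if p not in existing]
    path.write_text("\n".join(existing + new_lines) + "\n")
    return path
```

Calling write_gitignore(".") from your project folder would create the file there. Keep in mind that Git only ignores files it is not already tracking, so it is easiest to list a file before the first time you commit it.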

Photo by Jennifer Burk on Unsplash

Now you’re thinking, “Okaaay. So what do I do with those?”

2. Make a clean README and Notebook

Yep. I hear that too. Loud and clear. That question brings us to the second tip on how to make a nice README and notebook.

Here, a notebook refers to code that is written with an open-source web application called Jupyter Notebook. “Jupyter is a free, open-source, interactive web tool known as a computational notebook, which researchers can use to combine software code, computational output, explanatory text and multimedia resources in a single document.” * Translated: this means that a notebook allows you to write code, see the output and put in text and images (or gifs) in the same place.

Example of code, output, and text

The image above shows a cell containing code (in Python), its output (the visualization), and text written to give the reader more information about the diagram.

“How do I make it ‘clean’?”

Save yourself a world of pain and document the notebook as you write the code, rather than going back afterward. As you work through your code, you will want to:

  • Have a descriptive title
  • Create a real-world problem and explain how you are providing a solution
  • Know your data: where it came from and how it is useful in solving the real-world problem
  • Explain yourself: document why you did what you did so others can follow along and replicate it if necessary
  • Describe your findings: quantitative results are important, but make sure you can elaborate on how those results apply to the initial problem
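
As a rough sketch of what a documented cell might look like, the Python below uses made-up numbers for a hypothetical coffee shop promotion. The data, the variable names, and the business question are all assumptions for illustration; the point is that the comments say why each step happens and the last line ties the number back to the question.

```python
from statistics import mean

# Made-up numbers for illustration only: daily orders at a hypothetical coffee shop.
orders_before_promo = [112, 98, 105, 120, 101]
orders_during_promo = [131, 125, 140, 118, 136]

# Why compare averages? The (hypothetical) question is whether the promotion
# raised daily orders, so one summary number per period is enough for a first look.
avg_before = mean(orders_before_promo)
avg_during = mean(orders_during_promo)

# Express the change as a percentage so the finding ties back to the question.
pct_change = (avg_during - avg_before) / avg_before * 100
print(f"Average daily orders changed by {pct_change:.1f}% during the promotion.")
```

In a real notebook, the explanation of where the data came from and what the result means would usually go in a text cell next to the code rather than only in comments.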

A README will be easier to write once you have finished the steps above. Once you have solved your real-world problem, you get to decide the most important pieces to share with others. You can use the list above to guide your README, but avoid being too lengthy. Consider including:

  • “A picture is worth a thousand words”: some of your visualizations
  • What you would do differently if you were given more time or data
  • Your contact information
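
One way to keep a README short and consistent is to start from a skeleton. The sketch below assumes the README is written in Markdown (GitHub renders a README.md file on the repository page). The function name build_readme, the section headings, and the image path are hypothetical placeholders, not a required layout.

```python
def build_readme(title, sections, image_path=None):
    """Return Markdown README text: a title, an optional image, then one heading per section."""
    lines = [f"# {title}", ""]
    if image_path:
        # Markdown image syntax: ![alt text](path to the image file)
        lines += [f"![Key visualization]({image_path})", ""]
    for heading, body in sections.items():
        lines += [f"## {heading}", body, ""]
    return "\n".join(lines)


# Placeholder text only; every value here is hypothetical.
readme_text = build_readme(
    "Descriptive Project Title",
    {
        "Problem": "The real-world problem and how this project addresses it.",
        "Data": "Where the data came from and why it is useful.",
        "Findings": "The main results and how they apply to the problem.",
        "Next steps": "What I would do differently with more time or data.",
        "Contact": "Your name and how to reach you.",
    },
    image_path="images/example_plot.png",
)
print(readme_text)
```

Saving the printed text as README.md in the top folder of the repository is enough for GitHub to display it.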

Photo by Chiara F on Unsplash

“I’m just starting! I’m not there yet. How am I going to remember all of this?”

3. Bookmark things you found helpful

YAY! Helpful nugget number three! Are you finding that this article (or any article on Medium) has useful information you may need later? Save the story!

If you find something on a different website, bookmark that in your browser. You can make a folder for your data science bookmarks so you can find them easily later.

Maybe you have already started bookmarking things. That will be helpful in the future, but here is where I blow your mind. Bookmark things that you use in the present. Had an issue trying to get your code to work and found the answer in a Medium article? Save it!

There was a time I needed to store a secret key but I didn’t have a .gitignore (see Tip #1). I searched and searched until I found the Medium article that solved my issue. Months later, I went back to add a .gitignore to other repositories, but could not remember the steps and could not find the article.

There is nothing like finding the answer to your Python prayers and then, when the time comes to replicate or build upon it, not being able to remember how you arrived at that initial solution.

I hope you find the answers to all your questions on the first link you click, but that may not be the case. When you find the article that was most beneficial, listen to me and put it in a place where you can find it again.

Photo by Pop & Zebra on Unsplash

4. Utilize other sources

You have found yourself on Medium, so I know you are intelligent and resourceful. Data science is a growing field and there are more and more legitimate places to look for clarification. Here are just a few:

  • YouTube
  • Books: old-fashioned, but tried and true
  • Stack Overflow: users post questions that can be solved by other members
  • Kaggle: site where users share their data and code
  • Other people: classmates, colleagues, or anyone who seems to know what they are talking about (strictly speaking about data science, of course)

Here I talk about classmates. Maybe you have chosen a path where you don’t have classmates. I would implore you to get out there and find someone in the same position as you. This new journey you have embarked on will require creative time management strategies and lots of self-compassion. Having someone who is walking that path too can be invaluable.

If you are looking for that first friend on the path, I’m right here, only a few paces ahead of you.

Good luck on your wonderful new voyage.

Visit my GitHub to see my projects or add me on LinkedIn

