How to Import a CSV into a Jupyter Notebook with Python and Pandas
Documentation for importing a CSV into a Jupyter Notebook with Python and Pandas
If you’re a spreadsheet ninja, I can only assume you’ll want to start your Jupyter/Python/Pandas journey by importing a CSV into your Jupyter notebook.
Let me just say that this is very easy to do, and I’m excited to show you.
Hit that easy button and let’s do it!
Table of Contents:
- Getting started
- Imports
- Read CSV
- Do something to the CSV
- Export CSV
Step 1: Getting started
First, you’ll need to be set up with Python, Pandas, and Jupyter notebooks. If you aren’t, install those before continuing.
Step 2: Imports
Next, you’ll set up a notebook with the necessary imports:
import pandas as pd
Pandas is literally all you need for this operation, and it is often imported as pd. You’ll use pd as a prefix for pandas operations.
Step 3: Read CSV
Next, you’ll simply ask Pandas to read_csv, and then assign your spreadsheet a variable name. Sorta like this:
variable_name = pd.read_csv('file path')
read_csv is a Pandas function that creates a Pandas DataFrame from a CSV file, such as one stored locally. You can read more about it at https://pandas.pydata.org/, where you can find all the Pandas documentation you’ll ever want.
Remember, we use the prefix pd to run any pandas operations:
spreadsheet = pd.read_csv('/Users/davidallen/Downloads/file_name.csv')
But first, we’ll need a CSV to read! Let’s use something from kaggle.com. I think this Healthy Lifestyle Cities Report is interesting, so let’s use that one.
If you don’t have a Kaggle account, go ahead and register. It’s a worthwhile site to know about. Loads of datasets to peruse.
Then, just hit the download button to grab all the project resources. Open the zip file and you’ll find your CSV in your downloads folder (or wherever your downloads go). Make note of the location and filename.
Now, let’s import that CSV!
spreadsheet = pd.read_csv('/Users/davidallen/Downloads/healthy_lifestyle_city_2021.csv')
You can type a tilde (~) and then a forward slash (/) in front of “Desktop”, “Documents”, or “Downloads” before hitting “tab” to get some autocomplete help with the file path.
It should look like this before you hit tab:
spreadsheet = pd.read_csv('~/Desktop')
spreadsheet = pd.read_csv('~/Downloads')
spreadsheet = pd.read_csv('~/Documents')
And then your computer should autocomplete the path for you, like this:
spreadsheet = pd.read_csv('/Users/davidallen/Desktop/')
spreadsheet = pd.read_csv('/Users/davidallen/Downloads/')
spreadsheet = pd.read_csv('/Users/davidallen/Documents/')
Then, just start typing out the file name and hit “tab” again to autofill the rest of the path.
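If you’d rather not rely on autocomplete, here is a minimal sketch of expanding the tilde yourself with Python’s standard pathlib module. It assumes the CSV from this tutorial sits in your Downloads folder; adjust the path if yours lives elsewhere.

```python
from pathlib import Path

# Assumed location: the tutorial's CSV inside the Downloads folder.
short_path = Path("~/Downloads/healthy_lifestyle_city_2021.csv")

# expanduser() swaps the leading "~" for the current user's home directory.
full_path = short_path.expanduser()

print(full_path)            # the absolute path under your home directory
print(full_path.exists())   # True only if the file is really there
```

The expanded path can be passed straight to pd.read_csv, and the exists() check is a quick way to catch a typo in the file name before pandas complains.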
Step 4: Do something to the CSV
Now that we’ve loaded our CSV into our notebook, it’s time to do something with the CSV.
First, let’s just take a look at the first 5 rows with a very popular command: head().
spreadsheet.head()
This will show the first 5 rows of our DataFrame, along with the column headers.
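Here is a small sketch of head() that runs without any file on disk. The rows are made up for illustration and are read from an in-memory string instead of a real CSV.

```python
import io

import pandas as pd

# Made-up rows standing in for a real CSV file (illustration only).
csv_text = """City,Sunshine hours(City)
City A,2000
City B,1500
City C,2600
City D,1800
City E,2200
City F,1700
City G,2400
"""

# read_csv accepts file-like objects as well as file paths.
spreadsheet = pd.read_csv(io.StringIO(csv_text))

first_rows = spreadsheet.head()    # default: the first 5 rows
first_two = spreadsheet.head(2)    # pass a number to change how many rows you get

print(spreadsheet.shape)
print(first_rows)
```

In a notebook you can skip print and simply leave spreadsheet.head() as the last line of a cell to get the nicely rendered table.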
You can use tab again to autocomplete the name of your variable, spreadsheet.
Just start typing spread and then hit tab.
Very quickly, let’s just sort the DataFrame by Sunshine hours(City), assign the sorted result to a new variable, and then we’ll export this new CSV.
We’ll assign the sorted DataFrame to a new variable, df:
df = spreadsheet.sort_values('Sunshine hours(City)', ascending=False)
.sort_values() does exactly what it sounds like. Just pass in the column name (or a list of column names), and then specify whether you want to sort in ascending order. The default is ascending=True; setting ascending=False sorts the DataFrame in descending order.
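To make the single-column and multi-column cases concrete, here is a short sketch on a tiny made-up DataFrame. The Rank column and all the values are invented for illustration.

```python
import pandas as pd

# Invented data: only the "Sunshine hours(City)" column name comes from the tutorial.
df_demo = pd.DataFrame({
    "City": ["A", "B", "C", "D"],
    "Rank": [1, 2, 1, 2],
    "Sunshine hours(City)": [1500, 2500, 1800, 2100],
})

# One column, largest value first.
by_sun = df_demo.sort_values("Sunshine hours(City)", ascending=False)

# Two columns: Rank ascending, then sunshine descending within each rank.
by_rank_then_sun = df_demo.sort_values(
    ["Rank", "Sunshine hours(City)"], ascending=[True, False]
)

print(by_sun["City"].tolist())
print(by_rank_then_sun["City"].tolist())
```

Note that sort_values returns a new DataFrame and leaves df_demo untouched, which is why the tutorial assigns the result to a new variable.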
Next, we’ll complete the tutorial by exporting the sorted CSV.
Step 5: Export the CSV
Exporting is as simple as importing. Just use the pandas DataFrame method to_csv to save your df to local storage:
df.to_csv('/Users/davidallen/Desktop/new_csv.csv')
Easy! Just imagine the possibilities.
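One detail worth knowing: by default, to_csv also writes the DataFrame’s row index as an extra first column. The sketch below uses a made-up DataFrame and a temporary folder (so nothing lands on your Desktop) to show the difference that index=False makes.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({"City": ["A", "B"], "Sunshine hours(City)": [2500, 1500]})

with tempfile.TemporaryDirectory() as tmp:
    with_index_path = Path(tmp) / "with_index.csv"
    no_index_path = Path(tmp) / "new_csv.csv"

    df.to_csv(with_index_path)             # default: the row index is written too
    df.to_csv(no_index_path, index=False)  # only the data columns are written

    with_index = pd.read_csv(with_index_path)
    no_index = pd.read_csv(no_index_path)

print(with_index.columns.tolist())
print(no_index.columns.tolist())
```

If the exported file is headed back into a spreadsheet, index=False is usually what you want, because it avoids a nameless leading column.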
Opening csv file in jupyter notebook
I tried to open a CSV file in Jupyter Notebook, but it shows an error message that I don’t understand. The CSV file and the Jupyter notebook file are in the same directory. Please check the screenshot to see the error message.
asked Dec 8, 2019 at 20:45
Shakil Ahmed
You should place code directly in here, and never share screenshots of code. It makes it very difficult to troubleshoot since people cannot copy and paste your code.
Dec 8, 2019 at 20:48
Judging by the error, I suppose the problem is in the CSV file: maybe NAs or badly formatted data. Check your CSV for consistency.
Dec 8, 2019 at 21:14
Anyway, I can point you to this solution: stackoverflow.com/a/58200424/5333248. If that doesn’t work, try encoding='UTF-8' instead. If neither works, the solution could be much harder to find.
Dec 8, 2019 at 21:28
3 Answers
As others have written it’s a bit difficult to understand what exactly is your problem.
But why don’t you try something like:
# Option 1: plain Python, one line of text per row
with open("file.csv", "r") as table:
    for row in table:
        print(row)
        # do something

# Option 2: pandas
import pandas as pd

df = pd.read_csv("file.csv", sep=",")
# shows top 10 rows
df.head(10)
# do something
answered Dec 8, 2019 at 21:01
You can use the built-in csv module:
import csv

with open('my_file.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for row in csv_reader:
        print(row)
This will print each row as a list of strings, one string per cell.
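As a rough illustration of what those rows look like, the sketch below runs csv.reader over a made-up in-memory string instead of a file. Every cell comes back as a string, so numbers have to be converted before doing arithmetic.

```python
import csv
import io

# Made-up CSV content for illustration; a real file opened with open() works the same way.
csv_text = "name,score\nalice,1\nbob,2\n"

rows = list(csv.reader(io.StringIO(csv_text), delimiter=","))

print(rows[0])   # the header row
print(rows[1])   # the first data row; note that the score is the string "1"

# Convert the second column to int before summing.
total = sum(int(row[1]) for row in rows[1:])
print(total)
```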
However, using Jupyter notebook you should use Pandas to nicely display the csv as a table.
import pandas as pd

df = pd.read_csv("test.csv")
# Displays top 5 rows
df.head(5)
# Displays whole table
df
Resources
The csv module implements classes to read and write tabular data in CSV format. It allows programmers to say, “write this data in the format preferred by Excel,” or “read data from this file which was generated by Excel,” without knowing the precise details of the CSV format used by Excel.
pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
Insightlist
3. Then we call up the file contents like this, with the slashes doubled!
The second way to open a CSV file in Jupyter Notebook, this time without loading pandas, looks like this:
If you export a CSV file from Google Analytics, not only is everything lumped together, but the encoding breaks as well.
To open the file and get rid of the garbled characters in the CSV file at the same time,
you need to write the following:
with open('file_name.csv', 'r', encoding='utf-8') as f:
    print(f.read())  # show the decoded contents
To fix the encoding, we specified this piece: encoding='utf-8'
This gives us the selected file, opened in a readable encoding.
But not always! Sometimes the encoding is not restored.
3. The simplest way, if the file is on your computer’s disk and you have already installed pandas, is to just add the path with the slashes “flipped”. Like this
4. If the problem is with the delimiters (CSV files in Russia often use a semicolon as the delimiter), add sep=';' to the read_csv call.
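The two tips above, handling Windows-style slashes and semicolon delimiters, can be sketched as follows. The path and the data are hypothetical and only there for illustration.

```python
import io

import pandas as pd

# Three ways to write the same hypothetical Windows path in Python.
doubled = "C:\\Users\\name\\file_name.csv"   # backslashes doubled
raw = r"C:\Users\name\file_name.csv"         # raw string, backslashes left alone
flipped = "C:/Users/name/file_name.csv"      # forward slashes, which Windows also accepts

print(doubled == raw)

# Made-up semicolon-delimited content standing in for a real file.
semicolon_text = "city;visits\nA;10\nB;20\n"

wrong = pd.read_csv(io.StringIO(semicolon_text))            # default sep="," gives one wide column
right = pd.read_csv(io.StringIO(semicolon_text), sep=";")   # two proper columns

print(wrong.shape)
print(right.shape)
```

A DataFrame with a single column whose header contains semicolons is the usual sign that sep needs changing.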
How To Load Csv File In Jupyter Notebook?
Loading a CSV (Comma Separated Values) file in Jupyter Notebook allows you to work with tabular data conveniently. This process is fundamental for data analysis and manipulation tasks.
To load a CSV file in Jupyter Notebook, we can use the pandas library, which provides easy-to-use functions for reading and manipulating tabular data. Let’s walk through it step by step:
Load the CSV file – Standard Pandas Operation (pd.read_csv)
- Use the pd.read_csv() function to load your CSV file.
- You’ll need to provide the path to your CSV file as an argument. If the CSV file is in the same directory as your notebook, you can just provide the filename.
The Python code snippet utilizes the pandas library to read a CSV file dataset and load its contents into a DataFrame.
import pandas as pd

df = pd.read_csv('zomato.csv')
df.head()
Handling Unicode Error
Sometimes, when working with CSV files, you may encounter a Unicode error, especially if the file contains characters that are not in the standard ASCII character set. To handle this error, we can try different encoding options until we find the one that works.
Below is the Unicode error encountered while loading a CSV file. The traceback shows the UnicodeDecodeError and the line of code where it occurred.
Output:
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
in ()
1 import pandas as pd
----> 2 df=pd.read_csv('/content/zomato.csv')
10 frames
/usr/local/lib/python3.10/dist-packages/pandas/_libs/parsers.pyx in pandas._libs.parsers.raise_parser_error()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 7044: invalid continuation byte
A UnicodeError occurs when there is a problem encoding or decoding text. Here it happens because the default encoding used by pandas’ read_csv function, UTF-8, does not match the encoding of the CSV file, which is typical when the file contains characters outside the ASCII range.
How to Handle Unicode Error?
To handle this error, one common approach is to pass the correct encoding parameter to the read_csv function. In this article, we use encoding='latin-1'.
import pandas as pd

df = pd.read_csv('/content/zomato.csv', encoding='latin-1')
df.head()
If 'latin-1' does not give readable text, try other encodings when reading the CSV file. Common options include 'utf-8', 'utf-16', 'latin-1', and 'cp1252'.
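Trying encodings one at a time can be wrapped in a small helper. The sketch below is one possible approach, not a pandas feature: read_csv_with_fallback is a hypothetical function name, and the sample file is created in a temporary folder just to demonstrate it.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "cp1252", "latin-1")):
    """Try each encoding in turn; return the DataFrame and the encoding that worked."""
    for encoding in encodings:
        try:
            return pd.read_csv(path, encoding=encoding), encoding
        except UnicodeDecodeError:
            continue  # this encoding cannot decode the file, try the next one
    raise ValueError(f"none of {encodings} could decode {path}")


with tempfile.TemporaryDirectory() as tmp:
    # Made-up sample written in Latin-1, so reading it as UTF-8 fails.
    sample = Path(tmp) / "sample.csv"
    sample.write_bytes("name,city\nCafé,São Paulo\n".encode("latin-1"))

    df_sample, used = read_csv_with_fallback(sample)

print(used)
print(df_sample)
```

'latin-1' goes last because it can decode any sequence of bytes, so it never raises an error. A successful read therefore does not prove the encoding is right; check that accented characters look correct in the result.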
If the CSV file is in a different directory, you’ll need to provide the full path to the file:
df = pd.read_csv('/path/to/your/file/your_file.csv')
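When a read fails with a file-not-found error, it helps to see which absolute location a path actually points to. The sketch below uses only the standard library; describe_csv_path is a hypothetical helper name, and your_file.csv is a placeholder.

```python
from pathlib import Path


def describe_csv_path(path_text):
    """Return the absolute path that path_text points to, and whether a file exists there."""
    resolved = Path(path_text).expanduser().resolve()
    return resolved, resolved.is_file()


# Relative paths are resolved against the current working directory,
# which in Jupyter is normally the folder containing the notebook.
print(Path.cwd())

resolved, found = describe_csv_path("your_file.csv")
print(resolved)
print(found)
```

If found is False, either move the CSV next to the notebook or pass the full path to pd.read_csv.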
Conclusion
Unlock the prowess of Pandas for seamless CSV file handling in Jupyter Notebook.