Reading data from MySQL into a Pandas DataFrame

Posted on 02nd July 2020

In a previous post, Writing data from a Pandas Dataframe to a MySQL table, you learned how to write (export) data from a Pandas DataFrame to a MySQL table using the to_sql() function. Now let's see how to read (import) data from a MySQL database table into a Pandas DataFrame.

The read_sql() function allows you to read data from a MySQL table. Like to_sql(), the read_sql() function needs at least two parameters: the first is a SQL query string or a table name, and the second is an SQLAlchemy engine or a database URI string.

Here is a simple example that uses read_sql() to read all records from a MySQL database table named employee.

from sqlalchemy import create_engine
import pandas as pd


# hostname, dbname, uname, pwd and tablename are assumed to be
# defined with your connection details
engine = create_engine("mysql+pymysql://{user}:{pw}@{host}/{db}"
                       .format(host=hostname, db=dbname,
                               user=uname, pw=pwd))
df = pd.read_sql(tablename, engine)

Getting data using an SQL query instead of a table name

In the above example we pass the table name to the read_sql() function. Alternatively, you can pass a SQL query in place of the table name. So, for example, you can also make the read_sql() call like this:

sql = "Select * from employee"
df = pd.read_sql(sql, engine)
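If the query needs user-supplied values, it is safer to bind them through the params argument than to build the SQL string yourself. Here is a minimal, self-contained sketch; it uses an in-memory SQLite engine so it runs without a MySQL server, but the read_sql() call itself is identical with a MySQL engine:

```python
from sqlalchemy import create_engine, text
import pandas as pd

# In-memory SQLite engine standing in for the MySQL engine
engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE employee (id INTEGER, name TEXT, age INTEGER)"))
    conn.execute(text("INSERT INTO employee VALUES (111, 'Thomas', 35), (222, 'Ben', 42)"))

# Bind values via params instead of string formatting (avoids SQL injection)
sql = text("SELECT * FROM employee WHERE age > :min_age")
df = pd.read_sql(sql, engine, params={"min_age": 40})
print(df)
```

Only the rows matching the bound value come back; here that is the single row for Ben.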

Using a database connection URI instead of SQLAlchemy engine

In the above examples, the second parameter to the read_sql() function is an engine created using SQLAlchemy's create_engine() function. Instead of the SQLAlchemy engine, you can pass a database URI directly. For example:

df = pd.read_sql(sql, "mysql+pymysql://{user}:{pw}@{host}/{db}"
                           .format(host=hostname, db=dbname,
                                   user=uname, pw=pwd))

Specifying an index for the DataFrame

While importing from MySQL, you can set any column (or columns) to be used as the index of the DataFrame. To do this, the read_sql() function accepts the optional index_col parameter, which can be set to a column name or a list of column names.

For example, consider a table named employee with the following columns.

id bigint(20) 
name text 
age text 
country text
start_date datetime

The following code reads all records from the employee table and sets the id column as the index of the DataFrame.

df = pd.read_sql('employee', engine, index_col="id")
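As a runnable sketch (again using an in-memory SQLite engine for self-containment; the read_sql() call is the same with MySQL), you can check that the chosen column becomes the index rather than a regular column:

```python
from sqlalchemy import create_engine, text
import pandas as pd

# In-memory SQLite engine standing in for the MySQL engine
engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE employee (id INTEGER, name TEXT, age INTEGER)"))
    conn.execute(text("INSERT INTO employee VALUES (111, 'Thomas', 35), (222, 'Ben', 42)"))

df = pd.read_sql("employee", engine, index_col="id")
print(df.index.name)       # id  -- the column is now the index
print("id" in df.columns)  # False -- and no longer a regular column
```

index_col also accepts a list, e.g. index_col=["id", "name"], which produces a MultiIndex.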

Sample Output

       name  age         country
id
111  Thomas   35  United Kingdom
222     Ben   42       Australia
333   Harry   28           India

Parsing Dates

If your MySQL table has columns holding date values, you can parse them by setting the parse_dates parameter of the read_sql() function. For example:

df = pd.read_sql('employee', engine, index_col="id", parse_dates={'start_date': '%d/%m/%Y'})

Note that this parameter is needed only when the column's data type is not datetime. Pandas imports datetime columns as datetime even if you do not set this parameter.
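A self-contained sketch of this (with the dates stored as text in an in-memory SQLite table, standing in for a non-datetime MySQL column) shows the format string being applied:

```python
from sqlalchemy import create_engine, text
import pandas as pd

# In-memory SQLite engine; dates stored as plain text, as in a MySQL text column
engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE employee (id INTEGER, start_date TEXT)"))
    conn.execute(text("INSERT INTO employee VALUES (111, '01/07/2020')"))

# Without parse_dates, the column comes back as plain strings (object dtype)
raw = pd.read_sql("employee", engine)
print(raw["start_date"].dtype)  # object

# With a column -> format mapping, the strings are parsed to datetimes
df = pd.read_sql("employee", engine, parse_dates={"start_date": "%d/%m/%Y"})
print(df["start_date"].dtype)   # datetime64[ns]
```

The value of the parse_dates dict is the strftime-style format of the stored strings; here '01/07/2020' is read as 1 July 2020, not 7 January.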
