Writing data from a Pandas Dataframe to a MySQL table

Posted on 21st January 2020

Pandas is currently one of the most popular Python library used for data analysis. A DataFrame in Pandas is a data structure for storing data in tabular form, i.e., in rows and columns. This article describes how to write the data in a Pandas DataFrame to a MySQL table.

Consider a DataFrame with three records like below.

Pandas DataFrame

You can create a database table in MySQL and insert this data using the to_sql() function in Pandas. The to_sql() function requires two mandatory parameters - table name and an SQLAlchemy engine object.

SQLAlchemy is a Python toolkit and Object Relational Mapper (ORM) that allows Python to work with SQL Databases. An engine is the base of any SQLAlchemy application that talks to the database. The engine object is created by calling the create_engine() function with database dialect and connection parameters.

The following Python program creates a new table named users in a MySQL database and populates the table with the three records from our example dataframe.

import pandas as pd
from sqlalchemy import create_engine

# Credentials to database connection
hostname="localhost"
dbname="mydb_name"
uname="my_user_name"
pwd="my_password"

# Create dataframe
df = pd.DataFrame(data=[[111,'Thomas','35','United Kingdom'],
		[222,'Ben',42,'Australia'],
		[333,'Harry',28,'India']],
		columns=['id','name','age','country'])

# Create SQLAlchemy engine to connect to MySQL Database
engine = create_engine("mysql+pymysql://{user}:{pw}@{host}/{db}"
				.format(host=hostname, db=dbname, user=uname, pw=pwd))

# Convert dataframe to sql table                                   
df.to_sql('users', engine, index=False)

After executing the above Python script, login to your MySQL database and query the newly created users table.

SELECT * from users;
idnameagecountry
111Thomas35United Kingdom
222Ben42Australia
333Harry28India

Now let's look at some of the optional parameters that can be passed to the to_sql() function.

  • index : True or False. Default is True.

    If the index paramter in the to_sql function is set to True (index=True) or if the index parameter is omitted then the DataFrame's index is added as the first column in the MySQL database table. If you don't want the index then set index=False

  • index_label : string. Default is None.

    When index=True, the DataFrame's index is included in the table under column name index. You can have a different column name for the index column by setting a string value to the index_label parameter. For example

    df.to_sql('users', engine, index=True, index_label="RowNum")
  • if_exists : fail, replace, append. Default is fail

    This parameter decides what to do if the table already exist in the database. By dafault an error is raised if the table exists. If you set this parameter to replace then the existing table is first dropped before inserting the values. If set to append then the values in the dataframe are appended to the existing table.

  • dtype : dictionary

    The dtype parameter allows you to specify the data type of the table columns as a dictionary object. The keys in the dictionary are column names and values can be one of the SQLAlchemy data types. For example

    from sqlalchemy.types import String, Integer
    df.to_sql('users', engine, dtype={"age": Integer(), "name": String(30)})
    

    The above to_sql() function call creates the users table with the age column set as integer type and name as varchar

    Columns in users table

    index bigint(20) 
    id bigint(20) 
    name varchar(30) 
    age int(11) 
    country text
    

Adding a PRIMARY KEY

The to_sql() function does not offer any options for creating a Primary Key column. However you can achieve this by calling the .execute method of SQLAlchemy engine.

The following statement sets the id column in users table as the primary key.

engine.execute('ALTER TABLE users ADD PRIMARY KEY (`id`);')

Post a comment

Comments

Nothing yet..be the first to share wisdom.