Why do you need to connect R to MySQL ? and how to do it ?

Othmane Hajji
4 min readApr 26, 2020

You probably know that MySQL is among the most popular databases software, It’s been widely used by industry leaders such as Facebook, Linkedin, Google. In the other hand, the R programming language is among the best data mining and data analysis softwares with a very large community. I will show via this tutorial how to create an efficient interaction between R and MySQL.

So, why do you need to connect R to MySQL ?

There are three main reasons why you should interact R and MySQL :

  • R cannot handle a very large amount of data alone. An inteaction with MySQL will allow an efficient use and storage of data and avoid storing databases into data.frame format or CSV files for an optimal memory use.
  • MySQL is a database management software. By consequent, it cannot afford advanced analytics that R could easily perform. R may be used for all advanced techniques you can imagine but you should think at storing the outputs into MySQL.
  • When it comes to data storage, you should focus on keeping your data safe and secure and avoid the risk to lost irreversibly your excel files.

For all this considerations, I will propose you to follow the next process :

R and MySQL interaction process (Made by Othmane HAJJI)

… How it works ?

First of all, you need to install MySQL and R programming language, both are open source and easily downloadable. There a plethora of resources that show you how to install and get started with both softwares, you will find some useful links at the end of this tutorial.

  • Connect R to MySQL and get your data

Now, we will get into the practice, we will use the RMySQL library that we install and load. By next, we use the dbConnect() function to connect R to MySQL by specifying the host, user, the name of object and password if any.

In order, to list the differents tables present in the “world” object, we use dbListTables() function:

To list the fields of a given table, we use the dbListFields() function, we explore the variables present in the country table:

Now, how to get the read the MySQL database into R so you can explore it. There are two main ways to do so :

  1. Use dbReadTable(object, table) by specifying the object and the table.
  2. Use the dbGetQuery(Object, SQL statement).

We will put things into practice

We obtain a data.frame object into R:

As you can see from the R output above, the Country_world table contains economics and demographics information about world countries. We will use these variables to cluster the european countries.

For this purpose, we must filter the Country_world to subset european countries, can do this by either using the filter() function from the dplyr library in R or directly using the dbGetQuery() and with where condition in the statement option.

We obtain a data.frame object of 46 european countries:

  • Do some analytics in R

In this tutorial, we will perform a clustering technique to segment the european countries function of economic and demographic factors. That would be very difficult and laborious to use clustering techniques directly in MySQL and here appears the complementarity between R and MySQL.

We will use the Hierarchical Clustering technique to segment the european countries into 5 clusters based on the GNP, Old GNP and life expectancy variables.

Reminder : The Hierarchical Clustering is an algorithm that gather similar objects into one cluster using a distance measure between the datapoints in a manner that maximise the distance between the clusters.

We obtain the graphic visualizations of the european countries under the clustering components.

Each single country is affected to a cluster as follow:

  • Send the Table to MySQL

In order to send this new table to MySQL, we will use the dbSendQuery() function. First, we create the table that we will receive the clusters data and then insert the values of the clusters. We start by inserting the values of the “Albania” country.

In MySQL, the clusters new table appears and contains one row.

We can simultaneously insert all the clusters table rows.

THANK YOU FOR YOUR TIME !

--

--