Tuesday 8 August 2017

Data Science Tutorial Part 9 Import Data From Files

  1. Read Data from Files
  1. R can import data in many ways.
  2. Packages exist that handle imports from other software systems.
  1. Download and add a package to R
    1. Open the R GUI;
    2. Click on the 'packages' tab;
    3. If it asks for a country (the CRAN mirror), select any country;
    4. Choose the package to install;
    5. Load the package into R with the library() function.
      1. library(gdata)
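The menu steps above can also be done from the console. This is a minimal sketch, assuming an internet connection; the mirror URL is only one possible choice and any CRAN mirror would do.

```r
# Install gdata only when it is missing, then attach it.
# The mirror URL is an assumption; pick any CRAN mirror you prefer.
if (!requireNamespace("gdata", quietly = TRUE)) {
  install.packages("gdata", repos = "https://cloud.r-project.org")
}
library(gdata)
```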
  2. To read data, the function read.table() is useful; see its help page with ?read.table.
  3. Some of the important arguments are:
    1. header: Is the first line variable names or not?
    2. sep: What character is used to separate the columns?
    3. dec: What character is used as decimal separator?
    4. nrows: How many rows do we want to read?
    5. na.strings: What strings represent missing values?
    6. skip: How many lines should be skipped before reading starts?
    7. comment.char: What character marks the start of a comment? Everything from that character to the end of the line is ignored.
  4. Exercise 1: Read a .dat file whose variables are separated by white space
    1. From labdata, open File_white_space.dat and observe the data
    2. Read the data by using the read.table() function
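One possible solution sketch for Exercise 1 follows. The lab file is not reproduced in this post, so the code first writes a small stand-in file; its column names and values are hypothetical.

```r
# Stand-in for labdata/File_white_space.dat (contents are hypothetical).
dat_file <- tempfile(fileext = ".dat")
writeLines(c("id name   salary",
             "1  Smith  800.50",
             "2  Allen  1600",
             "3  Ward   1250.75"), dat_file)

# sep = "" (the default) treats any run of spaces or tabs as one separator.
emp <- read.table(dat_file, header = TRUE, sep = "")
str(emp)
```

With the real lab file, replace dat_file with the path to File_white_space.dat.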
  5. Exercise 2: Read a .txt file whose variables are separated by white space, skipping the first 5 rows and then reading only 2 rows
    1. From labdata, open file_skip_rows.txt and observe the data
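A sketch of Exercise 2 using the skip and nrows arguments. The stand-in file below is hypothetical (five note lines followed by data rows), since the real file_skip_rows.txt is not shown here.

```r
# Stand-in for labdata/file_skip_rows.txt (contents are hypothetical).
skip_file <- tempfile(fileext = ".txt")
writeLines(c(paste("note line", 1:5),
             "1 Smith 800",
             "2 Allen 1600",
             "3 Ward 1250",
             "4 Jones 2975"), skip_file)

# Skip the first 5 lines, then read only the next 2 rows.
# The column names are supplied by hand because the file has no header row.
two_rows <- read.table(skip_file, skip = 5, nrows = 2, header = FALSE,
                       col.names = c("id", "name", "salary"))
two_rows
```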
  6. Exercise 3: Read a .txt file that contains comment lines, empty values, and decimals written with a comma (,)
    1. From labdata, open file_comment.txt and observe the data
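A sketch of Exercise 3 combining comment.char, na.strings and dec. The stand-in file is hypothetical; in particular, the semicolon separator and the # comment character are assumptions about how file_comment.txt is laid out.

```r
# Stand-in for labdata/file_comment.txt (layout is an assumption).
comment_file <- tempfile(fileext = ".txt")
writeLines(c("# salaries exported with decimal commas",
             "id;name;salary",
             "1;Smith;800,5",
             "# a comment between records",
             "2;;1600",
             "3;Ward;1250,75"), comment_file)

# dec = "," parses 800,5 as 800.5; empty fields become NA;
# lines starting with # are dropped.
emp_clean <- read.table(comment_file, header = TRUE, sep = ";", dec = ",",
                        comment.char = "#", na.strings = c("", "NA"))
emp_clean
```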
  7. read.table() is the main function. However, other functions that are useful for quickly reading data frames from files are:
    1. read.csv() (defaults are header=TRUE, sep=",", dec=".")
    2. read.delim() for tab-delimited files (defaults are header=TRUE, sep="\t")
    3. read.fwf() for fixed-width format files
    4. read.csv2() (defaults are header=TRUE, sep=";" and dec=",")
    5. read.delim2() (defaults are header=TRUE, sep="\t" and dec=",")
  8. Additional arguments are similar to those of read.table()
  9. The utils package, which is automatically loaded in your R session on startup, can import CSV files with the read.csv() function
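To see how the defaults of these wrappers differ, here is a small sketch that writes the same hypothetical table in three layouts and reads each one with the matching function.

```r
# Three temporary files holding the same made-up data in different layouts.
csv_file  <- tempfile(fileext = ".csv")
csv2_file <- tempfile(fileext = ".csv")
tab_file  <- tempfile(fileext = ".txt")

writeLines(c("name,salary", "Smith,800.5", "Allen,1600"), csv_file)
writeLines(c("name;salary", "Smith;800,5", "Allen;1600"), csv2_file)
writeLines(c("name\tsalary", "Smith\t800.5", "Allen\t1600"), tab_file)

from_csv   <- read.csv(csv_file)     # comma separator, decimal point
from_csv2  <- read.csv2(csv2_file)   # semicolon separator, decimal comma
from_delim <- read.delim(tab_file)   # tab separator, decimal point

# All three calls should give the same data frame.
all.equal(from_csv, from_csv2)
all.equal(from_csv, from_delim)
```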
  10. scan() can be a little tricky to use, but it is very flexible.
  11. Its simplest use is to read a vector of numbers from a file.
    1. Observe the scan.txt file
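A minimal scan() sketch. The contents of scan.txt are not shown in this post, so a hypothetical file of whitespace-separated numbers stands in for it.

```r
# Stand-in for scan.txt (contents are hypothetical).
scan_file <- tempfile(fileext = ".txt")
writeLines(c("1 2 3", "4.5 6"), scan_file)

# With only a file name, scan() returns a numeric vector.
nums <- scan(scan_file, quiet = TRUE)

# what = "" reads the same whitespace-separated tokens as text instead.
words <- scan(scan_file, what = "", quiet = TRUE)

nums
words
```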

  12. readLines() function: reads a text file into a character vector, one element per line
    1. Observe the readlines.txt file
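A short readLines() sketch; the stand-in file and its three lines are hypothetical.

```r
# Stand-in for readlines.txt (contents are hypothetical).
lines_file <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line", "third line"), lines_file)

all_lines <- readLines(lines_file)         # one element per line
first_two <- readLines(lines_file, n = 2)  # only the first two lines

length(all_lines)
first_two
```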

  13. File connections
    1. A file connection keeps a file open, so different sections can be read in different ways, with each read continuing where the previous one stopped
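A sketch of reading one file in sections through an open connection. The file layout (a title line, a line of column names, then data rows) is a hypothetical example chosen for illustration.

```r
# Hypothetical file: free-text title, then column names, then data rows.
conn_file <- tempfile(fileext = ".txt")
writeLines(c("Report generated by a hypothetical export tool",
             "id name salary",
             "1 Smith 800",
             "2 Allen 1600"), conn_file)

con <- file(conn_file, open = "r")

# Section 1: the title, read as plain text.
title_line <- readLines(con, n = 1)

# Section 2: the column names, split on spaces.
col_names <- strsplit(readLines(con, n = 1), " ")[[1]]

# Section 3: the remaining lines, parsed as a table.
records <- read.table(con, col.names = col_names)

close(con)
records
```

Because the connection stays open, each call picks up at the line after the previous call finished.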
  14. Read from Excel
# Read Excel files with read.xls() from the gdata package
# read.xls() needs Perl, so download and install Perl first
library(gdata)
emp1 <- read.xls("scott_emp_data.xlsx")
emp1

  1. Reading Data from SQL Databases
  1. R can connect to most of the available relational databases.
  2. R users have a few choices for connecting to an Oracle Database. The most commonly seen include RJDBC, RODBC and ROracle.
  3. Let us see how we can connect to an Oracle database.
  4. Connecting to an Oracle database mainly takes four steps.
Connecting using RJDBC
  1. Step 1 of 4: Install RJDBC package

  2. Step 2 of 4: Download the ojdbc6.jar file and point R to it
    1. Download the ojdbc6.jar file
      1. Copy and paste it into the R installation directory
      2. Point to the JDBC driver in R
  3. Step 3 of 4: Create a connection to the Oracle database

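Steps 1 to 3 might look like the sketch below. It is wrapped in a function so that nothing connects until it is called. The function name connect_oracle_jdbc, the jar location, the driver class name and the thin-driver URL format are assumptions; the host, service name and credentials in the usage comment are borrowed from the ROracle example in this post.

```r
# Step 1 (run once): install.packages("RJDBC")

# Hypothetical helper: load the Oracle JDBC driver and open a connection.
connect_oracle_jdbc <- function(jar_path, url, user, password) {
  # Step 2: point R at the downloaded ojdbc6.jar.
  # The driver class name is an assumption about that jar.
  drv <- RJDBC::JDBC(driverClass = "oracle.jdbc.OracleDriver",
                     classPath = jar_path)
  # Step 3: create the connection.
  DBI::dbConnect(drv, url, user, password)
}

# Usage (not run; path and URL are assumptions):
# con <- connect_oracle_jdbc("C:/Program Files/R/ojdbc6.jar",
#                            "jdbc:oracle:thin:@//demo.us.oracle.com:1521/orcl",
#                            "scott", "tiger")
```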
  4. Step 4 of 4: Check SQL commands
    1. dbReadTable: read a table into a data frame
    2. dbGetQuery: read the result of a SQL statement into a data frame
    3. dbSendUpdate: execute a SQL command that returns no result set
    4. dbWriteTable: write a data frame to the schema. It is typically very slow with large tables.
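The four commands above could be exercised roughly as follows. This is a sketch, not a tested session: the helper name run_jdbc_examples and the tables R_DEMO and DEPT_COPY are hypothetical, and con is assumed to be an open RJDBC connection to a schema that contains a DEPT table.

```r
# Hypothetical helper: runs one example of each command on an open connection.
run_jdbc_examples <- function(con) {
  # Whole table into a data frame.
  dept <- DBI::dbReadTable(con, "DEPT")

  # Result of a query into a data frame.
  dept_query <- DBI::dbGetQuery(con, "select * from dept")

  # Statement with no result set (R_DEMO is a made-up table name).
  RJDBC::dbSendUpdate(con, "create table r_demo (id number)")

  # Data frame written back as a new table (DEPT_COPY is made up).
  DBI::dbWriteTable(con, "DEPT_COPY", dept)

  list(dept = dept, dept_query = dept_query)
}

# Usage (not run): results <- run_jdbc_examples(con)
```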

Connecting using RODBC
  1. Step 1 of 2: Create an ODBC connection
    1. Go to Run, type odbc, and open the Microsoft ODBC Administrator
    2. Click on System DSN, click Add, and provide the data source details
    3. Test the connection, then close the dialog
  2. Step 2 of 2: Access the Oracle database using RODBC
    1. Add the library
    2. Connect to the database
    3. Query tables
    4. Query columns and properties
    5. Close the connection
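The RODBC steps in Step 2 could be sketched as below, wrapped in a function so that no connection is attempted until it is called. The helper name query_oracle_odbc and the DSN name in the usage comment are hypothetical; use the name given when the System DSN was created.

```r
# Hypothetical helper covering connect, query tables, query columns, close.
query_oracle_odbc <- function(dsn, uid, pwd) {
  # Connect through the System DSN created in Step 1.
  ch <- RODBC::odbcConnect(dsn, uid = uid, pwd = pwd)
  # Close the connection when the function exits, even after an error.
  on.exit(RODBC::odbcClose(ch))

  list(
    tables  = RODBC::sqlTables(ch),                       # visible tables
    dept    = RODBC::sqlQuery(ch, "select * from dept"),  # query a table
    columns = RODBC::sqlColumns(ch, "DEPT")               # columns and properties
  )
}

# Usage (not run; "oracle_dsn" is a made-up DSN name):
# res <- query_oracle_odbc("oracle_dsn", "scott", "tiger")
```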
Connecting using ROracle
  1. Download and install ROracle_1.2-1.zip
install.packages('C:\\Program Files\\R\\ROracle_1.2-1.zip', repos = NULL)
library(ROracle)
  1. Execute all commands and observe results
drv <- dbDriver("Oracle")
con <- dbConnect(drv, "scott", "tiger", dbname='demo.us.oracle.com:1521/orcl')
dbListTables(con)
dbReadTable(con, 'DEPT')
dbGetQuery(con,'select * from dept')
dbDisconnect(con)
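For more information, read the R Data Import/Export manual at http://127.0.0.1:29881/doc/manual/R-data.html. This address is served by the local R help server started with help.start(), and the port number differs between sessions.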
