Tuesday 8 August 2017

Data Science Tutorial Part 9 Import Data From Files

  1. Read Data from Files
  1. R can import data in many ways.
  2. Packages exist that handle imports from other software systems.
  1. Download and add a package to R
    1. Open the R GUI;
    2. Click on the 'packages' tab;
    3. If it asks for a country (the CRAN mirror), select any country;
    4. Choose the package to install;
    5. Load the package into R with the library() function.
      1. library(gdata)
  2. To read data, the function read.table() is useful; see its help page (?read.table).
  3. Some of the important arguments are:
    1. header: Is the first line variable names or not?
    2. sep: What character is used to separate the columns?
    3. dec: What character is used as decimal separator?
    4. nrows: How many rows do we want to read?
    5. na.strings: Which strings represent a missing value?
    6. skip: How many lines to skip before starting to read?
    7. comment.char: Which character marks the start of a comment? Everything from that character to the end of the line is ignored.
  4. Exercise 1: Read data from a .dat file whose variables are separated by white space
    1. From labdata, open File_white_space.dat and observe the data
    2. Read the data using the read.table() function
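A minimal sketch of Exercise 1. Because File_white_space.dat is not reproduced here, the example writes a small made-up sample to a temporary file; the column names and values are assumptions for illustration only.

```r
# Write a small whitespace-separated sample to a temporary file
# (a stand-in for File_white_space.dat; the contents are made up).
path <- tempfile(fileext = ".dat")
writeLines(c("id name   score",
             "1  alice  90.5",
             "2  bob    78.0"), path)

# sep = "" (the default) treats any run of spaces or tabs as one separator.
ws_data <- read.table(path, header = TRUE, sep = "")
str(ws_data)
```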
  5. Exercise 2: Read data from a .txt file whose variables are separated by white space; skip the first 5 rows, then read only 2 rows
    1. From labdata, open file_skip_rows.txt and observe the data
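One way Exercise 2 might look, assuming the file starts with five lines of notes followed by data rows. The sample contents below are invented, since file_skip_rows.txt is not shown.

```r
# Made-up stand-in for file_skip_rows.txt: 5 lines of notes, then data.
path <- tempfile(fileext = ".txt")
writeLines(c("note 1", "note 2", "note 3", "note 4", "note 5",
             "10 20 30",
             "40 50 60",
             "70 80 90"), path)

# skip = 5 ignores the first five lines; nrows = 2 stops after two rows.
skipped <- read.table(path, header = FALSE, skip = 5, nrows = 2)
skipped
```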
  6. Exercise 3: Read data from a .txt file that has comment lines, empty values, and a comma (,) as the decimal separator
    1. From labdata, open file_comment.txt and observe the data
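A possible approach to Exercise 3. The layout of file_comment.txt is not shown, so this sketch assumes semicolon-separated columns, "#" as the comment character, and empty fields for missing values; all of these are assumptions.

```r
# Made-up stand-in for file_comment.txt.
path <- tempfile(fileext = ".txt")
writeLines(c("# measurements from a hypothetical survey",
             "id;height;weight",
             "1;1,75;70,2",
             "2;;81,5",
             "# a comment between data rows",
             "3;1,62;NA"), path)

# dec = "," reads 1,75 as 1.75; empty fields and "NA" become missing values.
commented <- read.table(path, header = TRUE, sep = ";", dec = ",",
                        comment.char = "#", na.strings = c("", "NA"))
commented
```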
  7. read.table() is the main function. However, other functions that are useful for reading data frames from files quickly are:
    1. read.csv() (defaults are header = TRUE, sep = ",", dec = ".")
    2. read.delim() for tab-delimited files (defaults are header = TRUE, sep = "\t")
    3. read.fwf() for fixed-width format files
    4. read.csv2() (defaults are header = TRUE, sep = ";" and dec = ",")
    5. read.delim2() (same as read.delim(), but with dec = ",")
  8. Additional arguments are similar to those of read.table()
  9. The utils package, which is automatically loaded in your R session on startup, can import CSV files with the read.csv() function
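To illustrate how the defaults of read.csv() and read.csv2() differ, the following sketch writes the same made-up table in both conventions and reads each back. The file contents are hypothetical.

```r
csv_path  <- tempfile(fileext = ".csv")
csv2_path <- tempfile(fileext = ".csv")

# Same data: comma-separated with "." decimals vs. ";"-separated with "," decimals.
writeLines(c("item,price", "pen,1.5", "book,12.25"), csv_path)
writeLines(c("item;price", "pen;1,5", "book;12,25"), csv2_path)

a <- read.csv(csv_path)    # header = TRUE, sep = ",", dec = "."
b <- read.csv2(csv2_path)  # header = TRUE, sep = ";", dec = ","
all.equal(a, b)
```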
  10. scan() can be a little tricky to use, but is very flexible.
  11. Its simplest use is reading a file of numbers into a vector.
    1. Observe the scan.txt file
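A short sketch of scan() on a made-up file (scan.txt itself is not shown). By default scan() expects numbers; the what argument changes the type that is read.

```r
path <- tempfile(fileext = ".txt")
writeLines(c("1 2 3", "4 5 6"), path)

nums  <- scan(path)                      # numeric vector (what = double())
words <- scan(path, what = character())  # same fields, read as strings
```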

  12. readLines() function
    1. Observe the readlines.txt file
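readLines() returns each line of a text file as one element of a character vector. A small sketch with invented contents, since readlines.txt is not shown:

```r
path <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line", "third line"), path)

all_lines <- readLines(path)         # every line in the file
first_two <- readLines(path, n = 2)  # only the first two lines
```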

  13. File connections
    1. File connections can open a file for reading different sections in different ways
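As a sketch of reading different sections in different ways, the example below opens one connection, reads a title line with readLines(), and then lets read.table() continue from the current position. The file layout is an assumption made for illustration.

```r
path <- tempfile(fileext = ".txt")
writeLines(c("Survey results 2017", "x y", "1 2", "3 4"), path)

con <- file(path, open = "r")           # open once, in text read mode
title <- readLines(con, n = 1)          # section 1: a free-text title line
tbl   <- read.table(con, header = TRUE) # section 2: the table that follows
close(con)
```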
  14. Read from Excel
# Read Excel files.
# read.xls() from gdata relies on Perl, so download and install Perl first.
library(gdata)
emp1 <- read.xls("scott_emp_data.xlsx")
emp1

  1. Reading Data from SQL Databases
  1. R can connect to most of the available relational databases.
  2. R users have a few choices of how to connect to their Oracle Database. The most commonly seen include RJDBC, RODBC and ROracle.
  3. Let us see how we can connect to an Oracle database.
  4. Connecting to an Oracle database takes four main steps.
Connecting using RJDBC
  1. Step 1 of 4: Install RJDBC package

  2. Step 2 of 4: Download the ojdbc6.jar file and point R to it
    1. Download the ojdbc6.jar file
      1. Copy and paste it into the R installation directory
      2. Point to the JDBC driver in R
  1. Step 3 of 4: Create a connection to the Oracle database

  2. Step 4 of 4: Check SQL Commands
    1. dbReadTable: Read a table into a data frame


    1. dbGetQuery: read the result from a SQL statement to a data frame


    2. dbSendUpdate: execute SQL command


    3. dbWriteTable: write a data frame to the schema. It is typically very slow with large tables.
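The four RJDBC steps above might be combined roughly as follows. This is a sketch only: it needs the RJDBC package, Java, and a reachable Oracle database, and the function name read_dept_via_jdbc and its arguments are hypothetical. The function is defined here but not called.

```r
# Sketch only; nothing connects until the function is called with real values.
read_dept_via_jdbc <- function(jar_path, url, user, password) {
  # Step 2: point R at the Oracle JDBC driver jar (e.g. ojdbc6.jar).
  drv <- RJDBC::JDBC(driverClass = "oracle.jdbc.OracleDriver",
                     classPath = jar_path)
  # Step 3: connect; url has the form "jdbc:oracle:thin:@//host:port/service".
  con <- DBI::dbConnect(drv, url, user, password)
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  # Step 4: read a whole table, then the result of a SQL statement.
  list(table = DBI::dbReadTable(con, "DEPT"),
       query = DBI::dbGetQuery(con, "select * from dept"))
}
```

Install the package first with install.packages("RJDBC") (Step 1).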




Connecting using RODBC
  1. Step 1 of 2: Create ODBC Connection
    1. Go to Run, type odbc, and open the Microsoft ODBC Administrator
    2. Click on System DSN, click Add, and provide the connection details
    3. Test the connection, then close the administrator
  2. Step 2 of 2: Access the Oracle database using RODBC
    1. Add library
    2. Connect database

    3. Query tables

    4. Query columns and properties
    1. Close connection
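A rough sketch of the RODBC steps, assuming the DSN created in Step 1 and a DEPT table in the schema. The function name query_dept_via_odbc and its arguments are hypothetical, and the function is defined but not called because it needs a live ODBC data source.

```r
# Sketch only; requires the RODBC package and a configured system DSN.
query_dept_via_odbc <- function(dsn, uid, pwd) {
  ch <- RODBC::odbcConnect(dsn, uid = uid, pwd = pwd)  # connect to the database
  on.exit(RODBC::odbcClose(ch), add = TRUE)            # close the connection
  list(tables  = RODBC::sqlTables(ch),                       # query tables
       rows    = RODBC::sqlQuery(ch, "select * from dept"),  # run a query
       columns = RODBC::sqlColumns(ch, "DEPT"))              # columns and properties
}
```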
Connecting using ROracle
  1. Download and install ROracle_1.2-1.zip
install.packages('C:\\Program Files\\R\\ROracle_1.2-1.zip', repos = NULL)
library(ROracle)
  1. Execute all commands and observe results
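drv <- dbDriver("Oracle")                  # load the Oracle driver
con <- dbConnect(drv, "scott", "tiger", dbname = 'demo.us.oracle.com:1521/orcl')
dbListTables(con)                          # list tables visible to the user
dbReadTable(con, 'DEPT')                   # read a whole table into a data frame
dbGetQuery(con, 'select * from dept')      # read a query result into a data frame
dbDisconnect(con)                          # close the connection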
For more info, read the R Data Import/Export manual: http://127.0.0.1:29881/doc/manual/R-data.html (this address is served by R's local help system).

Data Science Tutorial Part 8 List

  1. Lists
  1. A list in R is similar to your to-do list at work or school: the different items on that list most likely differ in length, characteristics, and the type of activity that has to be done.
  2. A list in R allows you to gather a variety of objects under one name (that is, the name of the list) in an ordered way. These objects can be matrices, vectors, data frames, even other lists, etc. It is not even required that these objects are related to each other in any way.
  3. You could say that a list is some kind of super data type: you can store practically any piece of information in it!
  4. The elements of a list can be given names, and they can be accessed using these names
    1. Method 1: Named List

    2. Method 2: Named List
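The two methods are not shown in the text, so the following is one plausible reading: naming the elements when the list is created, and assigning names afterwards with names(). The values are made up.

```r
# Method 1: give the names while creating the list.
emp1 <- list(name = "Scott", age = 30, skills = c("R", "SQL"))

# Method 2: create an unnamed list, then assign names.
emp2 <- list("Scott", 30, c("R", "SQL"))
names(emp2) <- c("name", "age", "skills")

emp1$skills        # access an element by name
identical(emp1, emp2)
```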
  5. Nested List: Create list inside a list
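A small sketch of a nested list, with hypothetical names and values. Inner elements can be reached by chaining $ or [[.

```r
address <- list(city = "Pune", zip = "411001")
person  <- list(name = "Asha", address = address)  # a list inside a list

city <- person$address$city            # chain $ through the levels
zip  <- person[["address"]][["zip"]]   # the same idea with [[ ]]
```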
  6. Single bracket [ versus double bracket [[
    1. A list is a collection of vectors
    2. In the above list we have 4 vectors
    3. The first vector is
    4. Get the first vector of the list
    5. Get the first element of the first vector
    6. Get the second element of the first vector. What do you observe?
    7. Get the second element of the fourth vector (two equivalent ways)
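The steps above can be sketched with a made-up list of four vectors (the original list is not shown). Single brackets return a sub-list, while double brackets return the element itself.

```r
# Hypothetical list of four vectors.
scores <- list(c(10, 20, 30), c("a", "b"), c(TRUE, FALSE), c(1.5, 2.5, 3.5))

first_as_list   <- scores[1]    # single bracket: a list of length 1
first_as_vector <- scores[[1]]  # double bracket: the numeric vector itself

first_elem <- scores[[1]][1]    # first element of the first vector
odd_result <- scores[1][2]      # indexes the length-1 list, not the vector

# Two equivalent ways to reach the 2nd element of the 4th vector.
via_chain     <- scores[[4]][2]
via_recursive <- scores[[c(4, 2)]]
```

Note that scores[1][2] asks for a second element of a one-element list, so it does not return 20; use scores[[1]][2] for that.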
  7. Select elements using names
  8. Select elements using logicals (TRUE, FALSE). This is possible only with single brackets
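A brief sketch of both selection styles on a hypothetical named list:

```r
info <- list(name = "Asha", age = 30, city = "Pune")

one_value <- info[["age"]]              # one element, by name
by_name   <- info[c("name", "city")]    # several elements, still a list
by_flag   <- info[c(TRUE, FALSE, TRUE)] # keep elements where the flag is TRUE
```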
  9. Extending list example 1
  10. Extending list example 2
  11. Add, remove, or update list elements
    1. Create a simple list
    2. Add an element
    3. Delete an element
    4. Update an element
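A minimal sketch of these four operations, using a made-up list:

```r
tasks <- list(title = "report", hours = 3)  # create a simple list

tasks$owner <- "Asha"      # add an element by assigning to a new name
tasks$title <- NULL        # delete an element by assigning NULL
tasks[["hours"]] <- 5      # update an existing element

names(tasks)
```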
  1. Merge two lists
  2. Convert list to vector
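One common way to do both, shown on hypothetical lists: c() concatenates lists, and unlist() flattens a list into an atomic vector.

```r
a <- list(x = 1, y = 2)
b <- list(z = 3)

merged <- c(a, b)       # merge two lists into one list of three elements
vec    <- unlist(merged) # named numeric vector
```

If the list mixes types (for example numbers and strings), unlist() coerces everything to a common type.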
For more info, refer to https://cran.r-project.org/doc/manuals/R-intro.html#Lists or http://www.r-tutor.com/r-introduction/list