Must-Know R Packages for a Successful Data Scientist
Packages for Data Manipulation
xlsx: Reads and writes Excel files.
foreign: Reads and writes data files from other statistical software such as SAS and SPSS.
XML: Reads and writes XML files.
JSON: Reads and writes JSON files.
moments: Computes skewness and kurtosis.
httr: A set of useful tools for working with HTTP connections.
ggplot2: Used for data visualization.
lubridate: Works with dates, date-times and time spans, for example converting dd/mm/yy to yy/mm/dd.
dplyr: A consistent and fast tool for manipulating data in R.
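To make two of the entries above concrete, here is a minimal base R sketch of moment-based skewness and kurtosis (the quantities the moments package reports) and of reordering a dd/mm/yy date. The sample vector and the helper name `central_moment` are made up for illustration, and the comparison with moments only runs if that package is installed.

```r
# Made-up sample values, for illustration only.
x <- c(2, 4, 4, 5, 7, 9, 15)

# Hypothetical helper: k-th central moment with divisor n.
central_moment <- function(v, k) mean((v - mean(v))^k)

skew <- central_moment(x, 3) / central_moment(x, 2)^1.5
kurt <- central_moment(x, 4) / central_moment(x, 2)^2  # a normal distribution gives 3

# If moments is installed, its functions should agree with the values above.
if (requireNamespace("moments", quietly = TRUE)) {
  stopifnot(isTRUE(all.equal(skew, moments::skewness(x))),
            isTRUE(all.equal(kurt, moments::kurtosis(x))))
}

# Reorder a dd/mm/yy string as yy/mm/dd (lubridate::dmy() parses the same input).
d <- as.Date("25/12/21", format = "%d/%m/%y")
reordered <- format(d, "%y/%m/%d")
reordered
```

A positive `skew` indicates a longer right tail, which is what the large value 15 produces here.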
Packages for Imputation
HotDeckImputation: Resolves missing data using hot deck imputation.
yaImpute: Performs nearest neighbour-based imputation using one or more alternative approaches to process multivariate data.
mvnmle: Finds the maximum likelihood estimate of the mean vector and variance-covariance matrix for multivariate normal data with missing values.
mice: Multiple imputation using Fully Conditional Specification (FCS), implemented by the MICE algorithm.
lattice: A powerful, high-level data visualization system with an emphasis on multivariate data. Sufficient for typical graphics needs and flexible enough to handle most non-standard requirements.
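The idea behind hot deck imputation can be sketched in a few lines of base R: each missing value is replaced by a value borrowed from an observed "donor" record. The function `hot_deck_random` and the toy data frame below are hypothetical and only show the simplest random-donor variant. This is not the algorithm of any package listed above, which choose donors far more carefully.

```r
# Hypothetical helper: fill each NA with a value drawn at random from the
# observed (donor) values of the same column.
hot_deck_random <- function(df) {
  for (col in names(df)) {
    missing <- is.na(df[[col]])
    donors <- df[[col]][!missing]
    if (any(missing) && length(donors) > 0) {
      # Sample donor positions rather than values, which avoids sample()'s
      # special case when `donors` is a single number.
      picks <- sample.int(length(donors), sum(missing), replace = TRUE)
      df[[col]][missing] <- donors[picks]
    }
  }
  df
}

set.seed(1)
# Invented toy data with missing entries.
toy <- data.frame(age = c(23, NA, 35, 41, NA),
                  income = c(50, 62, NA, 58, 45))
filled <- hot_deck_random(toy)
filled
```

Observed values are left untouched; only the NA cells receive donor values. Multiple imputation, as in mice, goes further by generating several completed data sets instead of one.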
Packages for K-means
plyr: Breaks a big problem down into pieces, operates on each piece and then puts all the pieces back together.
animation: Provides functions for animations in probability theory, mathematical, multivariate, nonparametric and computational statistics, sampling survey, linear models, time series, data mining and machine learning.
kselection: Selection of the number of clusters via bootstrap.
doParallel: Provides a parallel backend for the %dopar% function using the parallel package.
cluster: Finding groups in data.
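Choosing the number of clusters and finding groups can be tried out with tools that ship with R. The sketch below runs `kmeans()` for several values of k on the built-in iris measurements and compares them by average silhouette width from the cluster package. The silhouette criterion is swapped in here as one common heuristic; it is not the method used by kselection.

```r
library(cluster)  # recommended package shipped with R; provides silhouette()

x <- scale(iris[, 1:4])  # standardise so no variable dominates the distance

set.seed(42)
# Average silhouette width for each candidate number of clusters.
avg_width <- sapply(2:6, function(k) {
  km <- kmeans(x, centers = k, nstart = 10)
  sil <- silhouette(km$cluster, dist(x))
  mean(sil[, "sil_width"])
})
names(avg_width) <- 2:6

# The k with the highest average width is the preferred choice under this heuristic.
best_k <- as.integer(names(which.max(avg_width)))
avg_width
best_k
```

Because the runs for different k are independent of each other, a loop like this is also the kind of work a parallel backend such as doParallel is meant to speed up.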
Packages for KNN
class: Various functions for classification, including k-nearest neighbour, learning vector quantization and self-organizing maps.
gmodels: Various R programming tools for model fitting.
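A minimal k-nearest-neighbour run with the class package might look like the sketch below. The train/test split, the seed and k = 5 are arbitrary choices for illustration, and the data is the built-in iris set.

```r
library(class)  # recommended package shipped with R; provides knn()

set.seed(42)
train_idx <- sample(nrow(iris), 100)   # 100 rows for training, 50 for testing
train_x <- iris[train_idx, 1:4]
test_x  <- iris[-train_idx, 1:4]
train_y <- iris$Species[train_idx]
test_y  <- iris$Species[-train_idx]

# Each test row is labelled by majority vote among its 5 nearest training rows.
pred <- knn(train = train_x, test = test_x, cl = train_y, k = 5)

# Confusion matrix; gmodels::CrossTable(test_y, pred) gives a richer version.
conf_mat <- table(actual = test_y, predicted = pred)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
conf_mat
accuracy
```

In practice the predictors are usually rescaled first, since KNN relies on distances and a variable with a large range would otherwise dominate.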
Packages for Linear Regression
lattice: A powerful, high-level data visualization system with an emphasis on multivariate data. Sufficient for typical graphics needs and flexible enough to handle most non-standard requirements.
car: Functions and datasets to accompany "An R Companion to Applied Regression".
corpcor: Used to find partial correlations (via its cor2pcor function).
MASS: Functions and datasets to support “Modern Applied Statistics with S”.
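Partial correlation, mentioned in the list above, measures the association between two variables after removing the linear effect of the others. As a rough base R sketch (the choice of variables from the built-in mtcars data is arbitrary), it can be read off the inverse of the correlation matrix and cross-checked against regression residuals from `lm()`:

```r
vars <- mtcars[, c("mpg", "wt", "hp")]

# Partial correlations from the inverse of the correlation matrix.
prec <- solve(cor(vars))
pcor <- -prec / sqrt(outer(diag(prec), diag(prec)))
diag(pcor) <- 1

# Cross-check for mpg and wt: correlate the residuals left after
# regressing each of them on hp.
res_mpg <- resid(lm(mpg ~ hp, data = vars))
res_wt  <- resid(lm(wt ~ hp, data = vars))
check <- cor(res_mpg, res_wt)

pcor
check
```

The mpg/wt entry of `pcor` and `check` should agree up to floating-point error, which is a useful sanity test before trusting the full matrix.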
Packages for Naive Bayes
e1071: Functions for latent class analysis, fuzzy clustering, short-time Fourier transform, support vector machines, shortest path computation, bagged clustering and the naive Bayes classifier.
gmodels: Various R programming tools for model fitting.
Packages for Text Mining
rJava: Low-level interface to the Java VM, similar to the .C/.Call interface. It allows creating objects, calling methods and accessing fields.
tm: A framework for text mining applications within R.
SnowballC: Collapses words to a common root (stemming) to aid comparison of vocabulary. Currently supports Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish and Turkish.
wordcloud: Draws word clouds, in which the size of each word reflects its frequency.
RWeka: An R interface to Weka, a collection of machine learning algorithms for data mining tasks written in Java, containing tools for data pre-processing, visualization, association rules, classification, regression and clustering.
igraph: Routines for simple graph and network analysis. Handles large graphs and provides functions for generating random and regular graphs, graph visualization and centrality methods.
qdap: Automates many of the tasks associated with quantitative discourse analysis of transcripts, including parsing tools for preparing transcript data.
maptpx: Posterior maximization for topic models (LDA) in text analysis.
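The core of a text mining pipeline (normalise, tokenise, drop stop words, count terms) can be sketched in base R. The sample sentences and the tiny stop-word list are invented for illustration. Frameworks such as tm automate these steps and add proper stop-word lists and stemming, and the resulting frequencies are the kind of input a word cloud is drawn from.

```r
# Invented sample documents.
docs <- c("Text mining finds patterns in text.",
          "Mining text data needs clean text!")

# Hypothetical, deliberately tiny stop-word list.
stop_words <- c("in", "the", "a", "needs")

# Lower-case, then split on anything that is not a letter.
tokens <- unlist(strsplit(tolower(docs), "[^a-z]+"))
tokens <- tokens[tokens != "" & !(tokens %in% stop_words)]

# Term frequencies, most frequent first.
freq <- sort(table(tokens), decreasing = TRUE)
freq
```

Here "text" appears four times and "mining" twice; every other remaining term appears once.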
Packages for SVM/Neural Networks
kernlab: Kernel-based machine learning methods for classification, regression, clustering, novelty detection, quantile regression and dimensionality reduction. kernlab includes Support Vector Machines, Spectral Clustering, Kernel PCA, Gaussian Processes and a QP solver.
neuralnet: Training of neural networks using backpropagation or resilient backpropagation. Allows flexible settings through custom choice of error and activation functions.
Packages for Twitter
twitteR: Provides an interface to the Twitter web API.
base64enc: Provides tools for handling base64 encoding. It is more flexible than the orphaned base64 package.
httpuv: Provides protocol support for handling HTTP and WebSocket requests directly from R. It is a building block for other packages.