R is one of the most popular statistical software tools. However, popularity alone doesn’t make it the best choice for every problem. How the software handles large datasets, how easily it connects to different databases, how simple it is to deploy and how well it supports automation are also important. Keep these in mind when selecting software.
The abundance of cheap computing power has been accompanied by a lot of development in statistical and data mining software. Many of these packages are easy to install and use, and relatively inexpensive. Products such as Insightful’s S-PLUS, NeuralWare’s Predict, R and TANAGRA are but a few. However, they are often only capable of one or two techniques, and most can’t handle very large datasets. The exceptions are S-PLUS, R, TANAGRA, Weka and JMP.
There are a few functions or tasks that a software package should be able to handle, such as:
- tests on the distribution of variables
- transforming the variables (binning, normalization, etc.)
- detecting the correlation of variables
- factor analysis (PCA and MCA)
- sampling data for testing validity and reliability of models
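To make a few of these tasks concrete, here is a minimal Python sketch using only the standard library. It covers normalization, binning, correlation and a simple holdout sample. The function names (`zscore`, `equal_width_bins`, `pearson`, `holdout_split`) and the toy data are made up for this example; a real package would offer far more robust versions of each.

```python
import math
import random
import statistics


def zscore(values):
    """Normalize values to mean 0 and (sample) standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width bins (0-based index)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def holdout_split(rows, test_fraction, seed=0):
    """Randomly split rows into (train, test) for model validation."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy data, invented for illustration only.
ages = [22, 25, 31, 38, 44, 52, 59, 63]
incomes = [18, 24, 30, 41, 47, 55, 58, 66]

print(equal_width_bins(ages, 4))
print(round(pearson(ages, incomes), 3))
train, test = holdout_split(list(zip(ages, incomes)), test_fraction=0.25)
print(len(train), len(test))
```

If a package makes any of these steps awkward, that is a useful signal during evaluation.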
In addition to the above, the quality of the data mining algorithms should be examined. Check out online reviews of the software to see what experts in the field have to say about the algorithms that were implemented. The last area to look at is the computing power required and the capacity to handle very large datasets. Large enterprises should keep in mind that performance depends on both the hardware and the software, and that one program can run as much as five times faster than another on the same task.
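One way to get a feel for speed differences like this is to time the same task implemented two ways. The sketch below is a hypothetical Python example: the function names and the data size are assumptions, and the ratio you see will depend on your machine. It computes a mean with an explicit loop and with the standard library’s `statistics.fmean`, then reports how the run times compare.

```python
import statistics
import time


def loop_mean(values):
    """Mean computed with an explicit Python loop."""
    total = 0.0
    for v in values:
        total += v
    return total / len(values)


def time_call(func, values, repeats=5):
    """Return (result, best wall-clock seconds) over several runs."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(values)
        best = min(best, time.perf_counter() - start)
    return result, best


# Synthetic data, chosen only to make the timing measurable.
data = [float(i) for i in range(200_000)]

loop_result, loop_seconds = time_call(loop_mean, data)
lib_result, lib_seconds = time_call(statistics.fmean, data)

print(f"loop: {loop_seconds:.4f}s, library: {lib_seconds:.4f}s")
print(f"ratio: {loop_seconds / lib_seconds:.1f}x")
```

The same approach works for comparing whole packages: run an identical task on identical data and compare the best of several runs.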
This post is only meant as a very brief overview of available software and a starting point for your evaluation. However, there is one recommendation I would like to make: get some hands-on experience with R. It’s easier than you might think (and it’s free). So try it out. Install RStudio to get started, and don’t worry, you can’t break it… I think.
Here are a few resources to get you started:
- A free course from HarvardX, Data Science: R Basics. (Tell your friends, “Yeah, I studied at Harvard.”)
- A video: Introduction to R Programming: How to Download, Install and Setup R & RStudio.