Saturday, 30 March 2013

Business Application IT Lab--Plotting in R



IT Business Application Lab Assignments#10

Session 10
Date:  26th March,2013



Assignment 1:

Create 3 vectors x, y and z with random values, ensuring they are of equal length, and bind them together. Then create 3-dimensional plots of the result.

Solution:


Step 1: Creating a random data set of 50 items with mean = 30 and standard deviation = 10

> data <- rnorm(50,mean=30,sd=10)
> data

Step 2:

Taking sample data of length 5 from the created data set in three different vectors x,y,z
> x <- sample(data,5)
> x

> y <- sample(data,5)
> y

> z <- sample(data,5)
> z

Binding the three vectors x, y, z column-wise into a 5 x 3 matrix c using cbind
> c <- cbind(x,y,z)
> c
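
The sampling and binding steps above can also be run as one reproducible script. This is a minimal sketch rather than the original session: the seed value and the name m (used instead of c, to avoid masking the base function c) are assumptions.

```r
# Fix the seed so the random draws are repeatable (42 is an arbitrary choice)
set.seed(42)

# 50 normal draws with mean 30 and standard deviation 10
data <- rnorm(50, mean = 30, sd = 10)

# Three samples of length 5, each drawn without replacement from the same pool
x <- sample(data, 5)
y <- sample(data, 5)
z <- sample(data, 5)

# Check the equal-length requirement of the assignment before binding
stopifnot(length(x) == length(y), length(y) == length(z))
m <- cbind(x, y, z)

dim(m)       # number of rows and columns
colnames(m)  # column names are taken from the vector names
```

Setting a seed makes the printed values identical from run to run, which is useful when the output is pasted into a report.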

Output:




Plotting of 3 dimensional graphs:

Command:

> library(rgl)
> plot3d(c[,1:3])

Output:



Plotting of graph with labels for axis and colors

Command:

> plot3d(c[,1:3], xlab="X Axis" , ylab="Y Axis" , zlab="Z Axis", col=rainbow(500)) 
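
rainbow(500) builds a 500-colour palette although the matrix has only five rows. A sketch that sizes the palette to the data instead, assuming a stand-in matrix m with the same shape as c. The rgl call is guarded so the snippet still runs where rgl or a display is unavailable.

```r
set.seed(1)  # arbitrary seed, for repeatability only

# Stand-in for the 5 x 3 matrix built earlier (hypothetical values)
m <- cbind(x = rnorm(5, 30, 10), y = rnorm(5, 30, 10), z = rnorm(5, 30, 10))

# One colour per point
cols <- rainbow(nrow(m))

# Only draw when running interactively with rgl installed
if (interactive() && requireNamespace("rgl", quietly = TRUE)) {
  rgl::plot3d(m, xlab = "X Axis", ylab = "Y Axis", zlab = "Z Axis",
              col = cols, type = "s")
}
```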

Output:



Plotting of graph with labels for axis and colors and type "Spheres"

Command:

> plot3d(c[,1:3], xlab="X Axis" , ylab="Y Axis" , zlab="Z Axis", col=rainbow(500),type="s")

Output:



Plotting of graph with labels for axis and colors and type "Points"

Command:

> plot3d(c[,1:3], xlab="X Axis" , ylab="Y Axis" , zlab="Z Axis", col=rainbow(500),type="p")

Output:



Plotting of graph with labels for axis and colors and type "Line"

Command:

> plot3d(c[,1:3], xlab="X Axis" , ylab="Y Axis" , zlab="Z Axis", col=rainbow(500),type="l")

Output:




Assignment 2:


Choose 2 random variables 
Create 3 plots: 
1. X-Y 
2. X-Y|Z (introduce a variable z with 5 different categories and cbind it to x and y)
3. Color code and draw the graph 
4. Smooth and best fit line for the curve

Solution:

Command:


> x <- rnorm(5000, mean= 20 , sd=10)
> y <- rnorm(5000, mean= 10, sd=10)
> z1 <- sample(letters, 5)
> z2 <- sample(z1, 5000, replace=TRUE)
> z <- as.factor(z2)
> z
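
The assignment asks for z to be bound to x and y. One caveat: cbind on a factor yields a numeric matrix holding the integer level codes, so a data frame is the safer container when the category labels matter. A sketch under that assumption follows; the seed and the name df are illustrative.

```r
set.seed(7)  # arbitrary seed
x <- rnorm(5000, mean = 20, sd = 10)
y <- rnorm(5000, mean = 10, sd = 10)

# Five random letters, resampled with replacement into 5000 category labels
z <- as.factor(sample(sample(letters, 5), 5000, replace = TRUE))

# cbind() coerces the factor to its integer codes, so the labels are lost
m <- cbind(x, y, z)
is.numeric(m)

# A data frame keeps z as a factor with its five levels
df <- data.frame(x, y, z)
nlevels(df$z)
table(df$z)  # number of observations per category
```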

Output:



Creating Quickplots

Command:

> library(ggplot2)
> qplot(x,y)

Output:


Command:

> qplot(x,z)

Output:


Creating Semi-Transparent plot

Command:


> qplot(x,z, alpha=I(2/10))


Output:


Creating Colored plot

Command:

> qplot(x,y, color=z)

Output:


Creating Logarithmic Color plot

Command:

> qplot(log(x),log(y), color=z)
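
Because x and y are drawn from normal distributions centred at 20 and 10 with standard deviation 10, some values are negative, and log of a negative number returns NaN with a warning. A sketch of filtering to strictly positive pairs before taking logs; the names keep, lx and ly are illustrative.

```r
set.seed(3)  # arbitrary seed
x <- rnorm(5000, mean = 20, sd = 10)
y <- rnorm(5000, mean = 10, sd = 10)

# Keep only the pairs where both coordinates have a defined logarithm
keep <- x > 0 & y > 0
n_dropped <- sum(!keep)

lx <- log(x[keep])
ly <- log(y[keep])

n_dropped  # how many points would be missing from the log-log plot
```

Counting the dropped pairs first shows how much of the sample the logarithmic plot actually displays.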

Output:


Best fit and smooth curve using "geom"

Command:

> qplot(x,y,geom=c("path","smooth"))

Output:


Command:

> qplot(x,y,geom=c("point","smooth"))
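
A straight best-fit line can also be obtained numerically with lm from base R, as a cross-check on the smoothed curve. A sketch assuming the same distributions as above; since x and y are generated independently, the fitted slope should come out near zero.

```r
set.seed(11)  # arbitrary seed
x <- rnorm(5000, mean = 20, sd = 10)
y <- rnorm(5000, mean = 10, sd = 10)

# Ordinary least squares fit of y on x
fit <- lm(y ~ x)
coef(fit)  # intercept and slope of the best-fit line

slope <- unname(coef(fit)["x"])
```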

Output:


Command:

> qplot(x,y,geom=c("boxplot","jitter"))

Output:





Saturday, 23 March 2013

QlikView-A Data Visualization Tool




Session 9
Date:  19th March,2013


An infographics/data visualization tool that I have studied and found highly sophisticated yet user-friendly is QlikView.

The QlikView Business Discovery platform delivers true self-service BI that empowers business users by driving innovative decision-making.


Features:

This is one of the most widely used data visualization tools. Its capabilities include:
  • Consolidating relevant data from multiple sources into a single application
  • Exploring the associations in the data
  • Enabling social decision making through secure, real-time collaboration
  • Visualizing data with engaging, state-of-the-art graphics
  • Searching across all data—directly and indirectly
  • Interacting with dynamic apps, dashboards and analytics
  • Accessing, analyzing and capturing data from mobile devices

The QlikView Difference over others
  • Has an inference engine that maintains the associations in the data automatically
  • Calculates aggregations on the fly, as needed, for a super-fast user experience
  • Compresses data down to 10% of its original size to optimize the power of the processors
  • Accomplishes both within a single, comprehensive product

Go to http://ap.demo.qlikview.com/download/.

Install the application using valid credentials.

The home screen looks like:



Choose any supported file.

I chose an Excel file containing a few days of NIFTY historical data, as follows:

Date       Open     High     Low      Close    Shares Traded  Turnover (Rs. Cr)
1-Oct-12   5704.75  5722.95  5694     5718.8   123138510      4798.17
3-Oct-12   5727.7   5743.25  5715.8   5731.25  165037864      6654.02
4-Oct-12   5751.55  5807.25  5751.35  5787.6   171404290      6954.74
5-Oct-12   5815     5815.35  4888.2   5746.95  255569804      12995.8
8-Oct-12   5751.85  5751.85  5666.2   5676     142319000      5853.56
9-Oct-12   5708.15  5728.65  5677.9   5704.6   119300415      5047.01
10-Oct-12  5671.15  5686.5   5647.05  5652.15  126294361      4564.39

After loading the data, several types of visualization are available, such as:
Bar chart
Line chart
Combo chart
Scatter chart
Grid chart
Straight Table
Pivot Table

I used some of the above-mentioned charts to draw out some observations:

Fig 1:


Fig 2:



Fig 3:




Some of the areas where QlikView falls short are:
  • QlikView works well when the database is small, but in practical cases the database is rarely small.
  • Alerts: the capability to create alerts and deliver them not only by email but also to BlackBerry devices, handheld devices, mobile phones, etc.
  • Multiuser development environment: this feature allows multiple developers to work on a single project, with a utility synchronizing the pieces each developer is working on with the main project. QlikView completely lacks this feature.
  • Connecting to and extracting data from multidimensional objects.
  • Support for advanced features such as an embedded browser (available in Hyperion Interactive Reporting), flickers (rolling messages), etc. as standard options.

Friday, 15 March 2013

Business Application IT Lab


IT Business Application Lab Assignments#8

Session 8
Date:  12th March,2013



The data set used in this assignment is "Produc".

Its variables are described below:

- state : the state
- year : the year
- pcap: public capital stock
- hwy : highway and streets
- pc: private capital stock
- gsp: gross state product
- emp: labor input measured by the employment in non-agricultural payrolls
- unemp: state unemployment rate



Assignment :
Estimate all 3 models (pooling, fixed and random) and decide which model best fits the data set for panel estimation.


Solution:

Calculating value for Pooling Model




Calculating value for Fixed Model



Calculating value for Random Model
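
The model output is not reproduced here. As a minimal sketch, the objects pooled, fixed1 and random1 used in the tests below could be estimated with the plm package as follows. The formula is an assumption (a log-linear production function); the original post does not show the specification it used.

```r
library(plm)
data("Produc", package = "plm")

# Assumed specification; not taken from the original post
form <- log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp

# Same formula and panel index (state, year) for all three estimators
pooled  <- plm(form, data = Produc, index = c("state", "year"), model = "pooling")
fixed1  <- plm(form, data = Produc, index = c("state", "year"), model = "within")
random1 <- plm(form, data = Produc, index = c("state", "year"), model = "random")

summary(fixed1)  # coefficient table for the fixed (within) model
```

Keeping the formula in one variable ensures the three models differ only in the estimator, which is what the pairwise tests compare.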




To choose the best model for the data set "Produc", we need to run pairwise hypothesis tests among the 3 models and select the best fit in the end.


Test 1:


Between pooling and fixed model

Command :
pFtest (fixed1 , pooled)




Test details :
H0 (null): the individual index and time based params are all zero
H1: at least one of the index and time based params is nonzero

The p-value is very low, so we reject the null hypothesis: the test indicates that the effects in the alternative hypothesis are significant.

Hence Fixed model is better than the pooling model.



Test2:
Between pooling and random model

Command :
plmtest (pooled)





Test details :
H0 (null): the individual index and time based params are all zero (pooling model)
H1: at least one of the index and time based params is nonzero (random model)

The p-value is very low, so we reject the null hypothesis: the test indicates that the effects in the alternative hypothesis are significant.

Hence random model is better than the pooling model.



Test3:
Between fixed and random model

Command :
We use the Hausman test:
phtest(random1 , fixed1)




Test details :
H0 (null): individual effects are not correlated with any regressor (random model)
H1: individual effects are correlated with the regressors (fixed model)

The p-value is very low, so we reject the null hypothesis. This suggests that the random model is inconsistent.

Hence fixed model is better than random model.



Conclusion :-
We conclude that the fixed model best fits the "Produc" data set for panel data estimation, i.e. the individual effects are significant and are correlated with the regressors.
Hence, we would choose the "Fixed" model to estimate the panel data in the "Produc" data set.
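
The three pairwise tests can be collected into one comparison. A sketch, again assuming a log-linear formula that the original post does not show, and a 5% significance level chosen for illustration. The p-values depend on the specification, so no output is asserted here.

```r
library(plm)
data("Produc", package = "plm")

# Assumed specification and significance level (not from the original post)
form  <- log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp
alpha <- 0.05

pooled  <- plm(form, data = Produc, index = c("state", "year"), model = "pooling")
fixed1  <- plm(form, data = Produc, index = c("state", "year"), model = "within")
random1 <- plm(form, data = Produc, index = c("state", "year"), model = "random")

# Each test returns an object whose p.value element holds the p-value
p_values <- c(
  pooling_vs_fixed  = unname(pFtest(fixed1, pooled)$p.value),
  pooling_vs_random = unname(plmtest(pooled)$p.value),
  random_vs_fixed   = unname(phtest(random1, fixed1)$p.value)
)

p_values          # the three p-values side by side
p_values < alpha  # TRUE where the null hypothesis is rejected
```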