DebarghyaC: March 2013

Saturday, 30 March 2013

Business Application IT Lab--Plotting in R

IT Business Application Lab Assignments#10

Session 10
Date: 26th March, 2013

Assignment 1:

Create 3 vectors x, y and z with random values, ensuring they are of equal length, and bind them together. Create 3-dimensional plots of the result.

Solution:

Step 1: Creating a random data set of 50 items with mean = 30 and standard deviation = 10

> data <- rnorm(50,mean=30,sd=10)
> data

Step 2:

Taking sample data of length 5 from the created data set in three different vectors x,y,z
> x <- sample(data,5)
> x

> y <- sample(data,5)
> y

> z <- sample(data,5)
> z

Binding the three vectors x,y,z into a matrix c using cbind
> c <- cbind(x,y,z)
> c

Output:
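The two steps above can be put together in one short script. This is only a sketch: the seed is my own assumption (the original run did not set one), and the matrix is named xyz here, a name not used in the post, so that it does not share a name with R's built-in c() function.

```r
set.seed(42)  # assumed seed, only so that the run is repeatable

# 50 draws from a normal distribution with mean 30 and sd 10
data <- rnorm(50, mean = 30, sd = 10)

# three samples of length 5, each drawn without replacement from data
x <- sample(data, 5)
y <- sample(data, 5)
z <- sample(data, 5)

# cbind() returns a 5 x 3 numeric matrix with columns named x, y, z
xyz <- cbind(x, y, z)
dim(xyz)
colnames(xyz)
```

Because the three samples come from the same 50 values, the same number can show up in more than one column.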

Plotting of 3 dimensional graphs:

Command:

> library(rgl)
> plot3d(c[,1:3])

Output:

Plotting of graph with labels for axis and colors

Command:

> plot3d(c[,1:3], xlab="X Axis" , ylab="Y Axis" , zlab="Z Axis", col=rainbow(500))

Output:

Plotting of graph with labels for axis and colors and type "Spheres"

Command:

> plot3d(c[,1:3], xlab="X Axis" , ylab="Y Axis" , zlab="Z Axis", col=rainbow(500),type="s")

Output:

Plotting of graph with labels for axis and colors and type "Points"

Command:

> plot3d(c[,1:3], xlab="X Axis" , ylab="Y Axis" , zlab="Z Axis", col=rainbow(500),type="p")

Output:

Plotting of graph with labels for axis and colors and type "Line"

Command:

> plot3d(c[,1:3], xlab="X Axis" , ylab="Y Axis" , zlab="Z Axis", col=rainbow(500),type="l")

Output:
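The plot3d() calls above pass rainbow(500) for only 5 points, so as far as I can tell only the first few colours of the palette are actually used. A minimal sketch with one colour per point is below. The names pts and cols and the seed are assumptions for illustration, and the plot is only drawn if the rgl package is installed.

```r
set.seed(42)  # assumed seed for repeatability

# five random points with columns x, y, z (same distribution as above)
pts <- cbind(x = rnorm(5, 30, 10),
             y = rnorm(5, 30, 10),
             z = rnorm(5, 30, 10))

# one colour per point instead of a 500-colour palette
cols <- rainbow(nrow(pts))

if (requireNamespace("rgl", quietly = TRUE)) {
  # type = "s" draws spheres; "p" (points) and "l" (lines) are used the same way
  rgl::plot3d(pts, xlab = "X Axis", ylab = "Y Axis", zlab = "Z Axis",
              col = cols, type = "s")
}
```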

Assignment 2:

Choose 2 random variables.
Create the following plots:
1. X-Y
2. X-Y|Z (introduce a variable z with 5 different categories and cbind it to x and y)
3. Color code and draw the graph
4. Smooth and best fit line for the curve

Solution:

Command:

> x <- rnorm(5000, mean= 20 , sd=10)
> y <- rnorm(5000, mean= 10, sd=10)
> z1 <- sample(letters, 5)
> z2 <- sample(z1, 5000, replace=TRUE)
> z <- as.factor(z2)
> z

Output:
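A quick way to check that z really has 5 categories is sketched below. The seed and the data frame name xyz are assumptions of mine. A data frame is used instead of cbind() because cbind() on a factor stores its integer codes rather than the letters.

```r
set.seed(1)  # assumed seed for repeatability

x <- rnorm(5000, mean = 20, sd = 10)
y <- rnorm(5000, mean = 10, sd = 10)

# pick 5 letters, then assign one of them at random to each observation
z1 <- sample(letters, 5)
z  <- factor(sample(z1, 5000, replace = TRUE))

# a data frame keeps x and y numeric and z as a factor
xyz <- data.frame(x, y, z)

nlevels(z)  # number of categories
table(z)    # observations per category
```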

Creating Quickplots

Command:

> library(ggplot2)
> qplot(x,y)

Output:

Command:

> qplot(x,z)

Output:

Creating Semi-Transparent plot

Command:

> qplot(x,z, alpha=I(2/10))

Output:

Creating Colored plot

Command:

> qplot(x,y, color=z)

Output:

Creating Logarithmic Color plot

Command:

> qplot(log(x),log(y), color=z)

Output:
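One thing to keep in mind with the logarithmic plot: x and y are normally distributed around 20 and 10, so some values are negative, and log() returns NaN for those with a warning. A small sketch that counts the affected points and keeps only the ones that can be shown is below. The seed and the names keep, lx and ly are assumptions for illustration.

```r
set.seed(1)  # assumed seed for repeatability

x <- rnorm(5000, mean = 20, sd = 10)
y <- rnorm(5000, mean = 10, sd = 10)

# log() is only defined for positive values
keep <- x > 0 & y > 0
sum(!keep)  # number of points a log-log plot cannot show

# logs of the remaining points are all finite
lx <- log(x[keep])
ly <- log(y[keep])
```

The filtered values can then be plotted in the same way, using z[keep] for the colour so that the categories stay aligned with the points.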

Best fit and smooth curve using "geom"

Command:

> qplot(x,y,geom=c("path","smooth"))

Output:

Command:

> qplot(x,y,geom=c("point","smooth"))

Output:
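To see the numbers behind a best fit line, the same idea can be sketched in base R. Note that this is not what qplot does internally: here lm() gives the straight line and lowess() a locally weighted smooth, and the seed is an assumption.

```r
set.seed(1)  # assumed seed for repeatability

x <- rnorm(5000, mean = 20, sd = 10)
y <- rnorm(5000, mean = 10, sd = 10)

# least-squares straight line y = a + b * x
fit <- lm(y ~ x)
coef(fit)

plot(x, y, pch = ".", col = "grey40")
abline(fit, col = "red", lwd = 2)           # best fit line
lines(lowess(x, y), col = "blue", lwd = 2)  # smooth curve
```

Since x and y are generated independently, the fitted slope should come out close to zero and both lines should be nearly flat.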

Command:

> qplot(x,y,geom=c("boxplot","jitter"))

Output:

Saturday, 23 March 2013

QlikView-A Data Visualization Tool

Session 9

Date: 19th March, 2013

An infographics/data visualization tool that I have studied and found highly sophisticated yet user-friendly is QlikView.

The QlikView Business Discovery platform delivers true self-service BI that empowers business users by driving innovative decision-making.

Features:

This is one of the most widely used data visualization tools. Its capabilities include:

Consolidating relevant data from multiple sources into a single application
Exploring the associations in the data
Enabling social decision making through secure, real-time collaboration
Visualizing data with engaging, state-of-the-art graphics
Searching across all data—directly and indirectly
Interacting with dynamic apps, dashboards and analytics
Accessing, analyzing and capturing data from mobile devices

The QlikView Difference over others

Has an inference engine that maintains the associations in the data automatically
Calculates aggregations on the fly, as needed, for a super-fast user experience
Compresses data down to 10% of its original size to optimize the power of the processors
Accomplishes both within a single, comprehensive product

Go to http://ap.demo.qlikview.com/download/.

Install the application using valid credentials.

The home screen looks like:

Choose any supported file.

I chose an Excel file containing a few days of NIFTY historical data, as follows:

Date	Open	High	Low	Close	Shares Traded	Turnover (Rs. Cr)
1-Oct-12	5704.75	5722.95	5694	5718.8	123138510	4798.17
3-Oct-12	5727.7	5743.25	5715.8	5731.25	165037864	6654.02
4-Oct-12	5751.55	5807.25	5751.35	5787.6	171404290	6954.74
5-Oct-12	5815	5815.35	4888.2	5746.95	255569804	12995.8
8-Oct-12	5751.85	5751.85	5666.2	5676	142319000	5853.56
9-Oct-12	5708.15	5728.65	5677.9	5704.6	119300415	5047.01
10-Oct-12	5671.15	5686.5	5647.05	5652.15	126294361	4564.39

After loading the data, several types of visualization options are available, such as:

Bar chart

Line chart

Combo chart

Scatter chart

Grid chart

Straight Table

Pivot Table

I used some of the above-mentioned charts to arrive at some observations:

Fig 1:

Fig 2:

Fig 3:

Some areas where QlikView falls short are:

QlikView works well when the database is small, but in practice databases are rarely small.
Alerts: the capability to create alerts and deliver them not only to email but also to BlackBerry devices, other handhelds, mobile phones, etc.
Multiuser development environment: this feature allows multiple developers to work on a single project, with a utility synchronizing the pieces each developer is working on with the main project. QlikView completely lacks this feature.
Connecting to and extracting data from multidimensional objects.
Support for advanced features such as an embedded browser (available in Hyperion Interactive Reporting), flickers (rolling messages), etc. as standard options.

Friday, 15 March 2013

Business Application IT Lab

IT Business Application Lab Assignments#8

Session 8
Date: 12th March, 2013

The data set we have used in this assignment is "Produc".

Its variables are described below:

- state : the state
- year : the year
- pcap: public capital stock
- hwy : highway and streets
- pc: private capital stock
- gsp: gross state product
- emp: labor input measured by the employment in non–agricultural payrolls
- unemp: state unemployment rate

Assignment :

Estimate all 3 models (pooling, fixed and random) and decide which model best fits the data set for panel estimation.

Solution:

Calculating value for Pooling Model

Calculating value for Fixed Model

Calculating value for Random Model
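The commands for these three estimations are not shown above, so here is a minimal sketch of how they might look with the plm package. The model formula is an assumption on my part (it is the production function used in the plm documentation examples, not necessarily the one used in the lab). The object names pooled, fixed1 and random1 match the ones used in the tests below.

```r
library(plm)
data("Produc", package = "plm")

# assumed model: log output on log capital stocks, log employment and unemployment
form <- log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp

# same formula and panel index (state, year) for all three estimators
pooled  <- plm(form, data = Produc, index = c("state", "year"), model = "pooling")
fixed1  <- plm(form, data = Produc, index = c("state", "year"), model = "within")
random1 <- plm(form, data = Produc, index = c("state", "year"), model = "random")

summary(fixed1)
```

The fixed ("within") model has no overall intercept, so it reports one coefficient fewer than the pooling and random models.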

To choose the model that best fits the data set "Produc", we need to run pairwise hypothesis tests among the 3 models and select the best fit at the end.

Test 1:

Between pooling and fixed model

Command :
pFtest(fixed1, pooled)

Test details :
H0: the individual index and time based params are all zero : Pooling Model
H1: At least one of the index and time based params is non zero : Fixed Model

The p-value is very low, so we can reject the null hypothesis: the test indicates that the effects in the alternative hypothesis are significant.

Hence the fixed model is better than the pooling model.

Test 2:
Between pooling and random model

Command :
plmtest(pooled)

Test details :
H0: the individual index and time based params are all zero : Pooling Model
H1: At least one of the index and time based params is non zero : Random Model

The p-value is very low, so we can reject the null hypothesis: the test indicates that the effects in the alternative hypothesis are significant.

Hence the random model is better than the pooling model.

Test 3:
Between fixed and random model

Command :
We use the Hausman test:
phtest(random1, fixed1)

Test details :
H0: Individual effects are not correlated with any regressor : Random Model
H1: Individual effects are correlated with the regressors : Fixed Model

The test indicates that one of the models is inconsistent.
The p-value is low, so we can reject the null hypothesis.

Hence the fixed model is better than the random model.
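The three tests can also be run in one go and their p-values compared side by side. The sketch below is self-contained, so it repeats the estimation step. The model formula is again an assumption (taken from the plm documentation examples), and the p-values depend on it, so they may differ from the ones obtained in the lab.

```r
library(plm)
data("Produc", package = "plm")

# assumed model formula, not necessarily the one used in the lab
form <- log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp

pooled  <- plm(form, data = Produc, index = c("state", "year"), model = "pooling")
fixed1  <- plm(form, data = Produc, index = c("state", "year"), model = "within")
random1 <- plm(form, data = Produc, index = c("state", "year"), model = "random")

ft  <- pFtest(fixed1, pooled)    # F test: fixed vs pooling
lmt <- plmtest(pooled)           # Lagrange multiplier test: random vs pooling
ht  <- phtest(random1, fixed1)   # Hausman test: fixed vs random

# collect the p-values; a small value rejects the corresponding H0
p_values <- c(pFtest  = as.numeric(ft$p.value),
              plmtest = as.numeric(lmt$p.value),
              phtest  = as.numeric(ht$p.value))
p_values
p_values < 0.05
```

The 0.05 cut-off is a common convention, not something fixed by the tests. A Hausman p-value that falls close to it deserves a more careful look before settling on the fixed model.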

Conclusion :
We can conclude that the fixed model best fits the "Produc" data set for panel data estimation, i.e. individual (index) effects exist and are significantly correlated with the regressor variables.
Hence, we would choose the "Fixed" model to estimate the panel data presented by the "Produc" data set.