Friday, August 19, 2016
Analytical activities of data users
Users may be interested in particular data points within a data set, as opposed to the general messaging outlined above. These low-level analytic activities are presented in the following table. The taxonomy can also be organized around three poles of activity: retrieving values, finding data points, and arranging data points.
| # | Task | General Description | Pro Forma Abstract | Examples |
|---|---|---|---|---|
| 1 | Retrieve Value | Given a set of specific cases, find attributes of those cases. | What are the values of attributes {X, Y, Z, ...} in the data cases {A, B, C, ...}? | - What is the mileage per gallon of the Audi TT? - How long is the movie Gone with the Wind? |
| 2 | Filter | Given some concrete conditions on attribute values, find data cases satisfying those conditions. | Which data cases satisfy conditions {A, B, C, ...}? | - What Kellogg's cereals have high fiber? - What comedies have won awards? - Which funds underperformed the S&P 500? |
| 3 | Compute Derived Value | Given a set of data cases, compute an aggregate numeric representation of those data cases. | What is the value of aggregation function F over a given set S of data cases? | - What is the average calorie content of Post cereals? - What is the gross income of all stores combined? - How many manufacturers of cars are there? |
| 4 | Find Extremum | Find data cases possessing an extreme value of an attribute over its range within the data set. | What are the top/bottom N data cases with respect to attribute A? | - What is the car with the highest MPG? - What director/film has won the most awards? - What Robin Williams film has the most recent release date? |
| 5 | Sort | Given a set of data cases, rank them according to some ordinal metric. | What is the sorted order of a set S of data cases according to their value of attribute A? | - Order the cars by weight. - Rank the cereals by calories. |
| 6 | Determine Range | Given a set of data cases and an attribute of interest, find the span of values within the set. | What is the range of values of attribute A in a set S of data cases? | - What is the range of film lengths? - What is the range of car horsepowers? - What actresses are in the data set? |
| 7 | Characterize Distribution | Given a set of data cases and a quantitative attribute of interest, characterize the distribution of that attribute's values over the set. | What is the distribution of values of attribute A in a set S of data cases? | - What is the distribution of carbohydrates in cereals? - What is the age distribution of shoppers? |
| 8 | Find Anomalies | Identify any anomalies within a given set of data cases with respect to a given relationship or expectation, e.g. statistical outliers. | Which data cases in a set S of data cases have unexpected/exceptional values? | - Are there exceptions to the relationship between horsepower and acceleration? - Are there any outliers in protein? |
| 9 | Cluster | Given a set of data cases, find clusters of similar attribute values. | Which data cases in a set S of data cases are similar in value for attributes {X, Y, Z, ...}? | - Are there groups of cereals with similar fat/calories/sugar? - Is there a cluster of typical film lengths? |
| 10 | Correlate | Given a set of data cases and two attributes, determine useful relationships between the values of those attributes. | What is the correlation between attributes X and Y over a given set S of data cases? | - Is there a correlation between carbohydrates and fat? - Is there a correlation between country of origin and MPG? - Do different genders have a preferred payment method? - Is there a trend of increasing film length over the years? |
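Several of the tasks in the table can be sketched in a few lines of code. The following is a minimal illustration in Python using only the standard library, not a reference implementation. The data set (`CARS`), its values, and every function name are hypothetical and chosen only to mirror the task names above.

```python
from math import sqrt
from statistics import mean

# Hypothetical data cases; names and numbers are invented for illustration.
CARS = [
    {"name": "Car A", "mpg": 30.0, "weight": 2200, "horsepower": 90},
    {"name": "Car B", "mpg": 22.0, "weight": 3100, "horsepower": 150},
    {"name": "Car C", "mpg": 18.0, "weight": 3800, "horsepower": 210},
    {"name": "Car D", "mpg": 26.0, "weight": 2600, "horsepower": 120},
]


def retrieve_value(cases, name, attribute):
    """Task 1: look up one attribute of one named data case."""
    return next(c[attribute] for c in cases if c["name"] == name)


def filter_cases(cases, predicate):
    """Task 2: keep only the data cases that satisfy a condition."""
    return [c for c in cases if predicate(c)]


def compute_derived_value(cases, attribute, aggregate=mean):
    """Task 3: apply an aggregation function to one attribute."""
    return aggregate([c[attribute] for c in cases])


def find_extremum(cases, attribute, n=1, highest=True):
    """Task 4: return the top (or bottom) n cases for an attribute."""
    ranked = sorted(cases, key=lambda c: c[attribute], reverse=highest)
    return ranked[:n]


def sort_cases(cases, attribute):
    """Task 5: order the cases by an attribute, ascending."""
    return sorted(cases, key=lambda c: c[attribute])


def determine_range(cases, attribute):
    """Task 6: return the (minimum, maximum) span of an attribute."""
    values = [c[attribute] for c in cases]
    return min(values), max(values)


def correlate(cases, x_attr, y_attr):
    """Task 10: Pearson correlation coefficient between two attributes."""
    xs = [c[x_attr] for c in cases]
    ys = [c[y_attr] for c in cases]
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var_x = sum((x - x_bar) ** 2 for x in xs)
    var_y = sum((y - y_bar) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    print(retrieve_value(CARS, "Car B", "mpg"))
    print([c["name"] for c in filter_cases(CARS, lambda c: c["mpg"] > 25)])
    print(compute_derived_value(CARS, "mpg"))
    print(determine_range(CARS, "horsepower"))
    print(correlate(CARS, "weight", "mpg"))
```

Characterizing distributions, finding anomalies, and clustering are left out of this sketch because they usually depend on a chosen statistical method rather than a single lookup or aggregation.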
Barriers to effective analysis
Barriers to effective analysis may exist among the analysts performing the data analysis or among the audience. Distinguishing fact from opinion, cognitive biases, and innumeracy are all challenges to sound data analysis.
Confusing fact and opinion
You are entitled to your own opinion, but you are not entitled to your own facts.
Daniel Patrick Moynihan
Effective analysis requires obtaining relevant facts to answer questions, support a conclusion or formal opinion, or test hypotheses. Facts by definition are irrefutable, meaning that any person involved in the analysis should be able to agree upon them. For example, in August 2010, the Congressional Budget Office (CBO) estimated that extending the Bush tax cuts of 2001 and 2003 for the 2011-2020 time period would add approximately $3.3 trillion to the national debt.[13] Everyone should be able to agree that indeed this is what CBO reported; they can all examine the report. This makes it a fact. Whether persons agree or disagree with the CBO is their own opinion.
As another example, the auditor of a public company must arrive at a formal opinion on whether the company's financial statements are "fairly stated, in all material respects." This requires extensive analysis of factual data and evidence to support the opinion. When making the leap from facts to opinions, there is always the possibility that the opinion is erroneous.
Cognitive biases
There are a variety of cognitive biases that can adversely affect analysis. For example, confirmation bias is the tendency to search for or interpret information in a way that confirms one's preconceptions. In addition, individuals may discredit information that does not support their views. Analysts may be trained specifically to be aware of these biases and how to overcome them.
Innumeracy
Effective analysts are generally adept with a variety of numerical techniques. However, audiences may not have such literacy with numbers or numeracy; they are said to be innumerate. Persons communicating the data may also be attempting to mislead or misinform, deliberately using bad numerical techniques.
For example, whether a number is rising or falling may not be the key factor. More important may be the number relative to another number, such as the size of government revenue or spending relative to the size of the economy (GDP) or the amount of cost relative to revenue in corporate financial statements. This numerical technique is referred to as normalization[15] or common-sizing. There are many such techniques employed by analysts, whether adjusting for inflation (i.e., comparing real vs. nominal data) or considering population increases, demographics, etc. Analysts apply a variety of techniques to address the various quantitative messages described in the section above.
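As a rough illustration of normalization, the sketch below expresses one figure as a share of another and converts a nominal amount to base-period prices. The function names and all numbers are hypothetical; they are not drawn from any actual budget or financial statement.

```python
def common_size(amount, base):
    """Express an amount as a fraction of a base figure (e.g. cost / revenue)."""
    if base == 0:
        raise ValueError("base must be non-zero")
    return amount / base


def to_real(nominal, price_index, base_index=100.0):
    """Convert a nominal amount to base-period prices using a price index."""
    if price_index <= 0:
        raise ValueError("price_index must be positive")
    return nominal * base_index / price_index


if __name__ == "__main__":
    # Hypothetical: spending rises from 500 to 540 while GDP rises from 2000 to 2400.
    share_before = common_size(500.0, 2000.0)
    share_after = common_size(540.0, 2400.0)
    print(share_before, share_after)

    # Hypothetical: 110 in nominal terms when the price index stands at 110.
    print(to_real(110.0, price_index=110.0))
```

In this made-up case the raw spending figure rises, yet its share of the economy falls from 25% to 22.5%, which is the kind of distinction normalization is meant to surface.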
Analysts may also analyze data under different assumptions or scenarios. For example, when analysts perform financial statement analysis, they will often recast the financial statements under different assumptions to help arrive at an estimate of future cash flow, which they then discount to present value based on some interest rate, to determine the valuation of the company or its stock. Similarly, the CBO analyzes the effects of various policy options on the government's revenue, outlays and deficits, creating alternative future scenarios for key measures.
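The discounting step can be sketched as follows. This is a simplified illustration that assumes end-of-year cash flows and a single constant annual rate; the scenario names, cash flows, and rate are hypothetical.

```python
def present_value(cash_flows, rate):
    """Discount end-of-year cash flows (year 1, 2, ...) at a constant annual rate."""
    if rate <= -1:
        raise ValueError("rate must be greater than -1")
    return sum(cf / (1 + rate) ** year
               for year, cf in enumerate(cash_flows, start=1))


# Hypothetical scenarios: projected cash flows for the next three years.
SCENARIOS = {
    "growth": [100.0, 110.0, 121.0],
    "flat": [100.0, 100.0, 100.0],
}

if __name__ == "__main__":
    discount_rate = 0.10  # assumed rate, for illustration only
    for name, flows in SCENARIOS.items():
        print(name, round(present_value(flows, discount_rate), 2))
```

Running the same calculation over several sets of assumed cash flows is one simple way to compare valuations across scenarios.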
Other topics
Analytics and business intelligence
Analytics is the "extensive use of data, statistical and quantitative analysis, explanatory and predictive models, and fact-based management to drive decisions and actions." It is a subset of business intelligence, which is a set of technologies and processes that use data to understand and analyze business performance.
Education
In education, most educators have access to a data system for the purpose of analyzing student data.[17] These data systems present data to educators in an over-the-counter data format (embedding labels, supplemental documentation, and a help system and making key package/display and content decisions) to improve the accuracy of educators' data analyses.