While data science covers a wide set of techniques, applications, and disciplines, there are a few associated fields that data science heavily relies on.
The techniques used in the steps of a data science process, and often mentioned in conjunction with the term “data science,” are:
• Descriptive statistics: Computing the mean, standard deviation, correlation, and other descriptive statistics quantifies the aggregate structure of a dataset.
This information is essential for understanding the structure of the data and the relationships within the dataset. Descriptive statistics are used in the exploration stage of the data science process.
• Exploratory visualization: The process of expressing data in visual coordinates enables users to find patterns and relationships in the data and to comprehend large datasets. Similar to descriptive statistics, exploratory visualization is integral to the pre- and post-processing steps in data science.
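Even a quick text histogram can reveal a pattern that summary statistics hide. This sketch, using only the standard library, bins a hypothetical set of response times (invented values) and exposes a bimodal shape at a glance; a plotting library such as matplotlib would be used for richer charts:

```python
from collections import Counter

# Hypothetical dataset: response times in milliseconds (illustrative values)
response_ms = [12, 15, 14, 13, 48, 16, 14, 15, 47, 13, 14, 50, 15]

# Bucket values into 10 ms bins and draw a quick text histogram;
# the two separate clusters stand out immediately.
bins = Counter((v // 10) * 10 for v in response_ms)
for lo in sorted(bins):
    print(f"{lo:3d}-{lo + 9:<3d} | {'#' * bins[lo]}")
```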
• Dimensional slicing: Online analytical processing (OLAP) applications, which are prevalent in organizations, mainly provide information on the data through dimensional slicing, filtering, and pivoting.
OLAP analysis is enabled by a unique database schema design where the data are organized as dimensions (e.g., products, regions, dates) and quantitative facts or measures (e.g., revenue, quantity).
With a well-defined database structure, it is easy to slice the yearly revenue by product or by a combination of region and product.
These techniques are extremely useful and may unveil patterns in data (e.g., candy sales decline after Halloween in the United States).
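The slicing operation described above can be sketched in plain Python over a hypothetical fact table, with products, regions, and years as dimensions and revenue as the measure (all values invented); a production OLAP system would perform the same aggregation inside the database:

```python
from collections import defaultdict

# Hypothetical fact table: (product, region, year, revenue) rows
facts = [
    ("candy", "north", 2023, 100),
    ("candy", "south", 2023, 150),
    ("soda",  "north", 2023, 200),
    ("candy", "north", 2024, 120),
    ("soda",  "south", 2024, 180),
]

# Slice yearly revenue by product: group on (year, product), sum the measure
revenue = defaultdict(int)
for product, region, year, amount in facts:
    revenue[(year, product)] += amount

for (year, product), total in sorted(revenue.items()):
    print(year, product, total)
```

Filtering (e.g., keeping only one region) and pivoting (swapping which dimensions form rows and columns) are variations on the same group-and-aggregate idea.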
• Hypothesis testing: In confirmatory data analysis, experimental data are collected to evaluate whether there is enough evidence to support a hypothesis.
There are many types of statistical tests, and they have a wide variety of business applications (e.g., A/B testing in marketing). In general, data science is a process in which many hypotheses are generated and tested based on observational data. Since data science algorithms are iterative, solutions can be refined at each step.
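One form of test used in A/B scenarios can be sketched as a permutation test using only the standard library: shuffle the pooled observations and count how often a random split produces a difference as large as the observed one. The conversion rates below are invented for illustration:

```python
import random
import statistics

# Hypothetical A/B test: conversion rates for two page variants (illustrative)
a = [0.12, 0.15, 0.11, 0.14, 0.13, 0.16, 0.12]
b = [0.18, 0.20, 0.17, 0.21, 0.19, 0.18, 0.22]

observed = statistics.mean(b) - statistics.mean(a)

# Permutation test: under the null hypothesis the group labels are
# exchangeable, so random splits rarely match the observed difference
# unless the difference is real.
random.seed(0)
pooled = a + b
trials = 5000
count = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = statistics.mean(pooled[len(a):]) - statistics.mean(pooled[:len(a)])
    if diff >= observed:
        count += 1

p_value = count / trials
print(f"observed difference={observed:.3f}, p-value={p_value:.4f}")
```

A small p-value (conventionally below 0.05) would indicate that variant B's higher conversion rate is unlikely to be due to chance alone.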
• Data engineering: Data engineering is the process of sourcing, organizing, assembling, storing, and distributing data for effective analysis and usage.
Database engineering, distributed storage, computing frameworks (e.g., Apache Hadoop, Spark, Kafka), parallel computing, extraction, transformation, and loading (ETL) processes, and data warehousing constitute data engineering techniques. Data engineering helps source and prepare data for data science learning algorithms.
• Business intelligence: Business intelligence (BI) helps organizations consume data effectively. It lets users query data ad hoc without writing technical query commands, and it uses dashboards or visualizations to communicate facts and trends.
Business intelligence specializes in the secure delivery of information to the right roles and in the distribution of information at scale. BI typically reports historical trends, but in combination with data science, both past data and predicted future data can be presented. BI can hold and distribute the results of data science.