Data Analyst

Work Role ID: 422 (NIST: OM-DA-002)
Category/Specialty Area: Operate & Maintain / Data Administration
Workforce Element: IT (Cyberspace)

Examines data from multiple disparate sources with the goal of providing new insight. Designs and implements custom algorithms, flow processes and layouts for complex, enterprise-scale data sets used for modeling, data mining, and research purposes.


Items denoted by a * are CORE KSATs for every Work Role, while other CORE KSATs vary by Work Role.

Core KSATs

KSAT ID | Description | KSAT Type
22 | * Knowledge of computer networking concepts and protocols, and network security methodologies. | Knowledge
108 | * Knowledge of risk management processes (e.g., methods for assessing and mitigating risk). | Knowledge
166 | Skill in conducting queries and developing algorithms to analyze data structures. | Skill
201 | Skill in generating queries and reports. | Skill
1120 | Ability to interpret and incorporate data from multiple tool sources. | Ability
1157 | * Knowledge of national and international laws, regulations, policies, and ethics as they relate to cybersecurity. | Knowledge
1158 | * Knowledge of cybersecurity principles. | Knowledge
1159 | * Knowledge of cyber threats and vulnerabilities. | Knowledge
6900 | * Knowledge of specific operational impacts of cybersecurity lapses. | Knowledge
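
As an illustration of KSATs 166 and 201 (conducting queries and generating reports), the following is a minimal sketch using Python's built-in sqlite3 module. The `events` table, its columns, the sample rows, and the `build_report` helper are all hypothetical and are not part of the framework; they only show the general shape of a grouped query that feeds a short report.

```python
import sqlite3


def build_report(rows):
    """Load (source, severity) rows into an in-memory table and
    return one summary row per source: (source, count, max severity)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema, assumed for illustration only.
    conn.execute("CREATE TABLE events (source TEXT, severity INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    report = conn.execute(
        "SELECT source, COUNT(*) AS n, MAX(severity) AS worst "
        "FROM events GROUP BY source ORDER BY source"
    ).fetchall()
    conn.close()
    return report


# Made-up sample data standing in for records from multiple tools.
sample = [("firewall", 3), ("ids", 5), ("firewall", 1)]

for source, n, worst in build_report(sample):
    print(f"{source}: {n} event(s), max severity {worst}")
```

Parameterized inserts (`?` placeholders) are used rather than string formatting, which is also in keeping with the secure coding knowledge the role calls for.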

Additional KSATs

KSAT ID | Description | KSAT Type
21 | Knowledge of computer algorithms. | Knowledge
23 | Knowledge of computer programming principles such as object-oriented design. | Knowledge
28 | Knowledge of data administration and data standardization policies and standards. | Knowledge
31 | Knowledge of data mining and data warehousing principles. | Knowledge
32 | Knowledge of database management systems, query languages, table relationships, and views. | Knowledge
35 | Knowledge of digital rights management. | Knowledge
44 | Knowledge of enterprise messaging systems and associated software. | Knowledge
65A | Knowledge of Information Theory (e.g., source coding, channel coding, algorithm complexity theory, and data compression). | Knowledge
74 | Knowledge of low-level computer languages (e.g., assembly languages). | Knowledge
75A | Knowledge of mathematics, including logarithms, trigonometry, linear algebra, calculus, statistics, and operational analysis. | Knowledge
79 | Knowledge of network access, identity, and access management (e.g., public key infrastructure [PKI]). | Knowledge
90 | Knowledge of operating systems. | Knowledge
98 | Knowledge of policy-based and risk adaptive access controls. | Knowledge
102 | Knowledge of programming language structures and logic. | Knowledge
104 | Knowledge of query languages such as SQL (structured query language). | Knowledge
120 | Knowledge of sources, characteristics, and uses of the organization’s data assets. | Knowledge
135 | Knowledge of the capabilities and functionality associated with various technologies for organizing and managing information (e.g., databases, bookmarking engines). | Knowledge
172 | Skill in creating and utilizing mathematical or statistical models. | Skill
186 | Skill in developing data dictionaries. | Skill
187 | Skill in developing data models. | Skill
224A | Skill in the use of design modeling (e.g., unified modeling language). | Skill
238A | Skill in writing code in a currently supported programming language (e.g., Java, C++). | Skill
342 | Knowledge of Unix command line (e.g., mkdir, mv, ls, passwd, grep). | Knowledge
400 | Analyze and define data requirements and specifications. | Task
401 | Analyze and plan for anticipated changes in data capacity requirements. | Task
520B | Develop and implement data mining and data warehousing programs. | Task
529 | Develop data standards, policies, and procedures. | Task
702 | Manage the compilation, cataloging, caching, distribution, and retrieval of data. | Task
796 | Provide a managed flow of relevant information (via web-based portals or other means) based on mission requirements. | Task
815 | Provide recommendations on new database technologies and architectures. | Task
904 | Knowledge of interpreted and compiled computer languages. | Knowledge
905 | Knowledge of secure coding techniques. | Knowledge
910 | Knowledge of database theory. | Knowledge
1088 | Skill in using binary analysis tools (e.g., Hexedit, command code xxd, hexdump). | Skill
1091 | Skill in one-way hash functions (e.g., Secure Hash Algorithm [SHA], Message Digest Algorithm [MD5]). | Skill
1115 | Skill in reading Hexadecimal data. | Skill
1116 | Skill in identifying common encoding techniques (e.g., Exclusive Disjunction [XOR], American Standard Code for Information Interchange [ASCII], Unicode, Base64, Uuencode, Uniform Resource Locator [URL] encode). | Skill
1124 | Knowledge of advanced data remediation security features in databases. | Knowledge
1128 | Knowledge of Java-based database access application programming interface (API) (e.g., Java Database Connectivity [JDBC]). | Knowledge
3722 | Skill in data mining techniques (e.g., searching file systems) and analysis. | Skill
5030 | Analyze data sources to provide actionable recommendations. | Task
5080 | Assess the validity of source data and subsequent findings. | Task
5100 | Collect metrics and trending data. | Task
5120 | Conduct hypothesis testing using statistical processes. | Task
5140 | Confer with systems analysts, engineers, programmers, and others to design applications. | Task
5220 | Develop and facilitate data-gathering methods. | Task
5270 | Develop strategic insights from large data sets. | Task
5430 | Present technical information to technical and non-technical audiences. | Task
5440 | Present data in creative formats. | Task
5550 | Program custom algorithms. | Task
5570 | Provide actionable recommendations to critical stakeholders based on data analysis and findings. | Task
5640 | Utilize technical documentation or resources to implement a new mathematical, data science, or computer science method. | Task
6050 | Ability to build complex data structures and high-level programming languages. | Ability
6120 | Ability to dissect a problem and examine the interrelationships between data that may appear unrelated. | Ability
6130 | Ability to identify basic common coding flaws at a high level. | Ability
6180 | Ability to use data visualization tools (e.g., Flare, HighCharts, AmCharts, D3.js, Processing, Google Visualization API, Tableau, Raphael.js). | Ability
6190 | Effectively allocate storage capacity in the design of data management systems. | Task
6200 | Knowledge of applications that can log errors, exceptions, and application faults and logging. | Knowledge
6300 | Knowledge of how to utilize Hadoop, Java, Python, SQL, Hive, and PIG to explore data. | Knowledge
6311 | Knowledge of machine learning theory and principles. | Knowledge
6470 | Read, interpret, write, modify, and execute simple scripts (e.g., PERL, VBS) on Windows and UNIX systems (e.g., those that perform tasks such as: parsing large data files, automating manual tasks, and fetching/processing remote data). | Task
6490 | Skill in assessing the predictive power and subsequent generalizability of a model. | Skill
6520 | Skill in data pre-processing (e.g., imputation, dimensionality reduction, normalization, transformation, extraction, filtering, smoothing). | Skill
6570 | Skill in identifying hidden patterns or relationships. | Skill
6610 | Skill in performing format conversions to create a standard representation of the data. | Skill
6620 | Skill in performing sensitivity analysis. | Skill
6650 | Skill in developing machine understandable semantic ontologies. | Skill
6651 | Skill in Regression Analysis (e.g., Hierarchical Stepwise, Generalized Linear Model, Ordinary Least Squares, Tree-Based Methods, Logistic). | Skill
6690 | Skill in transformation analytics (e.g., aggregation, enrichment, processing). | Skill
6710 | Skill in using basic descriptive statistics and techniques (e.g., normality, model distribution, scatter plots). | Skill
6720 | Skill in using data analysis tools (e.g., Excel, STATA, SAS, SPSS). | Skill
6730 | Skill in using data mapping tools. | Skill
6750 | Skill in using outlier identification and removal techniques. | Skill
6760 | Skill in writing scripts using R, Python, PIG, HIVE, SQL, etc. | Skill
6780 | Utilize different programming languages to write code, open files, read files, and write output to different files. | Task
6790 | Utilize open source language such as R and apply quantitative techniques (e.g., descriptive and inferential statistics, sampling, experimental design, parametric and non-parametric tests of difference, ordinary least squares regression, general line). | Task
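
As an illustration of KSATs 6710 and 6750 (descriptive statistics, and outlier identification and removal), the following is a minimal sketch using only Python's standard library. It assumes Tukey's 1.5 x IQR fence, which is one common convention and is not prescribed by the framework; the `split_outliers` helper and the sample readings are hypothetical.

```python
import statistics


def split_outliers(values, k=1.5):
    """Split values into (kept, outliers) using IQR fences.

    A value is flagged as an outlier when it lies more than
    k * IQR below the first quartile or above the third quartile.
    """
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return kept, outliers


# Made-up sample data with one obviously extreme reading.
readings = [10, 12, 11, 13, 12, 11, 95]
kept, outliers = split_outliers(readings)

print("mean before:", statistics.mean(readings))
print("mean after: ", statistics.mean(kept))
print("flagged:    ", outliers)
```

Whether a flagged value should actually be removed is an analytic judgment (see KSAT 5080 on assessing the validity of source data); the sketch only separates candidates so that both sets remain available for review.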