Data analytics is changing how organizations approach workforce management. Given the size and diversity of a contractor’s workforce, as well as inherent industry challenges such as recruiting and retention; company perception; employee productivity; fraud and corruption; and harassment, favoritism, and other discriminatory practices, data analytics has become critical to effective risk management.
Using Technology to Access & Analyze Your Data
Managing workforce risks with analytics is becoming increasingly complex in the digital age. To address these risks successfully, contractors must understand not only how to leverage multiple types of data, but also how to apply the available technologies: traditional analytics, machine learning, and textual analytics.
Traditional Analytics
Analytics refers to the methods and procedures used to extract useful information from a data set in order to answer a strategic question. Traditional analytics relies on rule-based methods that combine conditions with simple Boolean logic (the operators AND, OR, and NOT) to detect anomalies. For example: if a subcontractor’s address matches an employee’s address and its wire transfer account matches the employee’s bank account, then an improper relationship likely exists.
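The address-and-account rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical `Party` record with `address` and `bank_account` fields; real vendor and payroll data would normally be cleaned and normalized before any rule is applied.

```python
from dataclasses import dataclass

@dataclass
class Party:
    """Hypothetical record for an employee or a subcontractor."""
    name: str
    address: str
    bank_account: str

def improper_relationship_flag(subcontractor: Party, employee: Party) -> bool:
    """Rule: flag only when the address AND the bank account both match exactly."""
    same_address = subcontractor.address == employee.address
    same_account = subcontractor.bank_account == employee.bank_account
    return same_address and same_account

employee = Party("J. Smith", "12 Oak Lane", "ACCT-001")
vendor_a = Party("Oak Builders LLC", "12 Oak Lane", "ACCT-001")
vendor_b = Party("Oak Builders LLC", "12 Oak Ln.", "ACCT-001")

print(improper_relationship_flag(vendor_a, employee))  # True
print(improper_relationship_flag(vendor_b, employee))  # False: "12 Oak Ln." != "12 Oak Lane"
```

The second check misses the same relationship only because "Ln." and "Lane" differ as strings, which illustrates how rule-based systems depend on exact matches.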
While effective, rule-based systems are inherently subjective, because their rules reflect the analyst’s assumptions; they are also geared toward known questions and limited to a few attributes and to exact matches of criteria.
Machine Learning
Machine learning is a type of artificial intelligence in which a system learns patterns from data rather than following pre-defined rules, supplementing traditional decision-making processes. In supervised learning, one common type, a model such as a decision tree is trained on meta-tagged (labeled) data (e.g., “red flag” or “not”) to learn how red flag transactions differ from other transactions. The model then applies this learned logic to new data. Because it can draw on a complex array of variables rather than just a few, machine learning can deliver greater accuracy in analyzing a business problem.
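To make supervised learning concrete, the following stdlib-only Python sketch learns the simplest possible decision tree, a single split (a "decision stump"), from labeled transactions by minimizing Gini impurity. The features (invoice amount in thousands of dollars, days since the vendor record was created) and all data are hypothetical; a production model would typically be a deeper tree or an ensemble built with a library such as scikit-learn.

```python
from typing import List, Tuple

Row = Tuple[float, ...]

def gini(labels: List[int]) -> float:
    """Gini impurity of 0/1 labels: 0.0 means the group is pure."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)

def majority(labels: List[int]) -> int:
    """Most common label; ties go to 1 (red flag), erring toward review."""
    return int(2 * sum(labels) >= len(labels))

def fit_stump(X: List[Row], y: List[int]) -> Tuple[int, float, int, int]:
    """Learn (feature, threshold, left_label, right_label) minimizing weighted Gini."""
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    return best[1:]

def predict(stump: Tuple[int, float, int, int], row: Row) -> int:
    """Apply the learned split to a new, unlabeled row."""
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label

# Hypothetical training data: (invoice amount in $K, days since vendor was created)
X = [(12.0, 400), (45.0, 900), (8.0, 15), (50.0, 10), (20.0, 700), (30.0, 5)]
y = [0, 0, 1, 1, 0, 1]  # 1 = tagged "red flag", 0 = not

stump = fit_stump(X, y)
print(stump)                      # (1, 207.5, 1, 0): recently created vendors are flagged
print(predict(stump, (25.0, 3)))  # 1
```

No one wrote the rule "flag vendors created in the last 200 days"; the split was chosen because it best separates the labeled examples.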
Another type, unsupervised learning, works without meta-tagged data: it identifies patterns of interest and anomalies using its own criteria, for example by grouping similar records or isolating records that fit no group. This allows users to find patterns in data sets that were not previously identified or codified into rule-based methods. Both supervised and unsupervised systems can be refined over time; as they are retrained on more data, their accuracy typically improves.
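Unsupervised anomaly detection can be illustrated without any labels. The sketch below uses a k-nearest-neighbor distance score, one common unsupervised technique chosen here for brevity (the article does not name a specific method): a record that sits far from all of its neighbors receives a high score. The weekly-hours data are hypothetical, and in practice features would first be scaled to comparable ranges.

```python
import math
from typing import List, Sequence

def knn_outlier_scores(points: List[Sequence[float]], k: int = 2) -> List[float]:
    """Mean distance from each point to its k nearest neighbors (no labels used)."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# Hypothetical per-employee records: (regular hours, overtime hours) in one week
weekly_hours = [(40, 2), (41, 3), (39, 2), (40, 4), (42, 3), (70, 30)]
scores = knn_outlier_scores(weekly_hours)
most_unusual = max(range(len(scores)), key=scores.__getitem__)
print(most_unusual, weekly_hours[most_unusual])  # 5 (70, 30)
```

Nothing told the code that 70 regular plus 30 overtime hours is suspicious; the record stands out only because it is far from every other record.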
Applications in Structured Data
Machine learning is frequently used to spot red flag patterns in structured data, which is organized in rows and columns (typically in a database) and is critical for managing risk and quantifying exposure. Examples include identifying suspicious change requests, unusual banking transactions, and anomalous credit card activity.
It is also useful in network relationship analysis, which is the exploration of connections between individuals and/or entities. These complex relationship networks can be quantified with an unsupervised learning approach called “clustering,” which allows the user to efficiently identify key relationships, both known and previously unknown.
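The article refers to clustering for relationship networks without naming an algorithm. As a simpler stand-in, explicitly swapped in here, the sketch below groups entities into connected components: any two entities that share an attribute (address, phone number, bank account) are linked, and links chain transitively through a union-find structure. All entity names and attributes are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

def find(parent: Dict[str, str], x: str) -> str:
    """Return the root of x's group, compressing the path as it goes."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def group_linked_entities(records: List[Tuple[str, str]]) -> List[Set[str]]:
    """records: (entity, attribute) pairs. Entities sharing any attribute,
    directly or through a chain of other entities, end up in one group."""
    parent: Dict[str, str] = {}
    first_owner: Dict[str, str] = {}  # attribute -> first entity seen with it
    for entity, attr in records:
        parent.setdefault(entity, entity)
        if attr in first_owner:
            # Union: merge this entity's group into the group already holding attr
            parent[find(parent, entity)] = find(parent, first_owner[attr])
        else:
            first_owner[attr] = entity
    groups: Dict[str, Set[str]] = defaultdict(set)
    for entity in parent:
        groups[find(parent, entity)].add(entity)
    return list(groups.values())

records = [
    ("Employee: J. Smith", "addr:12 Oak Lane"),
    ("Vendor: Oak Builders", "addr:12 Oak Lane"),
    ("Vendor: Oak Builders", "phone:555-0100"),
    ("Vendor: Ridge Supply", "phone:555-0100"),
    ("Vendor: Delta Paving", "bank:ACCT-9"),
    ("Employee: K. Lee", "addr:7 Elm St"),
]
for group in group_linked_entities(records):
    print(sorted(group))
```

J. Smith and Ridge Supply share no attribute directly, yet they land in the same group through Oak Builders, which is the kind of previously unknown connection network analysis aims to surface. Dedicated clustering or community-detection algorithms would additionally split large, loosely connected groups.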
In an industry like construction, relationships are vital to success but also easy to exploit. Further, internal relationships may breed distrust and favoritism in the workforce. Data on these relationships often comes from corporate e-mail, but may also come from text messages, instant messages, and social media.
Machine learning also enhances basic attribute matching. Rather than relying on a complex set of hand-written rules for matching names, addresses, and other identifying attributes, machine learning-based systems learn what a match looks like from examples and apply this logic to the data, which can improve accuracy.
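Learned matching can be sketched by turning each candidate pair of records into similarity features and fitting a model to labeled examples. The version below is a minimal illustration, assuming hypothetical name/address pairs: it scores string similarity with the standard library's `difflib.SequenceMatcher.ratio()` and fits a two-feature logistic regression by plain gradient descent. Logistic regression is a swapped-in choice for brevity, not a method the article specifies.

```python
import math
from difflib import SequenceMatcher
from typing import List, Tuple

# Hypothetical record-pair layout: (name_a, address_a, name_b, address_b)
Pair = Tuple[str, str, str, str]

def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def features(pair: Pair) -> List[float]:
    name_a, addr_a, name_b, addr_b = pair
    return [similarity(name_a, name_b), similarity(addr_a, addr_b)]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train(pairs: List[Pair], labels: List[int], lr: float = 1.0, epochs: int = 5000):
    """Fit logistic-regression weights with batch gradient descent on log loss."""
    X = [features(p) for p in pairs]
    w, b, n = [0.0, 0.0], 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for x, y in zip(X, labels):
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y  # gradient of log loss
            gw = [gw[0] + err * x[0], gw[1] + err * x[1]]
            gb += err
        w = [w[0] - lr * gw[0] / n, w[1] - lr * gw[1] / n]
        b -= lr * gb / n
    return w, b

def match_probability(model, pair: Pair) -> float:
    w, b = model
    x = features(pair)
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)

# Hypothetical labeled pairs: 1 = same real-world party, 0 = different
pairs = [
    ("Jon Smith", "12 Oak Lane", "John Smith", "12 Oak Ln"),
    ("Maria Garcia", "400 Ridge Rd", "Maria Garcia", "400 Ridge Road"),
    ("Ann Lee", "7 Elm St", "Anne Lee", "7 Elm Street"),
    ("Jon Smith", "12 Oak Lane", "Paul Brown", "88 Pine Ave"),
    ("Maria Garcia", "400 Ridge Rd", "Tom Hughes", "19 Lake Dr"),
    ("Ann Lee", "7 Elm St", "Ann Lee", "PO Box 90"),
]
labels = [1, 1, 1, 0, 0, 0]
model = train(pairs, labels)

p_match = match_probability(model, ("Maria Garcia", "400 Ridge Rd", "Maria Garcia", "400 Ridge Rd."))
p_non = match_probability(model, ("Ann Lee", "7 Elm St", "Ann Lee", "91 Harbor Blvd"))
print(round(p_match, 3), round(p_non, 3))
```

Because one training pair with identical names but different addresses is labeled a non-match, the model is pushed to weight address similarity instead of relying on a hand-written rule; with this toy data the first probability is above 0.5 and the second below it.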
Applications in Unstructured Data
Machine learning also applies to unstructured data, a relatively untapped resource in most organizations. Unstructured data, such as e-mail, social media, instant messages, geolocation information, data generated by the Internet of Things, and other nontraditional data sets, is not in tabular format (i.e., not housed in a database).