EPI-USE MENDIX | Articles

RAD AI – Ignite | Episode 2: Identify and Address Anomalies

Written by Matthew Daniels | May 28, 2024 2:00:00 PM

This is the second installment of RAD AI – Ignite, a three-part series on use cases to begin realizing the value of AI/ML technologies in your organization. These use cases illustrate the speed and power of combining Amazon Web Services (AWS) managed AI/ML services and rapid application development with Mendix.

 



As organizations grow and expand offerings to their customer bases, there is a natural need to identify and address anomalies. A primary concern here, and what we’ll focus on in this article, is fraud. Two common points at which fraud occurs are new account creation (e.g. “junk” or unauthorized accounts) and payment processing. These two types of fraud cost organizations billions of dollars annually and introduce security risks to your IT ecosystem. As a result, fraud prevention has become, or will become, a priority, even for smaller organizations.

Existing fraud management software can be costly, time-consuming, and inflexible. As the fraud landscape develops, these solutions are challenging to adapt to emerging threats. This is exacerbated by the fact that the build, train and deploy lifecycle can take months in some cases. Even with updated models, there is limited ability to design new workflows and processes to support the investigation into novel fraud, even though the workflows needed to address these cases naturally change over time as well.

In this article, we’ll look at the Amazon Fraud Detector managed service integrated into a Mendix-built application. This combination gives us flexibility on both the model side and the workflow side to address fraud proactively. We can rapidly iterate on the fraud models as well as the supporting workflows without extensive investment in machine learning and application development teams.

AWS Service: Amazon Fraud Detector

First, let’s briefly review the Amazon Fraud Detector (“AFD”) service. This is a fully managed service that leverages machine learning to identify anomalies in data. The service is tailored to several fraud use cases, including the false account registration and transaction fraud that we will highlight in this demo. At its core, the AFD service creates predictive models that identify anomalies based on the data you provide. This ensures that the models are customized to your organization and the scenarios you would like to manage.

AFD is trained on the customer’s own data and comes in two major flavors:

Online Fraud Insights (OFI): Handles independent events well, like guest checkout and registration fraud, where a “profile” for an entity cannot be created.

Transaction Fraud Insights (TFI): Handles events attached to entities well, where a historic profile of activity is created and maintained for an entity.

Since the AFD service is fully managed, models can be stood up in a matter of hours instead of weeks or months. The service also makes many advanced features available, including data validation, down-sampling for low-fraud-rate scenarios (i.e. where the fraud rate is below 5%), feature enrichments (e.g. to extract additional data from IP addresses) and access to Amazon embeddings, which enhance your model with knowledge Amazon has gathered about attempts to defraud its own business. AFD also provides score normalization and calibration as well as model variable importance metrics. All of this is made available through the AFD interface, which simplifies the process of building, training, evaluating, and deploying models without requiring a team of dedicated ML engineers on staff.

The models built with AFD are available via a secure API. This allows you to send data directly to the model and receive a prediction in real time. The model responds with a calibrated score between 0 and 1000 that represents the risk of fraud.
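As a rough sketch of what such a real-time call can look like from Python, the snippet below builds a request for the `get_event_prediction` operation of the boto3 `frauddetector` client and pulls the score out of a response. The detector name, event type, entity type and variable names are hypothetical placeholders rather than values from our demo, and the live call is kept in its own function because it needs boto3 and AWS credentials.

```python
from datetime import datetime, timezone


def build_prediction_request(event_id, variables, when=None):
    """Assemble keyword arguments for frauddetector.get_event_prediction."""
    when = when or datetime.now(timezone.utc)
    return {
        "detectorId": "transaction_detector",  # hypothetical detector name
        "eventId": event_id,
        "eventTypeName": "transaction",  # hypothetical event type
        "eventTimestamp": when.strftime("%Y-%m-%dT%H:%M:%SZ"),
        # Hypothetical entity; OFI-style events have no real profile behind them.
        "entities": [{"entityType": "customer", "entityId": "unknown"}],
        # Event variable values are sent as strings.
        "eventVariables": {name: str(value) for name, value in variables.items()},
    }


def extract_score(response):
    """Return the first model score found in a prediction response."""
    for model in response.get("modelScores", []):
        for score in model.get("scores", {}).values():
            return float(score)
    raise ValueError("response contains no model scores")


def predict(event_id, variables):
    """Send one event to the detector (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers above run without AWS access

    client = boto3.client("frauddetector")
    response = client.get_event_prediction(
        **build_prediction_request(event_id, variables)
    )
    return extract_score(response)
```

In a Mendix application the equivalent call would typically go through the AWS connector rather than Python; the shape of the request and response is the part worth noting here.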

In short, AFD provides many features out-of-the-box that would otherwise be costly and time-consuming for organizations to develop.

Bringing the Model to Life – Rapid Application Development with Mendix

Even with a highly performant fraud model, there are still a couple of glaring holes in the plot. How is data collected and sent to the model? And, on the other side, how do we handle the case and investigation into likely fraud?

These are important decisions that need to be made when building fraud detection into an ecosystem. As novel forms of fraud are identified, we want to build workflows to support the investigation and, ultimately, rectification of these scenarios. Newly emerging fraud vectors are inherently difficult to predict, so flexibility in both the fraud detection model and the management system is important.

An organization could purchase and customize an off-the-shelf solution, but this could become costly and difficult to upgrade over time. It may also limit the ability to build optimal workflows that help keep the organization competitive and responsive to market changes. As the customizations grow, upgrading the underlying software becomes even more difficult.

This is where rapid application development with Mendix shines – Mendix facilitates building and updating applications in a rapid manner with smaller development teams. The iterative nature of rapid application development is ideal for optimizing fraud response workflows. Additionally, Mendix provides several AWS connectors that make it simple to integrate an AFD model into application logic.

Creating our AFD Model

For our demo application, we’ll create two fraud detection services: one for the user registration process and a second for financial transactions. We will train these models on open source datasets, which can be found at the bottom of the article. These datasets are built to leverage the OFI model, that is, a model that treats each event as independent. We’ll quickly run through the process to create new fraud models with AFD:

The first step in creating a new model is selection of a business case. In our demo application, we have two models: the first uses the new account fraud business case and the second uses the online transaction fraud business case. There are additional business cases available, including product review fraud and, importantly, a custom fraud model for more novel use cases. These business cases provide a list of required and recommended variables that help optimize your model with information known about these common types of fraud.

The next step is creating and defining your event types. We’ll consider registration and transaction events for our respective models. We then connect the variables (i.e. columns from the data sets) that are associated with each event. This can be automated by uploading the data set into an S3 bucket and then connecting the model to that location.

Lastly, we create our model by selecting the model type (Online Fraud Insights, for our models), our event types (registration and transaction) and the location of our historical data (our S3 buckets). We determine the fraud / non-fraud label (e.g. “fraud” and “legit”) that categorizes our historical data and click to begin the training of the model.
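The same steps can also be scripted instead of clicked through. The following is a minimal sketch, assuming the boto3 `frauddetector` client and its `create_model` and `create_model_version` operations; the model ID, variable names, bucket path and role ARN are hypothetical, and it assumes the event type, variables and labels have already been defined.

```python
def build_model_version_request(model_id, variables, s3_uri, role_arn):
    """Assemble keyword arguments for frauddetector.create_model_version."""
    return {
        "modelId": model_id,
        "modelType": "ONLINE_FRAUD_INSIGHTS",
        # Train from a file in S3 rather than from events stored in AFD.
        "trainingDataSource": "EXTERNAL_EVENTS",
        "trainingDataSchema": {
            "modelVariables": list(variables),
            # Map the label values in the historical data to fraud / legit.
            "labelSchema": {
                "labelMapper": {"FRAUD": ["fraud"], "LEGIT": ["legit"]}
            },
        },
        "externalEventsDetail": {
            "dataLocation": s3_uri,
            "dataAccessRoleArn": role_arn,
        },
    }


def start_training(request, event_type_name):
    """Create the model and kick off training (requires boto3 and credentials)."""
    import boto3  # imported lazily so the builder above runs without AWS access

    client = boto3.client("frauddetector")
    client.create_model(
        modelId=request["modelId"],
        modelType=request["modelType"],
        eventTypeName=event_type_name,
    )
    return client.create_model_version(**request)


# Hypothetical values for the registration model.
registration_request = build_model_version_request(
    "registration_model",
    ["ip_address", "email_address"],
    "s3://example-bucket/registrations.csv",
    "arn:aws:iam::123456789012:role/ExampleFraudDetectorRole",
)
```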

And with that, our fraud detection model is nearly complete! All that’s left is for us to review its performance and, if satisfied, deploy our model for use.

Analyzing the Model Performance, Identifying Thresholds

Once the training is finished, AFD will output statistics that will help you analyze the performance of your model. We’ll go through a portion of the output from the model we trained on the transactions dataset.

Here we have two visuals that give us high-level insight into our model’s accuracy. This interactive display allows us to view performance metrics for different threshold scores. In the image above, we can view the confusion matrix that results from choosing a threshold score of 500. We can see that we have a 13.6% false positive rate and a 99.8% true positive rate. For every 100,000 transactions, this threshold would result in only 9 false negatives, but we would have nearly 13,000 false positives. Since this would reject many transactions that should have been accepted, we’d likely consider a higher threshold.
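To see how these figures fit together, here is a small back-of-the-envelope check. The share of fraudulent transactions is not stated above; the sketch infers it from the quoted rates and the 9 false negatives, so treat the resulting counts as an illustration rather than output from AFD.

```python
def implied_counts(total, tpr, fpr, false_negatives):
    """Infer fraud, legitimate and false positive counts from the quoted rates."""
    # False negatives are the share of fraud that is missed: FN = (1 - TPR) * fraud.
    fraud = round(false_negatives / (1 - tpr))
    legit = total - fraud
    # False positives are the share of legitimate events that is flagged.
    false_positives = round(fpr * legit)
    return fraud, legit, false_positives


fraud, legit, false_positives = implied_counts(100_000, 0.998, 0.136, 9)
print(fraud, legit, false_positives)  # prints: 4500 95500 12988
```

Under that inference, roughly 4.5% of the transactions are fraudulent, and flagging 13.6% of the remaining 95,500 legitimate ones gives the "nearly 13,000" false positives.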

Luckily, AFD also makes this information available in a table view which helps us identify appropriate threshold scores to use in our resulting workflows.

Based on this table, we can see that if we choose a threshold score of 945 then we would expect to capture nearly 100% of the fraudulent activity while having an approximately 1% false positive rate. Setting a threshold this high, however, can lead to ambiguous cases of fraud going undetected.

This table is helpful when designing workflows in the source application. For example, we may want to automatically reject all transactions with a score greater than 945, create a review process for transactions with a score greater than 855, and flag transactions with a score greater than 770 for automated analysis.
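A tiered rule like this is straightforward to express in code. The sketch below uses the three thresholds from the example; the action names and the default "approve" outcome are assumptions made for illustration.

```python
# (threshold, action) pairs, ordered from the strictest tier down.
TIERS = [
    (945, "reject"),
    (855, "manual_review"),
    (770, "automated_analysis"),
]


def route(score, tiers=TIERS):
    """Return the action for the highest tier whose threshold the score exceeds."""
    for threshold, action in tiers:
        if score > threshold:
            return action
    return "approve"  # assumed default when no tier applies


for score in (980, 900, 800, 300):
    print(score, route(score))
```

Keeping the thresholds in a single table makes them easy to adjust as the model is retrained and its score distribution shifts.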

Viewing and Validating Model Variables

The AFD service also provides insight into the relative importance of each feature in the resulting fraud likeliness score.

Here, we can see that, based on our sample data set, the old balance and new balance for the originating account are the leading factors in determining whether a transaction is fraudulent or not.

This is important for developing a logical understanding of the model and making sure it passes the “sanity check”. If certain features are heavily weighted, but do not make contextual sense, this is a good time to validate that the data was accurately collected. Features that have no predictive power (like “name_orig”, in the example) can also be removed as they add noise in the training process with no tangible value.

Viewing our App and Model in Action

With a suitable fraud detection model, it’s now time to make this service accessible and valuable to the organization. For our example, we’ve built an application using Mendix and AWS connectors from the Mendix app store which allow us to easily tap into our AFD services.

At user registration, we capture the details entered by the user as well as additional information (e.g. IP Address, Browser & Version) and pass this along to our first fraud detection service. As discussed above, this service will respond with a score between 0 and 1000. This allows us to adapt our workflows based on these results. For example, we may want to block all user registrations with a score greater than 950 and provide only limited access, pending administrator review, to users who register with a score greater than 800. This is all easy to accomplish with our custom-built application. We can build, and update, our workflows as needed to fit our business objectives.

Our second implementation covers transactions that occur in the application. When a transaction is initiated, we pass the source account, destination account and transaction details along to our fraud detection service. Here, we’ve implemented a logical rule to block all transactions where the score is over 800 while allowing transactions where the score is lower. We keep track of all these transactions, their scores, and whether each transaction was blocked in an easily searchable administrator table for analysis.
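In Mendix this logic lives in entities and microflows, but the idea can be sketched in a few lines of Python. The record fields and the `search` helper below are hypothetical; only the rule of blocking scores over 800 comes from the description above.

```python
from dataclasses import dataclass

BLOCK_THRESHOLD = 800  # transactions scoring above this are blocked


@dataclass(frozen=True)
class ScoredTransaction:
    """One row of the administrator table (field names are illustrative)."""

    source_account: str
    destination_account: str
    amount: float
    score: float

    @property
    def blocked(self):
        return self.score > BLOCK_THRESHOLD


def search(log, *, blocked=None, min_score=0):
    """Filter the log by blocked status and/or a minimum score."""
    return [
        entry
        for entry in log
        if entry.score >= min_score and (blocked is None or entry.blocked == blocked)
    ]


log = [
    ScoredTransaction("ACC-1", "ACC-2", 100.0, 912.0),
    ScoredTransaction("ACC-1", "ACC-3", 20.0, 140.0),
]
print([entry.destination_account for entry in search(log, blocked=True)])
```

Storing the score alongside the decision, rather than only the decision, is what makes it possible to revisit past transactions if the threshold is later changed.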

This application can be easily extended to support investigation into these transactions to ensure we catch and address fraud scenarios.

Wrapping Up / Final Thoughts

The Amazon Fraud Detector service provides a powerful, out-of-the-box managed service that allows you to quickly stand up highly performant fraud models. All that is required is a clean set of historical data; the rest is handled by the AWS service. This can help organizations create customized fraud models that leverage machine learning while avoiding costly investments in specialist teams. It can also bring machine learning-enabled fraud detection within reach of smaller organizations and business lines that aren’t able to invest deeply in these specialized skills.

The Mendix platform excels at building custom applications rapidly and includes connectors to AWS services that simplify integration with AFD. This allows you to leverage the responses from fraud models easily within your application, making it a powerful addition to the managed fraud models in AWS. As fraud patterns change, both the model and your application can be responsive to new needs.

Further Extensibility

In this example, we leveraged the OFI model which treats transactions independently. The AFD service also provides the TFI model that allows you to define entities and create profiles around them – this allows you to add more complexity to your model as historical actions can be captured and correlated together.

The Amazon Fraud Detector service is a wonderful entry point into catching fraud and anomalies, and this article is just a cursory view into its available features. This is a good place to start and then iterate. As more complex and involved use cases arise, AWS provides services like SageMaker that allow you to manage the entire lifecycle of your custom ML model. These models are also easy to integrate into Mendix applications due to pre-existing Amazon connector modules.

Mendix’s core strength is the flexibility to rapidly iterate on our solution with the integrated full-stack IDE. This empowers business-oriented developers and encourages cross-functional development teams that incorporate business and technical personas. This vastly improves project success when dealing with emerging technologies and novel use cases, which we are highlighting in our RAD AI Ignite series.


Be sure to check out additional installments of our RAD AI – Ignite series here:

RAD AI – Ignite | Episode 1: Unlock Your Organization’s Knowledge

Your organization sits on a wealth of knowledge and experience. This information is often locked away inside disparate software and file systems which can take significant effort to parse through. How do you unlock this knowledge and provide it to employees at critical points in their workflow?

RAD AI – Ignite | Episode 3: Extract and Organize Data

Forms, handwritten notes, and documents are important components in many of today's processes. Digitizing this information is oftentimes inefficient and costly. This leads to informational delays and reduced funding available for investment. How can we leverage AI/ML services to improve form-based processes and the timeliness of our data?
