In this Quick Read, originally published in Insurance CIO Outlook, Tom Fletcher, PartnerRe’s Global Head of Data Science Consulting, discusses the need for valid, relevant, consistent and fair data – the key to successful model evaluation.
Many years ago, my boss said, referring to evaluating a vendor’s model: “It has to be the right fit for our company; but remember, although they may build the model differently than you would, that doesn’t make it wrong.”
When it came to evaluating predictive models, he believed in striving for balance between protecting the company and keeping an open mind to the potentially innovative ways in which these models could benefit the business.
As the traditional data sources used in life insurance underwriting give way to additional sources now being leveraged to accelerate the underwriting process, my ex-boss’s advice seems more relevant – and maybe also more challenging – than ever. When evaluating models, the current challenge for insurers lies in cutting through all the noise to find what really works for their business. In other words, is the data provided by the model valid, relevant, consistent and fair?
The overriding priority is validity. Generally, a model is designed for a specific purpose, and there should be solid empirical evidence that the model fulfills that purpose and nothing else. This also entails evaluating the compatibility of the model with the other elements of the process and with the data and/or models already in use. The ideal model will offer incremental validity above what is already there. In fact, a model’s usefulness in terms of added value matters more than its empirical strength. Strong models may go unused because they offer no specific benefit over what is already in place; conversely, moderately strong models may be implemented because of the utility they bring.
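The notion of incremental validity can be sketched in a few lines of code: does a candidate data source lift discriminatory power (measured here by AUC) above what an existing score already delivers? The data below is entirely synthetic and hypothetical, a minimal illustration rather than an actual underwriting model:

```python
import random

random.seed(0)

def auc(scores, labels):
    """Probability that a random positive case outranks a random negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical book of business: an existing underwriting score plus a
# candidate new data source, both related to the observed outcome.
n = 2000
existing = [random.gauss(0, 1) for _ in range(n)]
candidate = [random.gauss(0, 1) for _ in range(n)]
outcome = [1 if e + 0.6 * c + random.gauss(0, 1) > 0 else 0
           for e, c in zip(existing, candidate)]

baseline_auc = auc(existing, outcome)
combined_auc = auc([e + 0.6 * c for e, c in zip(existing, candidate)], outcome)
print(f"existing score alone: AUC = {baseline_auc:.3f}")
print(f"with candidate data:  AUC = {combined_auc:.3f}")
```

The lift from the baseline to the combined AUC is the incremental validity; if it is negligible, even an empirically strong new source may not be worth implementing.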
Valid models come from relevant and consistent data. Imagine tracking some phenomenon using varying units or conventions (e.g., imperial vs. metric). The same information could mean different things from one day to the next. That is why it is crucial to know the lineage and reliability of the data when evaluating whether some new source (i.e., data, model or tool) adds value.
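The imperial-vs-metric pitfall can be made concrete with a short sketch; the feed, field layout and conversion table here are hypothetical:

```python
# Hypothetical feed where weight arrives in mixed units; normalizing
# before use keeps the values consistent and comparable over time.
def to_kg(value, unit):
    conversions = {"kg": 1.0, "lb": 0.453592}
    if unit not in conversions:
        raise ValueError(f"unknown unit: {unit!r}")  # fail loudly, never guess
    return value * conversions[unit]

records = [(180, "lb"), (82, "kg"), (95, "kg")]
weights_kg = [round(to_kg(v, u), 1) for v, u in records]
print(weights_kg)  # the lb record is converted so all values share one scale
```

Rejecting unknown units rather than silently passing them through is the point: a value whose meaning can change from record to record should never reach the model.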
The more you know about contextual factors such as poverty, family history, access to healthcare and so forth, the better. These background criteria lead to certain lifestyle characteristics associated with specific behaviors (e.g., exercise, eating habits) that in turn impact the body (e.g., BMI, cholesterol and blood sugar levels). It might be a long chain of events, but it’s imperative to work very carefully through the logic to show relevance. It’s easy to claim that a correlation is valid, but careful consideration needs to be given to whether that correlation is driven by other factors, as the use of the data may have to be defended to a regulator.
Consistent, relevant and valid data must also be fair: it’s key to ascertain the extent to which the model may introduce unfair discrimination. While historically, insurance companies have not collected protected class information, this is an emerging regulatory requirement.
When evaluating the possibility that a model could cause discrimination, insurers need to ask some critical questions.
Models that inadvertently introduce unfair discrimination into the underwriting process or that are perceived to be unfair can open a “Pandora’s box” of legal and regulatory issues.
When applied in the right way, predictive modeling can be invaluable in establishing actuarially sound principles and accelerating the underwriting process, while simultaneously adding to the volume of information going into the risk evaluation.
The key to success lies in making sure the data can lead to reliable conclusions that demonstrably add value to current business processes.
Tom Fletcher, SVP, Global Head of Data Science Consulting, Life & Health