Data Science Algorithms in Insurance

We break down how we think about building and evaluating algorithms, and highlight the ways our models depart from industry practice to insure people more fairly.

By: Carey Anne Nadeau, Co-CEO & Xunge Jiang, Head of Data Science

At Loop we develop predictive models that anticipate the probability of a driver being involved in a car crash, and use these insights to provide fairly priced insurance that doesn’t rely on demographics or credit. This article breaks down how we think about building and evaluating algorithms, and points out ways that our models depart from industry practice to insure people more fairly.

When evaluating the risk of insuring a new driver, insurers estimate the probability that the driver will file a claim. The predicted outcome is binary: you either file a claim or you don’t. Claims from car crashes, especially severe and fatal crashes, are the worst outcome and the most valuable to predict. Over time, insurers gather enough data to train models that predict this outcome for all drivers.
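The binary target such a model learns from can be encoded directly. A minimal sketch, using hypothetical drivers and labels (1 = filed a claim, 0 = did not), not Loop's actual data:

```python
# Illustrative claims history: 1 = filed a claim, 0 = did not.
drivers = {
    "driver_a": 0,
    "driver_b": 1,
    "driver_c": 0,
    "driver_d": 0,
    "driver_e": 1,
}

# The target a claims model predicts is this binary label; the base
# rate is the fraction of drivers who filed a claim, which a trained
# model's predicted probabilities refine per driver.
base_rate = sum(drivers.values()) / len(drivers)
print(base_rate)  # 0.4
```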

When evaluating a classification model, a confusion matrix quantifies how the model performs. It counts false negatives (drivers who filed a claim that we didn’t expect) and false positives (drivers we expected to file a claim, but who didn’t), alongside the correct predictions. Insurers optimize to minimize false negatives, since a missed claim is the costliest kind of error.
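The four cells of a confusion matrix can be tallied in a few lines. The outcomes and predictions below are illustrative, not real claims history:

```python
# Toy data: 1 = filed a claim, 0 = did not.
actual    = [1, 0, 1, 0, 0, 1, 0, 0]
predicted = [1, 0, 0, 1, 0, 1, 0, 0]

pairs = list(zip(actual, predicted))
tp = sum(a == 1 and p == 1 for a, p in pairs)  # correctly predicted claims
tn = sum(a == 0 and p == 0 for a, p in pairs)  # correctly predicted no-claims
fp = sum(a == 0 and p == 1 for a, p in pairs)  # predicted a claim; none filed
fn = sum(a == 1 and p == 0 for a, p in pairs)  # missed claim: costliest error

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")  # TP=2 TN=4 FP=1 FN=1
```

An insurer tuning a model for this business would accept a few extra false positives (lost customers) to drive the false-negative count down (unexpected claim costs).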

Most insurers are risk averse: when selecting risk, they prefer drivers for whom the predicted probability of a crash is closest to zero. They may forgo drivers who score in the middle of the distribution, where the predicted probability of a claim versus no claim is closer to a coin flip. On the other hand, insurers like Root Insurance that are willing to insure drivers who are likely to file claims select risk with a predicted probability close to one, hoping they can decipher the false positives: drivers that other carriers think are very likely to crash, but who might not.
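These selection strategies amount to threshold rules over a model's predicted probabilities. A sketch with hypothetical applicants, scores, and cutoffs (the 0.2 / 0.4–0.6 / 0.8 thresholds are illustrative, not any carrier's actual policy):

```python
# Hypothetical predicted claim probabilities for five applicants.
scores = {"a": 0.05, "b": 0.12, "c": 0.48, "d": 0.55, "e": 0.91}

# A risk-averse carrier keeps drivers whose probability is near zero...
low_risk = {d for d, p in scores.items() if p < 0.2}

# ...the uncertain middle band, near a coin flip, tends to be
# declined or over-priced...
coin_flip = {d for d, p in scores.items() if 0.4 <= p <= 0.6}

# ...while a carrier hunting for mispriced false positives looks
# near one, betting some of these drivers won't actually crash.
high_score = {d for d, p in scores.items() if p > 0.8}

print(low_risk, coin_flip, high_score)
```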

From this, it’s reasonable to deduce that there is a portion of the population that carriers over-price because of their uncertainty in the predicted probability of the outcome (claim or no claim). Insurers who can spot the drivers least likely to be involved in a crash, and sell to them, have a profitable advantage.

Loop is using data to measure risk more accurately and monitor it more regularly, rather than relying on historical proxy criteria, like a driver’s credit score. With these data, Loop will become increasingly certain, faster, about who is likely to file a claim and who is not.

People for whom the existing insurance model doesn’t work well today stand to benefit the most: for example, people whose credit has gotten worse during the pandemic, who rent their homes, live in historically red-lined neighborhoods, or work from home.

At Loop, we’ll make insurance fairer for more people, and we believe that begins with our data science.