
How we used RetinaNet for dense shape detection in live imagery

Convolutional Neural Networks (CNNs) have come a long way in identifying objects in images and videos. Networks like VGG19, ResNet, YOLO, SSD, R-CNN, DensepathNet, DualNet, Xception, Inception, PolyNet, MobileNet, and many more have evolved over time. Their applications include detecting space availability in a parking lot, satellite image analysis to track ships and agricultural output, radiology, people counting, detecting words on vehicle license plates and storefronts, circuit/machinery fault analysis, medical diagnosis, etc.

Facebook AI Research (FAIR) has recently published the RetinaNet architecture, which is built on top of a Feature Pyramid Network (FPN) using ResNet. This architecture demonstrates higher accuracy in situations where speed is not really important.

Comparing tradeoff between speed and accuracy of different CNNs

Google offers a benchmark comparison of the tradeoff between speed and accuracy of various networks, with the models trained in TensorFlow on the MS COCO dataset. It helps identify the model that best balances speed and accuracy. According to the researchers, Faster R-CNN is more accurate, whereas R-FCN and SSD show better inference time (i.e. their speed is higher). In this comparison, Inception and ResNet are feature extractors used with Faster R-CNN, and MobileNet is a feature extractor used with SSD.

Faster R-CNN implementations show an overall mAP (mean average precision) of around 30, the highest in the comparison, and their feature extractor accuracy is also the highest at around 80.5%. The MobileNet R-FCN implementation has a lower mAP of around 15, and its feature extractor accuracy drops to about 71.5%.

Thus, we can say that SSD implementations work best for detecting larger objects, whereas Faster R-CNN and R-FCN are better at detecting small objects.

speed and accuracy of various CNNs

On the COCO dataset, Faster R-CNN has an mAP averaged over IoU (intersection-over-union) thresholds from 0.5 to 0.95 (mAP@[0.5, 0.95]) of 21.9%. R-FCN has an mAP of 31.5%. SSD300 and SSD512 have mAPs of 23.2% and 26.8% respectively. YOLO-V2 is at 21.6%, whereas YOLO-V3 is at 33%. FPN delivers 33.9%. RetinaNet stands highest at 40.8%.
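The IoU thresholds behind mAP@[0.5, 0.95] can be illustrated with a minimal Python sketch. It assumes boxes are given as `(x1, y1, x2, y2)` corner coordinates; the helper names are hypothetical and this is not the official COCO evaluation code, which additionally computes average precision per class at each threshold before averaging.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) form."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# The ten thresholds 0.50, 0.55, ..., 0.95 over which the metric is averaged.
COCO_IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def matched_thresholds(pred_box, gt_box):
    """Thresholds at which a prediction would count as a match for gt_box."""
    score = iou(pred_box, gt_box)
    return [t for t in COCO_IOU_THRESHOLDS if score >= t]
```

A loosely fitting box passes only the lower thresholds, so averaging over all ten rewards detectors that localize objects tightly.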

Figure: RetinaNet AP vs speed comparison. The two variations of RetinaNet are compared for AP vs inference speed (ms).

One-stage detector vs two-stage detectors for shape detection

A one-stage detector scans candidate objects sampled at around 100,000 locations that densely cover the spatial extent of the image. Because most of these locations are background, the detector cannot maintain a balance between the background and foreground classes.

A two-stage detector first narrows down the number of candidate objects to at most about 2,000 locations and separates them from the background in the first stage. It then classifies each candidate object in the second stage, thus managing the class balance. But, because of the smaller number of locations in the sample, many objects might escape detection.

Faster R-CNN is an implementation of the two-stage detector. RetinaNet, an implementation of the one-stage detector, addresses this class imbalance and efficiently detects all objects.
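To see where a figure of around 100,000 candidate locations comes from, the anchor count of a one-stage detector can be estimated with a short Python sketch. The settings here are assumptions, not taken from the experiment in this article: five pyramid levels with strides 8 to 128 and 9 anchors per location (the configuration described in the RetinaNet paper), a hypothetical 800x800 input, and feature map sizes rounded up.

```python
import math


def count_anchors(height, width, strides=(8, 16, 32, 64, 128),
                  anchors_per_location=9):
    """Estimate the number of anchor boxes a one-stage detector evaluates.

    Assumes each pyramid level has a feature map of size
    ceil(height / stride) x ceil(width / stride); real implementations
    may round differently.
    """
    total = 0
    for stride in strides:
        locations = math.ceil(height / stride) * math.ceil(width / stride)
        total += locations * anchors_per_location
    return total


# Hypothetical 800x800 input image.
print(count_anchors(800, 800))  # 120087
```

Only a handful of these anchors overlap real objects in a typical image, which is the foreground-background imbalance described above.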

Focal Loss: a new loss function

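This function focuses training on hard, misclassified examples. It is defined as:

$$FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$$

where

$$p_t = \begin{cases} p & \text{if } y = 1 \\ 1 - p & \text{otherwise} \end{cases}$$

and p = sigmoid output score, with y = 1 marking the ground-truth class.

The Greek letters (α and γ) are hyperparameters.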

When a sample is misclassified and pₜ is small, the modulating factor (1 − pₜ)^γ is close to 1 and the loss is unaffected. Samples get down-weighted when they are well classified and pₜ is close to 1, because the modulating factor then approaches 0. Gamma is a focusing parameter and adjusts the rate at which the easy samples are down-weighted. When gamma is 0, the focal loss is equivalent to the (α-weighted) cross-entropy loss. Upon increasing gamma, the effect of the modulating factor also increases.
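The behaviour of the focal loss, FL(pₜ) = −αₜ (1 − pₜ)^γ log(pₜ), can be checked numerically with a minimal Python sketch for a single binary prediction. The defaults α = 0.25 and γ = 2 are the values reported as working best in the RetinaNet paper and are assumptions here, not settings confirmed for the experiment in this article; the function names are hypothetical.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal loss for one binary prediction.

    p: sigmoid output score in (0, 1).
    y: 1 for the foreground class, 0 for background.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(p, y):
    """Plain binary cross-entropy for one prediction, for comparison."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


# An easy sample (p_t = 0.95) keeps only a tiny fraction of its
# cross-entropy loss, while a hard sample (p_t = 0.1) keeps far more.
easy_ratio = focal_loss(0.95, 1) / cross_entropy(0.95, 1)
hard_ratio = focal_loss(0.10, 1) / cross_entropy(0.10, 1)
print(easy_ratio, hard_ratio)
```

With γ = 0 and α = 1 the function reduces to the plain cross-entropy, which matches the description of gamma above.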

RetinaNet Backbone

The new loss function, called focal loss, increases the accuracy significantly. Essentially, RetinaNet is a one-stage detector built on a Feature Pyramid Network, with focal loss replacing the cross-entropy loss.

Hard negative mining in a single shot detector and Faster R-CNN addresses the class imbalance by downsampling the dominant samples. In contrast, RetinaNet addresses it by changing the weights in the loss function. The following diagram explains the architecture.

RetinaNet architecture

Here, deep feature extraction uses ResNet. Using FPN on top of ResNet further helps in constructing a multi-scale feature pyramid from a single-resolution image. FPN is fast to compute and works efficiently across multiple scales.


We used ResNet50-FPN pre-trained on MS COCO to identify humans in photos. Only detections with a confidence score above 0.5 are kept. The following images show the result with markings and confidence values.

Dense shape detection
Human shape detection

We further tried to detect other objects like chairs.

RetinaNet object detection

Conclusion: It’s great to know that a model trained on the COCO dataset can detect objects in previously unseen scenes. The object detection in the scenes took 5-7 seconds. So far, we have filtered the results to show only humans or chairs. Without these filters, RetinaNet can detect all the identifiable objects in the scene.
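The score threshold and class filters described above can be sketched in a few lines of Python. The detection format (a dict with `label`, `score`, and `box`) and the sample values are hypothetical, chosen only for illustration; they are not the output of the model used in this article.

```python
def filter_detections(detections, wanted_labels, score_threshold=0.5):
    """Keep detections of the wanted classes scoring above the threshold."""
    wanted = set(wanted_labels)
    return [
        d for d in detections
        if d["label"] in wanted and d["score"] > score_threshold
    ]


# Hypothetical detector output, for illustration only.
sample = [
    {"label": "person", "score": 0.91, "box": (34, 50, 120, 310)},
    {"label": "chair", "score": 0.62, "box": (200, 180, 280, 300)},
    {"label": "person", "score": 0.37, "box": (300, 60, 350, 200)},
    {"label": "dog", "score": 0.88, "box": (400, 220, 520, 330)},
]

people = filter_detections(sample, {"person"})
people_and_chairs = filter_detections(sample, {"person", "chair"})
# Passing every label seen in the output removes the class filter.
everything = filter_detections(sample, {d["label"] for d in sample})
```

Lowering `score_threshold` trades fewer missed objects for more false positives, so the 0.5 cut-off is a starting point rather than a fixed rule.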

Multiple objects detection using RetinaNet

The different objects detected with their score are listed below-


Next, we are interested in working on a model that is good at detecting objects at a greater depth in the image, which the current ResNet50-FPN could not do.

About author: Harsh Vardhan is a Tech Lead in the Development Department of Mantra Labs. He is integral to AI-based development and deployment of projects at Mantra Labs.

General FAQs

What is RetinaNet?

RetinaNet is a type of CNN (Convolutional Neural Network) architecture published by Facebook AI Research, also known as FAIR. It uses the Feature Pyramid Network (FPN) with ResNet. RetinaNet is widely used for detecting objects in live imagery (real-time monitoring systems). This architecture demonstrates a high level of accuracy, but with a little compromise in speed. In the experiment we conducted, it took 5-7 seconds for object detection in live scenes.

What is RetinaNet Model?

The RetinaNet model comprises a backbone network and two task-specific subnetworks. The backbone network is a Feature Pyramid Network (FPN) built on ResNet. It is responsible for computing a convolutional feature map over the input image. The two subnetworks are responsible for classification and box regression, i.e. one subnetwork predicts the probability of an object being present at a particular spatial location and the other outputs the object location relative to the anchor box.

What is Focal Loss?

The focal loss function focuses training on hard, misclassified examples. In other words, it is a loss function for improving Average Precision (AP) in single-stage object detectors. It is defined as $FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$.

What is SSD Network?

Single Shot Detector (SSD) can detect multiple objects in an image in a single shot, hence the name.
The beauty of SSD networks is that they predict the boundaries themselves and have no separate region proposal network. SSD networks can predict the boundary boxes and classes from feature maps in just one pass by using small convolutional filters.

Glossary of Terms related to convolutional neural networks


CNN

Deep learning uses convolutional neural networks (CNNs) for analyzing visual imagery. A CNN consists of an input layer, an output layer, and multiple intermediate layers. In CNN programming, the input is called a tensor, which is usually an image or a video frame. It passes through the convolutional layers, forming an abstract feature map that identifies different shapes.


R-CNN

The process of combining region proposals with a CNN is called R-CNN. Region proposals are the smaller parts of the original image that have a probability of containing the desired shape/object. The R-CNN algorithm creates several region proposals and passes each of them to the CNN for better dense shape detection.


ResNet

Residual Neural Network (ResNet) utilizes skip connections to jump over some layers. Classical CNNs do not perform well when the depth of the network increases beyond a certain threshold. Most ResNet models are implemented with double or triple layer skips with batch normalization in between. ResNet helps in the training of deeper networks.


YOLO

You only look once (YOLO) is a real-time object detection system. It is faster than most other neural networks for detecting shapes and objects. Unlike other systems, it applies a single neural network to the entire image, optimizing the detection performance.


FAIR

Facebook AI Research (FAIR) is Facebook’s research arm for understanding the nature of intelligence and creating intelligent machines. The main research areas at FAIR include Computer Vision, Conversational AI, Integrity, Natural Language Processing, Ranking and Recommendations, System Research, Theory, Speech & Audio, and Human & Machine Intelligence.


FPN

Feature Pyramid Network (FPN) is a feature extractor designed for achieving speed and accuracy in detecting objects or shapes. It generates multiple feature map layers with better quality information for object detection.

COCO Dataset

Common Objects in Context (COCO) is a large-scale dataset for detecting, segmenting, and captioning any object. 


FCN

Fully Convolutional Network (FCN) transforms the height and width of the intermediate layer (feature map) back to the original size so that predictions have a one-to-one correspondence with the input image.


R-FCN

R-FCN stands for region-based fully convolutional network. It is mainly used for object detection. R-FCN comprises region-based feature maps that are independent of region proposals (ROI) and carry computation outside of ROIs. It is much simpler and about 20 times faster than R-CNN.


TensorFlow

TensorFlow is an open-source software library developed by Google Brain for a range of dataflow and differentiable programming applications. It is also useful in neural network programming.

Also read – How are Medical Images shared among Healthcare Enterprises



[Interview] Mr. Alex Jimenez | Digital Customer Experience in Covid-19 Times

7 minutes read

The COVID-19 pandemic has brought upon an unprecedented change in our daily lives and routines. Consumer behavior is changing constantly. Lockdowns and social distancing have led to huge losses for businesses across industries. The world is heading towards an economic slowdown. Under these circumstances, organizations are facing many challenges to keep their businesses going. Insurers too are facing similar issues. Some insurance lines such as motor, travel, home have suffered a business loss due to low demand.

To understand the impact of this crisis, especially in the USA, we interviewed Mr. Alex Jimenez, Strategy Officer at Extractable from California, and learned more about creating better digital customer experiences in these testing times.

Extractable is a strategic consulting, design, and data analytics agency focused on the future of financial services. His other recent experience includes leading technology strategic planning for the office of the CIO, at Zions Bancorporation, and managing Digital Banking and Payments Strategy and Innovation at Rockland Trust. Alex has been named to several industry influencer lists in the areas of FinTech, RegTech, Blockchain, InsurTech, Innovation, and Digital Marketing. He has been featured in the Irish Tech News and the Independent Community Bankers of America’s (ICBA) Independent Banker.

Connect with Mr. Alex Jimenez – LinkedIn

Excerpts from the interview:

The impact of COVID-19 pandemic in the financial services industry

What is the impact of COVID-19 pandemic in the financial services industry, and how is the industry responding to the ongoing crisis in the US?

In the wake of the current crisis, organizations are more focused on keeping operations going, trying to set up workstations for remote working, dealing with customers and working with them over digital platforms. But very few are focusing on the future, which means preparing for the after-effects of this pandemic on the economy.

In-person communication is still an important mode of interaction with customers in the US banking sector. But now the issue is how to provide good services to clients? Some of our customers are going to experience digital models for the first time. 

Organizations that have well-defined Digital Strategies and Customer-First approach will be able to provide good support to their customers. Organizations that are late into this space are more likely to face problems in the future.

[Related: The Impact of Covid-19 on the Global Economy and Insurance]

Changing customer preferences

How can companies reach out to their customers in this New Normal world?

We have already started to move towards a digital-centric world, and this is just going to accelerate. Businesses that earlier ignored their digital capabilities will now build more on them.

The first video call was invented in the 60s and was not so appreciated as everybody thought it was expensive and complicated. Today we have FaceTime, Zoom but adoption has not happened on a larger scale. But this will soon accelerate. Customers will be comfortable dialing into a video chat with their Insurance agent. 

I don’t believe there’ll be a New Normal. For example, in the US after 9/11 people thought that life will never get back to normal but except for rigorous security screening at the airports, there hasn’t been much change in the behavior. 

In Israel, amidst all the constant disturbance, people in Tel Aviv and Jerusalem are living normal lives. There’ll certainly be some specific changes post the pandemic such as more adoption of digital technologies, more focus on customer needs but I believe there won’t be an entirely new world with a drastic change in consumer behavior.  

The need for personalization

What are some Attention hacking lessons for Insurers operating in ‘the New Normal’?

We are moving towards the personalization of products in general. Generally in Life Insurance, we insure people based on their date of birth or medical history. But what if we insure people based on their behavior? If we did that, would people change their more risky behavior to get a better rate? A non-smoker can be given a better rate as opposed to a smoker. If we get down to individuality, saying that this is your individual (your own) rate; it makes a difference. 

There is a lot of data available and AI is needed to mine that data and derive analytics. Just by building a relationship with customers, we are not doing a great job with personalization. It’s important to apply a human touch to the communication, which makes customers feel like you know them and thus retains their attention.

Digital customer experience in Insurance

For the insurance industry, what steps can help in delivering the right digital customer experience in terms of UX and visual design?

A lot of organizations practice Design Thinking but Financial Services don’t. They are of the opinion that they know what is needed as they themselves are customers and they have data from the surveys. But that’s a wrong approach. Design Thinking is about empathy. It is important to get into the shoes of your clients to design better solutions.

To enhance digital customer experience, Insurers need a thorough understanding of users — who are the ultimate clients, their needs, what they expect from this experience, etc. After comprehending how they engage with technology and financial services, start venturing into the solution and test the solutions with actual users.

Innovations in the financial services industry

What technology-based innovations are being explored within the financial services industry? And, do you see AI playing a role in the short term? 

AI has already affected Financial Services in a positive way and will make it better. In insurance, IoT has been very impactful and will continue to be. Some applications have already been applied in reality like sensors in cars to detect speed and ensure that you are under the speed limit. This helps in getting reduced premiums. 

However, some basic processes are still done in the old-school way of shuffling papers. Straight-through processes have not yet happened. Now RPA is being applied to this, but it is more like a band-aid. What is more important is how we can build processes through true automation with AI.

[Related: 5 Insurance Front Office Operations AI Can Improve]

Adoption of AI in Insurance

Speaking about more adoption of technologies, do you think there’ll be more investment in AI now?

Absolutely! We have already seen that investment in technologies like AI, cloud computing, quantum computing has been ramping up. Businesses will invest much more in AI than before. It might be for better decision making, underwriting, understanding the behavior of clients, etc. Also, from a marketing standpoint, financial services have never focused much before but will now invest in AI for this area too.

[Related: How is AI extending customer support during COVID-19 pandemic]

In your recent article in Extractable – “Deploying third-party financial service technology to mitigate crisis” you talk about what tech vendors are doing wrong. Please expand on how to encourage resources to be innovative change agents?

There were two points that I made in the article:

First is about what companies are doing incorrectly when it comes to innovation. Risk management is consulted only after developing the product. The product release is stalled until the legal compliances are adhered to. Instead, companies should involve the risk management at the beginning of the process (while defining the problem and solution). Involving risk management at every step of the innovation process will make it much easier to push out innovation.

The second was about vendor management. Many small vendors such as tech vendors, InsurTechs want to sell solutions to financial service companies but are often surprised by the tedious vendor management process. There’s a lot of documentation. Once the first process of selling is done, vendors should package the documentation in a way that when the next prospect asks for it, the due diligence package is ready to offer. 

Read article – Deploying third-party financial service technology to mitigate crisis 

Wrapping up

Alex shared interesting insights on how Design Thinking and Visual Design can create better digital customer experience. The design vertical at Mantra Labs too believes in the same and has designed UX for various applications for its customers. Here’s an article to understand the role of Customer Experience (CX) and User Experience (UX): Creating Amazing Digital Customer Experiences

[Also read: [Interview] Mr. Andrew Warburton | The New Normal in Insurance]

AI is going to be essential for Insurers to gain that competitive edge in the post-pandemic world. Check out Hitee — an Insurance specific chatbot for driving customer engagement. For your specific requirements, please feel free to write to us at hello@mantralabsglobal.com. 

