Customer Management System

Complaint Management

If you work on a company’s service desk, you know how hectic it is to identify an issue and route it to the correct team. Often the customer doesn’t even get accurate information about which team is working on it. In this article we will discuss a model for a bank: the customer registers a complaint about the issue they are facing, and the model predicts which department should handle it.


The project has four parts:

  1. Exploring the data with EDA to find the important statistics.
  2. Building an ML model that predicts, from the complaint, the department that should handle it.
  3. Building a web application where users register complaints.
  4. Setting up a database in which all the data is stored.

Importing the necessary libraries:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

import warnings

Importing the dataset:

df1 = pd.read_csv('complaints.csv')

The dataset I have used contains 13 columns:

['Date received', 'Product', 'Sub-product', 'Issue', 'Sub-issue',
'Consumer complaint narrative', 'Company public response',
'Submitted via', 'Date sent to company', 'Company response to consumer',
'Timely response?', 'Consumer disputed?', 'Complaint ID']

We will check the missing values:

Date received                        0
Product                              0
Sub-product                     235160
Issue                                0
Sub-issue                       477597
Consumer complaint narrative    704013
Company public response         646002
Submitted via                        0
Date sent to company                 0
Company response to consumer         0
Timely response?                     0
Consumer disputed?              135408
Complaint ID                         0

So we can see which columns contain null values.
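Counts like the ones above are what `isnull().sum()` reports for a DataFrame. Here is a minimal sketch on a made-up three-row frame; the toy values are invented for illustration and are not taken from complaints.csv:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for df1; the values are invented for illustration.
toy = pd.DataFrame({
    "Product": ["Mortgage", "Debt collection", "Credit card"],
    "Sub-product": ["FHA mortgage", np.nan, np.nan],
    "Consumer disputed?": ["No", np.nan, "Yes"],
})

# Number of missing values in each column.
missing_counts = toy.isnull().sum()

# The same information as a percentage of all rows.
missing_share = (missing_counts / len(toy) * 100).round(1)

print(missing_counts)
print(missing_share)
```

Looking at the percentage as well as the raw count helps when deciding whether a column is too sparse to be useful.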

We have also renamed two columns, Timely response? and Consumer disputed?, to drop the question marks:

df2 = df1.rename(columns={'Timely response?': 'Timely response', 'Consumer disputed?': 'Consumer disputed'})

After exploring the dataset, let’s move on to EDA and gather some insights from our data.

The insights gathered are:

a. Most favoured way to file a complaint is through ‘Web’ portal.
b. ‘Email’ is the least preferred way to register a complaint.

The company gave a timely response to 97.28% of users.
The company was unable to provide a timely response to 2.72% of users.
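Figures like these usually come from `value_counts()`. The sketch below shows the idea on a made-up five-row frame, so its numbers will not match the ones reported above:

```python
import pandas as pd

# Toy stand-in for df2; the rows are invented for illustration.
toy = pd.DataFrame({
    "Submitted via": ["Web", "Web", "Phone", "Web", "Email"],
    "Timely response": ["Yes", "Yes", "Yes", "No", "Yes"],
})

# Complaints per submission channel, most common first.
channel_counts = toy["Submitted via"].value_counts()

# Share of timely and non-timely responses, as percentages.
timely_pct = toy["Timely response"].value_counts(normalize=True) * 100

print(channel_counts)
print(timely_pct.round(2))
```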

We have also converted the date column to datetime format and separated the year and month into two different columns.
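A minimal sketch of that conversion follows. The `MM/DD/YYYY` format string and the new column names `year` and `month` are assumptions made for this example, so adjust them to match the actual file:

```python
import pandas as pd

# Toy dates; the MM/DD/YYYY layout is an assumption about the source file.
toy = pd.DataFrame({"Date received": ["03/12/2014", "10/01/2016", "07/29/2013"]})

# Parse the strings into real datetimes.
toy["Date received"] = pd.to_datetime(toy["Date received"], format="%m/%d/%Y")

# Pull the year and month out into their own columns (names are illustrative).
toy["year"] = toy["Date received"].dt.year
toy["month"] = toy["Date received"].dt.month

print(toy)
```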

Complaints registered per year:

2017 207778
2016 191505
2015 168520
2014 153053
2013 108218
2012 72373
2011 2536

  1. The analysis shows that complaints increased every year from 2011 to 2017.
  2. In mathematical terms, there is a positive relationship between the year and the number of complaints registered.
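Both observations can be checked directly from the yearly counts listed above. The sketch uses year-over-year differences and a Pearson correlation; choosing Pearson here is my own illustration, not necessarily the measure used in the original analysis:

```python
import numpy as np

# Yearly complaint counts copied from the table above.
years = np.array([2011, 2012, 2013, 2014, 2015, 2016, 2017])
complaints = np.array([2536, 72373, 108218, 153053, 168520, 191505, 207778])

# Year-over-year change: all positive means the count rose every year.
yearly_change = np.diff(complaints)
always_increasing = bool((yearly_change > 0).all())

# Pearson correlation between year and complaint count.
r = np.corrcoef(years, complaints)[0, 1]

print(always_increasing)
print(round(r, 3))
```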

We have merged some products to reduce the number of departments:

# Column names are assumed to have been lowercased earlier (e.g. 'Product' -> 'product').
# .loc avoids chained assignment, which can silently fail to update df2.
df2.loc[df2['product'] == 'Money transfer, virtual currency, or money service', 'product'] = 'Money transfers'
df2.loc[df2['product'] == 'Prepaid card', 'product'] = 'Credit card or prepaid card'
df2.loc[df2['product'] == 'Credit card', 'product'] = 'Credit card or prepaid card'
df2.loc[df2['product'] == 'Virtual currency', 'product'] = 'Other financial service'
df2.loc[df2['product'] == 'Payday loan', 'product'] = 'Payday loan, title loan, or personal loan'
df2.loc[df2['product'] == 'Credit reporting', 'product'] = 'Credit reporting, credit repair services, or other personal consumer reports'
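The same merge can also be written as a single lookup table passed to `Series.replace`. This is an alternative sketch on a toy frame, not the code used in the article:

```python
import pandas as pd

# Old label -> merged department label, copied from the assignments above.
product_map = {
    "Money transfer, virtual currency, or money service": "Money transfers",
    "Prepaid card": "Credit card or prepaid card",
    "Credit card": "Credit card or prepaid card",
    "Virtual currency": "Other financial service",
    "Payday loan": "Payday loan, title loan, or personal loan",
    "Credit reporting": "Credit reporting, credit repair services, or other personal consumer reports",
}

# Toy stand-in for df2; labels not in the map (e.g. "Mortgage") are left alone.
toy = pd.DataFrame({"product": ["Prepaid card", "Mortgage", "Credit card", "Payday loan"]})
toy["product"] = toy["product"].replace(product_map)

print(toy["product"].tolist())
```

Keeping the mapping in one dictionary makes it easier to review which labels end up in which department.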

We also merge the columns related to the complaint into a single column:

# Note: ''.join concatenates the three fields with no separator between them.
df2['updated_complaint_narrative'] = df2[['issue', 'sub-issue', 'consumer complaint narrative']].apply(lambda x: ''.join(x.astype(str)), axis=1)
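One thing to watch when combining text columns: `astype(str)` turns a missing value into the literal string "nan", which then becomes part of the text the model sees. A possible variation, shown on invented rows, fills the gaps with empty strings first and joins the remaining parts with a space:

```python
import numpy as np
import pandas as pd

# Toy rows with gaps, invented for illustration.
toy = pd.DataFrame({
    "issue": ["Cont'd attempts collect debt not owed", "Billing disputes"],
    "sub-issue": ["Debt is not mine", np.nan],
    "consumer complaint narrative": [np.nan, "I was charged twice."],
})

text_cols = ["issue", "sub-issue", "consumer complaint narrative"]

# Replace missing values with "" so they do not turn into the word "nan",
# then join only the non-empty parts with a single space.
toy["updated_complaint_narrative"] = (
    toy[text_cols]
    .fillna("")
    .apply(lambda row: " ".join(part for part in row if part), axis=1)
)

print(toy["updated_complaint_narrative"].tolist())
```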

After that, remove the unwanted columns from the dataset.

To build our model we have used just two columns, because our main aim is to direct the complaint to the correct department.

Columns used: product and updated_complaint_narrative

Importing the required libraries for building the ML model:

from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import confusion_matrix

# Creating the pipeline: word counts -> TF-IDF weighting -> Naive Bayes classifier.
model_pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('nb', MultinomialNB())
])
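To see what the first two stages of this pipeline produce before the classifier is involved, here is a small sketch on three invented sentences. Only the toy sentences are assumptions; the classes are the same ones imported above:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Three invented complaint snippets.
docs = [
    "debt is not mine",
    "card was charged twice",
    "debt collector keeps calling",
]

# Stage 1: token counts per document (sparse matrix, one column per word).
vect = CountVectorizer()
counts = vect.fit_transform(docs)

# Stage 2: reweight the counts so words shared by many documents matter less.
tfidf = TfidfTransformer()
weights = tfidf.fit_transform(counts)

print(sorted(vect.vocabulary_))
print(counts.shape, weights.shape)
```

The classifier at the end of the pipeline is trained on the weighted matrix, so it never sees the raw text.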

# Splitting the data. Note that test_size=0.8 holds out 80% of the rows for testing, so the model trains on the remaining 20%.
X_train, X_test, y_train, y_test = train_test_split(df3['updated_complaint_narrative'], df3['product'], test_size=0.8, random_state=0)
print('The size of Input Training data:-', X_train.shape)
print('The size of Output Training data:-', y_train.shape)
print('The size of Input Testing data:-', X_test.shape)
print('The size of Output Testing data:-', y_test.shape)

The size of Input Training data:- (180796,)
The size of Output Training data:- (180796,)
The size of Input Testing data:- (723187,)
The size of Output Testing data:- (723187,)

# Fitting the pipeline on the training data.
model_pipeline.fit(X_train, y_train)


Pipeline(steps=[('vect', CountVectorizer()), ('tfidf', TfidfTransformer()),
('nb', MultinomialNB())])
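The `confusion_matrix` import is not used elsewhere in this walkthrough, so here is a hedged sketch of how it could fit in. It runs on ten invented complaints rather than the real data, and it uses `test_size=0.2` (a conventional 80/20 split) purely for illustration; `test_size` is always the fraction held out for testing:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Tiny invented corpus: ten complaints, two departments.
texts = [
    "collector calls about debt not mine", "debt collector threatens lawsuit",
    "debt was already paid", "collection agency reports wrong debt",
    "collector will not validate debt",
    "credit card charged twice", "card fee was not disclosed",
    "credit card interest rate raised", "card payment not credited",
    "fraudulent charge on credit card",
]
labels = ["Debt collection"] * 5 + ["Credit card or prepaid card"] * 5

# test_size=0.2 keeps 80% of the rows for training and holds out 20%.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=0, stratify=labels
)

pipe = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("nb", MultinomialNB()),
])
pipe.fit(X_train, y_train)

# Rows are true departments, columns are predicted departments.
class_names = sorted(set(labels))
cm = confusion_matrix(y_test, pipe.predict(X_test), labels=class_names)

print(len(X_train), len(X_test))
print(cm)
```

On the real data, the off-diagonal cells show which departments get confused with each other, which a single accuracy score hides.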


training_score = model_pipeline.score(X_train, y_train)
testing_score = model_pipeline.score(X_test, y_test)
print('The score of the Training data is:-', (training_score)*100)
print('The score of the Testing data is:-', (testing_score)*100)

The score of the Training data is:- 94.0242040753114
The score of the Testing data is:- 93.84073552207106

new_complaint = ["This company refuses to provide me verification and validation of debt per my right under the FDCPA. I do not believe this debt is mine."]
print(model_pipeline.predict(new_complaint))

['Debt collection']

# save the model
import pickle
with open("complaint_nb_model.pkl", 'wb') as f:
    pickle.dump(model_pipeline, f)

Finally the model is saved as a pickle file.
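Loading the model back is the mirror image of saving it. The sketch below trains a tiny stand-in pipeline so that it runs on its own; the file name `toy_model.pkl` and the two training sentences are made up for the example:

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Tiny stand-in model; in the article this would be the fitted model_pipeline.
model = Pipeline([("vect", CountVectorizer()), ("nb", MultinomialNB())])
model.fit(
    ["debt collector keeps calling", "credit card charged twice"],
    ["Debt collection", "Credit card or prepaid card"],
)

# Hypothetical file location in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "toy_model.pkl")

# Save the fitted pipeline to disk.
with open(path, "wb") as f:
    pickle.dump(model, f)

# Load it again, as a web application would do at start-up.
with open(path, "rb") as f:
    restored = pickle.load(f)

sample = ["a debt collector called me"]
print(restored.predict(sample))
```

Only unpickle files you trust, and load the model with the same scikit-learn version that saved it.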

Now we will proceed to build the web application. Here I have used Flask and deployed it on Heroku.
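As a rough idea of how these pieces can fit together, here is a minimal Flask sketch that accepts a complaint, predicts a department, and stores both in a database. It is not the application described in this article: the `/complaints` route, the JSON payload, the SQLite database, the table layout, and the `predict_department` stand-in are all assumptions made for illustration.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical database file; SQLite is an assumption, not the article's choice.
DB_PATH = os.path.join(tempfile.gettempdir(), "complaints_demo.db")


def predict_department(text):
    """Stand-in for the unpickled model's predict(); replace with the real model."""
    return "Debt collection" if "debt" in text.lower() else "Other financial service"


def init_db():
    # closing() closes the connection; the inner "conn" context commits.
    with closing(sqlite3.connect(DB_PATH)) as conn, conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS complaints ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "narrative TEXT NOT NULL, department TEXT NOT NULL)"
        )


@app.route("/complaints", methods=["POST"])
def register_complaint():
    payload = request.get_json(silent=True) or {}
    narrative = str(payload.get("narrative", "")).strip()
    if not narrative:
        return jsonify(error="narrative is required"), 400

    department = predict_department(narrative)

    # Store the complaint together with the predicted department.
    with closing(sqlite3.connect(DB_PATH)) as conn, conn:
        cur = conn.execute(
            "INSERT INTO complaints (narrative, department) VALUES (?, ?)",
            (narrative, department),
        )
        complaint_id = cur.lastrowid

    return jsonify(id=complaint_id, department=department), 201


init_db()
```

With Flask's built-in test client, `app.test_client().post("/complaints", json={"narrative": "This debt is not mine"})` exercises the route without starting a server.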

The main file is . Here the model is loaded, and the web application and the database are connected. To get the complete code and description, please go through

and the app is deployed on:

Author: Pragya Sinha



