Database Optimization and Machine Learning Solutions for a Real Estate Data Aggregation and Analytics Company 

Edvantis and KPC Labs formed an IT partnership in 2010, and have since collaborated on a number of software development and big data analytics projects.

  • Service

    Staff Augmentation 

  • Industry

    Data Analytics

  • Location

    United States

  • 13+ years of partnership
  • 20+ microservices introduced
  • 3-5 seconds to execute a query on a 250+ million record database

Challenge 

KPC Labs aggregates and analyzes data, helping businesses gain valuable insights.

The primary goal of our long-term cooperation with KPC Labs was to address diverse challenges in the real estate sector — from real estate data aggregation and augmentation services to developing a dialer service and incorporating machine learning (ML) solutions. Together, we needed to enhance the workflow for real estate agents and empower brokers to efficiently identify and secure properties with a high likelihood of quick sales.

“The members of their team adhere to a workflow with great discipline. They have been very transparent about what they can deliver and kept their promises. We keep up the successful partnership. Quality, speed, cost, and skills are well-balanced!”

Seth Krauss, Partner at KPC Labs 

Main Goals 

  1. Create a web crawling framework that makes the automated extraction of data from public websites easy to extend and maintain
  2. Design, develop and maintain the lead management web portal
  3. Optimize database queries to efficiently handle large amounts of data
  4. Introduce a machine learning solution to predict sales in the US real estate market
  5. Migrate from a legacy system to AWS-based infrastructure
  6. Modularize the monolith system and componentize deployments
  7. Create data integration pipelines and provide data acquisition & enhancement services

Technologies Used 

Java 8, Spring Framework, JavaScript, jQuery, MySQL, Amazon Web Services 

Team size and Composition

We assembled a cross-functional team of 7 developers, 1 ML engineer, and 2 QA engineers who contributed to the software development, quality assurance, tech design, and machine learning/artificial intelligence research and development. During different project stages, we scaled the team up to 15 specialists matching the Client’s requirements.

Solution 

Over more than 13 years of cooperation, our team has migrated all data and the portal website to AWS, implemented a new web-based application, and enhanced the performance and stability of the data acquisition, crawling, and ingestion platforms.

We introduced Amazon SageMaker and applied gradient boosting and random forest models to improve house value predictions. We also added a BERT language model to analyze the notes left in the dialer data.
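As a rough illustration of the gradient boosting idea mentioned above, the following sketch fits a tiny boosted model of one-split decision stumps on made-up floor area and price figures. It is not the project's code: the data, the function names, and the hyperparameters are all hypothetical, and a production system would more likely rely on a library such as XGBoost or LightGBM.

```python
from statistics import mean

# Hypothetical toy data: floor area (sq ft) and sale price (thousand USD).
AREAS = [800, 1000, 1200, 1500, 1800, 2200, 2600, 3000]
PRICES = [150, 180, 210, 260, 300, 360, 420, 480]


def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) minimizing squared error."""
    best = None
    # Try every split that leaves both sides non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    if best is None:  # all inputs identical: fall back to a constant
        constant = mean(residuals)
        return xs[0], constant, constant
    return best[1:]


def stump_output(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosted_model(xs, ys, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump fits the residuals."""
    base = mean(ys)
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_output(stump, x)
                       for x, p in zip(xs, predictions)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_output(s, x) for s in stumps)


def mean_squared_error(model, xs, ys):
    return mean((y - predict(model, x)) ** 2 for x, y in zip(xs, ys))


model = fit_boosted_model(AREAS, PRICES)
print(round(predict(model, 2000), 1))  # estimated price for a 2,000 sq ft home
```

Each round adds a small correction toward the remaining error, which is why a low learning rate combined with many rounds tends to generalize better than a single deep tree.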

Tasks and ML Models Applied

  1. Predicting likely-to-list (likely-to-sell) homes among off-market properties
    Gradient boosting machines (including XGBoost, LightGBM, AdaBoost), random forest, stacking ensemble, deep learning
  2. Competitive market analysis based on pricing models that forecast home value ranges and outliers
    Random forest and gradient boosting machines, with controls applied to the predictions
  3. Similarity analysis to identify homes close to a targeted home based on significant classifier characteristics
    k-nearest neighbors (k-NN) in a feature vector space
  4. Predicting homes that are most likely to become leads based on contacting behavior
    Gradient boosting machines (including XGBoost, LightGBM, AdaBoost), random forest, stacking ensemble, deep learning
  5. Predicting homes that are most likely to be receptive to approach and the best times to make contact
    Gradient boosting machines (including XGBoost, LightGBM, AdaBoost), random forest, stacking ensemble, deep learning
  6. Automated natural language processing of agent-supplied notes and text to infer lead disposition
    BERT language model fine-tuned as a sequence classifier, stacked with a TF-IDF-based logistic regression
  7. Insight curves that identify longitudinal trends and lead-rate peaks over time
    Radial basis function (RBF) neural networks and Nadaraya-Watson kernel regression
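To make the last item in the list above more concrete, here is a minimal sketch of Nadaraya-Watson kernel regression, which estimates a smooth curve as a kernel-weighted average of nearby observations. The weekly lead rates, the Gaussian kernel, the bandwidth, and all function names are assumptions chosen for illustration, not details taken from the project.

```python
import math

# Hypothetical weekly lead rates (leads per 100 contacts), noisy on purpose.
WEEKS = list(range(12))
LEAD_RATES = [2.1, 2.4, 2.0, 3.1, 3.8, 4.6, 4.2, 4.9, 3.7, 3.0, 2.6, 2.2]


def gaussian_kernel(u):
    """Unnormalized Gaussian kernel; the constant cancels in the ratio."""
    return math.exp(-0.5 * u * u)


def nadaraya_watson(x_query, xs, ys, bandwidth):
    """Estimate y at x_query as a kernel-weighted average of observed ys."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    weights = [gaussian_kernel((x_query - x) / bandwidth) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query is too far from the data for this bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total


def insight_curve(xs, ys, bandwidth, grid):
    """Evaluate the smoothed curve at every point of the grid."""
    return [nadaraya_watson(g, xs, ys, bandwidth) for g in grid]


curve = insight_curve(WEEKS, LEAD_RATES, bandwidth=1.5, grid=WEEKS)
# The week with the highest smoothed lead rate is the estimated peak.
peak_week = max(zip(curve, WEEKS))[1]
print(peak_week)
```

The bandwidth controls the trade-off: a larger value gives a smoother trend line that hides short spikes, while a smaller one follows the raw data more closely.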

During Our Partnership We:

  • Designed and developed a low-burden, code-free web crawling language and framework to automatically discover and extract relevant data from public websites
  • Designed and developed a lead management web application that searches the extracted data for potential leads through geo-targeted searches by lead type and maintains the prospecting contact management workflow. The application uses two very large databases under highly concurrent, real-time transactional load (serving thousands of live users and hundreds of concurrent users)
  • Applied ongoing performance, stability, and security optimizations to all platforms over the course of a decade
  • Introduced 20+ microservices, refactoring the monolith to support more agile and higher velocity development and feature releases
  • Separated the monolith into a white-labeling framework supporting two different portals. Also developed a robust referral framework and flexible subscription models for over a dozen affiliates
  • Tightly integrated a third-party dialing service and its application workflow into the portals
  • Added Amazon Redshift as a data warehouse solution to support statistical calculations on the data
  • Migrated the portal website to Elastic Beanstalk inside a virtual private cloud (VPC), enabling auto scaling
  • Added Elasticsearch, resulting in a significant improvement in website and query performance

Results

With the help of the Edvantis engineers, KPC Labs gained workable IT services (web solutions and portals) for their operations. Currently, the team is developing an innovative Recommendations & Insights software module backed by industry-leading AI models.

Delivered Value in Numbers

  • 180+ sites crawled per day
  • Stable execution of complex database queries against hundreds of millions of rows in under 5 seconds for 99% of queries, with horizontal scaling
  • Approximately 250 million record updates processed monthly and concurrently, without any impact on read performance
  • Mature software engineering lifecycle with CI/CD that allows services to be patched at any time, with monthly minor releases and quarterly major releases
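A latency figure such as "under 5 seconds for 99% of queries" is usually checked as a percentile over measured query times. The sketch below shows one way to do that with the nearest-rank definition. The function names, the sample values, and the choice of percentile definition are assumptions made for illustration and do not describe how the project measured its numbers.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_latency_target(latencies_s, pct=99, limit_s=5.0):
    """True if the pct-th percentile latency is below limit_s seconds."""
    return percentile(latencies_s, pct) < limit_s


# Hypothetical measurements in seconds: mostly fast, with one slow outlier.
sample = [3.2] * 60 + [4.1] * 39 + [9.5]
print(percentile(sample, 99), meets_latency_target(sample))
```

Tracking a high percentile rather than the average keeps rare slow queries visible, since an average can look healthy while a small share of users wait much longer.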



Edvantis has been instrumental in most of our key deliverables for the past decade and are the only vendor that has established themselves as a strategic partner.
