Hi! I'm Sanchit Goel

I am currently a Master's student in Data Science at UCSD. Before that, I worked as a data scientist at Fractal Analytics for over three years, building intelligent solutions for Fortune 100 clients such as Merck, Cigna, and UnitedHealth. I majored in Economics and minored in Computer Science at Ashoka University, India.

Research Interests

My research interests center on applying NLP and AI/ML to healthcare and cognitive science.

Experience

Data Scientist, Fractal.ai

2020-23

  • Collaborated with Merck & Co. to develop a Content Tagging Automation Platform that uses ML models to enrich metadata tagging of healthcare documents, reducing manual tagging by 90% for US and European markets.
    • Worked closely with Merck’s market leaders to drive adoption, onboarding 12 European markets.
    • Led the BYO (bring-your-own) model enhancement to replace AWS Comprehend with BERT-based models. Selected RoBERTa as the final model, reducing training and inference costs by over 30% and improving F1 score by 14%.
    • Trained and deployed production models for the US, Portugal, and Belgium, achieving an average F1 score of 0.85.
    • Engineered a robust deployment and inference pipeline built on AWS ECR, SageMaker, and EventBridge.
  • Developed Propay.bot, an NLP solution that uses POS tagging and named entity recognition (NER) to process health insurance claims (PA and FWA contracts). Cigna adopted the solution for further development.
  • Engineered an intelligent end-to-end invoice processing solution for the admin team, using an XGBoost model trained on over 800 invoices and orchestrated through WorkFusion. The solution saved $3,000 per month in labor costs.
  • Implemented and presented a proof of concept (POC) for Heineken: a chatbot that let upper management query their BI dashboard in natural language, built with Azure Bot Service and Power BI Embedded.

Research

Words Matter: Gender, Jobs and Applicant Behaviour - Research Assistant

IZA 2021

Collaborated with Dr. Kanika Mahajan to investigate how implicit gender preferences in job ads affect labor market outcomes. Employed NLP techniques such as tf-idf, topic modeling, and word clouds to analyze disparities between roles with different implied gender preferences.

PDF
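For readers unfamiliar with tf-idf, here is a minimal sketch of how it weights terms, assuming simple whitespace tokenization and the unsmoothed variant idf = log(N / df). The function name `tf_idf` and the three toy job-ad snippets are hypothetical illustrations, not the study's data or code.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term in each tokenized document by tf * idf.

    tf  = term count / document length
    idf = log(N / df), where df is the number of documents containing the term
    (unsmoothed variant; libraries often add smoothing).
    """
    n_docs = len(docs)
    # Document frequency: count each term at most once per document.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy job-ad snippets (not from the study's dataset).
ads = [
    "receptionist needed pleasant personality".split(),
    "driver needed must lift heavy loads".split(),
    "accountant needed strong excel skills".split(),
]
scores = tf_idf(ads)
for ad, w in zip(ads, scores):
    # Show the two highest-weighted terms for each ad.
    print(ad[0], sorted(w.items(), key=lambda kv: -kv[1])[:2])
```

Because "needed" appears in every ad, its idf and therefore its weight is zero; terms that distinguish one ad from the others get the highest weights, which is what makes tf-idf useful for contrasting groups of ads.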


Decentralization and Program Implementation - Research Assistant

Assisted Dr. Ashwini Deshpande in studying the effects of district splits on program effectiveness. Compiled a dataset covering 800 districts and 7,000 blocks from web-scraped and census data. Analyzed district splits in India between 2011 and 2020 and examined their influence on NREGA implementation.


Covid-19 and Supply Chain Disruption - Research Assistant

AAEA 2021

Collaborated with Dr. Kanika Mahajan to assess how Covid-19 disrupted supply chains. Handled data cleaning and computed distances between production zones and retail centers using the Google Maps API.

PDF


Automobile Slowdown: A Quantitative Study - Independent Study Module

Investigated the factors behind the 2019 automobile slowdown, including regulatory changes, the NBFC crisis, the rise of ride-hailing services, and consumer confidence. Used OLS estimation to quantify the impact of each factor.

PDF


Projects

Twitterazzi

POS tagging, NER (spaCy), Sentiment Analysis

Developed a web app that analyzes influencers' Twitter activity using POS tagging, NER, sentiment analysis, and word cloud generation.

Bookshelf Digitization

Edge Detection, Hough Transform, Google Vision API

Designed a web app that recognizes the books in a photo of a bookshelf. Segmented the image with Canny edge detection and the Hough transform to locate book spines, then digitized each spine and extracted its text to identify the books.

Facial Image Deobfuscation

Convolutional Autoencoders, Image Restoration

Trained convolutional autoencoders on 13,000 images to reverse three types of obfuscation (blur, pixelation, and speckle noise) in facial images.

Synthetic Population Generation

Data Synthesis, Population Modeling, IPF

Collaborated with Dr. Debayan Gupta to study the spread of Covid-19. Used the DataSynthesizer model on NSS data to scale up the dataset 15-fold, generating around 15 million observations. Presented an Iterative Proportional Fitting (IPF) approach for integrating census datasets to produce a synthetic population.
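IPF itself is a short algorithm: starting from a seed table (for example, a cross-tabulation from survey data), it alternately rescales rows and columns until the table matches known marginal totals (for example, census counts). The sketch below shows the two-dimensional case only; the function name `ipf`, the seed table, the targets, and the tolerance are hypothetical, and the project's actual multi-attribute setup is not reproduced here.

```python
def ipf(seed: list[list[float]], row_targets: list[float],
        col_targets: list[float], max_iter: int = 100,
        tol: float = 1e-9) -> list[list[float]]:
    """Iterative Proportional Fitting on a 2-D table.

    Alternately scales the rows and columns of `seed` so that row sums
    approach `row_targets` and column sums approach `col_targets`.
    Both target vectors must have the same total.
    """
    table = [list(row) for row in seed]  # copy so the seed is not modified
    for _ in range(max_iter):
        # Row step: scale each row so it sums to its target.
        for i, target in enumerate(row_targets):
            s = sum(table[i])
            if s > 0:
                table[i] = [x * target / s for x in table[i]]
        # Column step: scale each column so it sums to its target.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            if s > 0:
                for row in table:
                    row[j] *= target / s
        # Columns now match exactly; stop once rows still match too.
        if all(abs(sum(row) - t) < tol for row, t in zip(table, row_targets)):
            break
    return table


# Hypothetical seed counts (e.g. age group x household size from a sample)
# and hypothetical census marginals; both target vectors total 1000.
seed = [[10.0, 20.0], [30.0, 40.0]]
fitted = ipf(seed, row_targets=[300.0, 700.0], col_targets=[400.0, 600.0])
print(fitted)
```

The fitted table keeps the seed's interaction structure (its odds ratio) while matching the external marginals, which is why IPF is a common way to reweight sample data toward census totals.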

Hand Gesture Recognition

CNN, Image Classification

Trained a convolutional neural network (CNN) to identify hand gestures, achieving ∼98% classification accuracy across 11 gesture classes on the VPLU dataset of 1,100 images.

Stocksnapshot

Web Scraping, UI Development, Cloud Deployment

Developed a stock analysis web app covering over 20 years of financial data, with a built-in discounted cash flow (DCF) calculator for quick intrinsic value estimates. The app previously served over 100 monthly active users. Hosted on Heroku.
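The calculator's exact formula isn't described; a common textbook version is a two-stage DCF: project free cash flow (FCF) at a growth rate g for n years, add a Gordon-growth terminal value TV = FCF_n (1 + g_T) / (r - g_T), and discount everything at rate r, so value = Σ FCF_t / (1 + r)^t + TV / (1 + r)^n. The sketch below assumes that model; the function name `dcf_value` and its parameters are hypothetical and not taken from Stocksnapshot.

```python
def dcf_value(fcf: float, growth: float, discount: float, years: int,
              terminal_growth: float) -> float:
    """Two-stage DCF (assumed model, not the app's actual code).

    Stage 1: FCF grows at `growth` per year for `years` years.
    Stage 2: Gordon-growth terminal value growing at `terminal_growth`.
    All cash flows are discounted at `discount`.
    """
    if discount <= terminal_growth:
        raise ValueError("discount rate must exceed terminal growth rate")
    value = 0.0
    cash_flow = fcf
    for t in range(1, years + 1):
        cash_flow *= 1 + growth                      # projected FCF in year t
        value += cash_flow / (1 + discount) ** t     # discounted to today
    # Terminal value at the end of the final year, discounted back to today.
    terminal = cash_flow * (1 + terminal_growth) / (discount - terminal_growth)
    value += terminal / (1 + discount) ** years
    return value


# Hypothetical inputs: FCF of 100, 8% growth for 5 years, 10% discount rate,
# 3% terminal growth.
print(round(dcf_value(100.0, 0.08, 0.10, 5, 0.03), 2))
```

With zero growth in both stages the result collapses to a simple perpetuity, FCF / r (e.g. 100 / 0.10 = 1000), which makes a handy sanity check. Turning this into a per-share intrinsic value would also require net debt and share count, omitted here.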