Hi! I'm

Sanchit Goel

I am a Machine Learning Engineer with 4 years of experience in Generative AI, NLP, and Large-Scale ML Systems. Currently, I am completing my Master’s in Data Science at UC San Diego and actively seeking full-time opportunities where I can apply AI to solve impactful real-world problems.

Research Interests

Beyond industry work, I conduct AI-driven healthcare research at Dr. Lee’s lab (UCSD), where I develop multi-modal models to detect early signs of schizophrenia using speech-based biomarkers.

Experience

Machine Learning Engineer, Frida

2024-2025

  • Engineered a multimodal RAG knowledge retrieval system leveraging a GPT LLM, LangChain, and LlamaIndex to enhance internal knowledge management and enable agent-driven customer support.
    • Optimized text chunking using LangChain's recursive splitter and the Unstructured library, enabling efficient table and image extraction and achieving a context recall of 0.83 as evaluated by the RAGAS framework.
    • Improved retrieval accuracy by 14% by implementing cross-encoder reranking and ReAct AI agents to dynamically route queries to relevant partitions within the Pinecone vector database.
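
The recursive chunking idea mentioned above can be illustrated with a minimal sketch. This is a simplified stand-in written in plain Python, not LangChain's implementation and not the code used at Frida; the function name `recursive_split`, the separator order, and the sample text are assumptions made for illustration. It tries the coarsest separator first and only falls back to finer ones for pieces that are still too long.

```python
def recursive_split(text, max_len, separators=("\n\n", "\n", " ")):
    """Split text into chunks of at most max_len characters.

    Hypothetical helper: prefers coarse separators (paragraphs) and only
    recurses into finer ones (lines, words) for oversized pieces.
    """
    if len(text) <= max_len:
        return [text] if text.strip() else []
    for i, sep in enumerate(separators):
        if sep not in text:
            continue
        chunks, current = [], ""
        for piece in text.split(sep):
            candidate = piece if not current else current + sep + piece
            if len(candidate) <= max_len:
                # Keep merging neighbouring pieces while they still fit.
                current = candidate
                continue
            if current:
                chunks.append(current)
                current = ""
            if len(piece) <= max_len:
                current = piece
            else:
                # Piece is still too long: retry with the finer separators.
                chunks.extend(recursive_split(piece, max_len, separators[i + 1:]))
        if current:
            chunks.append(current)
        return [c for c in chunks if c.strip()]
    # No separator applies: fall back to a hard cut every max_len characters.
    return [text[j:j + max_len] for j in range(0, len(text), max_len)]


sample = "alpha beta\n\ngamma delta epsilon\n\nzeta"
chunks = recursive_split(sample, max_len=12)
```

Keeping paragraph boundaries where possible tends to leave each chunk semantically coherent, which is the usual motivation for this splitting strategy.
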
Machine Learning Engineer, Fractal.ai

2020-2023

  • Collaborated with Merck & Co. to develop a Content Tagging Automation Platform, utilizing ML models to enrich metadata tagging of healthcare documents.
    • Improved marketing engagement by 12% by developing an end-to-end ML pipeline on AWS SageMaker, optimizing business campaign outreach strategies.
    • Led a team of 3 to transition from AWS Comprehend to fine-tuned BERT models, achieving a 30% cost reduction and a 14% improvement in F1 score, validated through an A/B comparison using McNemar's test.
    • Developed CI/CD pipelines with Jenkins, Git, Docker, and AWS ECR for automated testing and deployment.
  • Engineered Propay.bot, an NLP-based fraud detection system, integrating POS tagging and NER to flag problematic health insurance claims; adopted by UnitedHealth for further development.
  • Saved 650+ manual hours yearly and reduced turnaround time by 75% by developing an invoice entity extraction solution for Fractal’s admin using XGBoost, Selenium, and UiPath.
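
McNemar's test, mentioned above for the BERT migration, compares two classifiers on the same examples by looking only at the cases where they disagree. The sketch below shows the exact (binomial) two-sided version in plain Python. How the test was actually run at Fractal is not described here, so the function name `mcnemar_exact` and the counts in the example are purely illustrative assumptions.

```python
from math import comb


def mcnemar_exact(b, c):
    """Exact two-sided McNemar p-value.

    b: examples the old model got right and the new model got wrong
    c: examples the old model got wrong and the new model got right
    Under the null hypothesis, each disagreement is equally likely to
    favour either model, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # the models never disagree, so there is no evidence
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical disagreement counts, not figures from the project.
p_value = mcnemar_exact(b=4, c=21)
```

A small p-value suggests the difference between the two models is unlikely to be due to chance alone; examples where both models agree do not affect the result.
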

Research

Decoding loneliness: Can explainable AI help in understanding language differences in lonely older adults?

Psychiatry Res 2024

Developed multi-modal models on speech data, integrating acoustic and linguistic features to detect social isolation.

PDF


Words matter: Gender, Jobs and Applicant Behaviour - Research Assistant

IZA 2021

Collaborated with Dr. Kanika Mahajan to investigate how implicit gender preferences in job ads impact labor market outcomes. Employed NLP techniques such as tf-idf, topic modeling, and word clouds to analyze disparities between gender-preferred roles.
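
As a rough illustration of the tf-idf weighting mentioned above, the following plain-Python sketch scores terms in a few made-up job ads. The ads, the whitespace tokenization, and the function name `tf_idf` are assumptions for illustration and are not taken from the study; the formula used is term frequency times log(N / document frequency), one of several common variants.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document.

    tf  = count of the term in the document / document length
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Made-up ads, for illustration only.
ads = [
    "driver wanted male preferred",
    "teacher wanted female preferred",
]
scores = tf_idf(ads)
```

Terms shared by every ad (here "wanted" and "preferred") score zero, while terms that distinguish one ad from the others receive positive weight, which is what makes the measure useful for contrasting groups of documents.
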

PDF


Decentralization and Program Implementation - Research Assistant

Assisted Dr. Ashwini Deshpande in studying the effects of district splits on program effectiveness. Compiled a dataset encompassing 800 districts and 7,000 blocks through web scraping and census data. Analyzed district splits in India between 2011 and 2020 and examined their influence on NREGA implementation.


Covid-19 and Supply Chain Disruption - Research Assistant

AAEA 2021

Collaborated with Dr. Kanika Mahajan to assess the impact of Covid-19 on supply chain disruptions. Responsible for data cleaning and computing distances between production zones and retail centers using Google Maps API.

PDF


Projects

Twitterazzi

POS tagging, NER (spaCy), Sentiment Analysis

Developed a web app for analyzing influencers' Twitter activity, employing POS tagging, NER, sentiment analysis, and word cloud generation.

Book Shelf Digitization

Edge Detection, Hough Transform, Google Vision API

Designed a web app that recognizes books in a bookshelf image. Segmented images using Canny edge detection and the Hough transform to identify book spines, followed by digitization and text extraction to identify books.

Facial Image Deobfuscation

Convolutional Autoencoders, Image Restoration

Trained Convolutional Autoencoders on 13,000 images to rectify three types of obfuscations (blur, pixelation, and speckle noise) in facial images.
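
One of the three obfuscations named above, pixelation, can be sketched in a few lines of plain Python. This is only an illustration of the degradation itself: the function name `pixelate`, the block size, and the toy grayscale image are assumptions, and the project's actual data pipeline is not described here. A restoration model would typically be trained on pairs of such degraded inputs and their clean originals.

```python
def pixelate(image, block):
    """Pixelate a grayscale image given as a list of rows of numbers.

    Every block x block tile is replaced by its mean intensity; tiles at
    the right and bottom edges may be smaller than block x block.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for top in range(0, height, block):
        for left in range(0, width, block):
            cells = [
                (r, c)
                for r in range(top, min(top + block, height))
                for c in range(left, min(left + block, width))
            ]
            mean = sum(image[r][c] for r, c in cells) / len(cells)
            for r, c in cells:
                out[r][c] = mean
    return out


# Toy 2x4 image, for illustration only.
toy = [
    [0, 2, 10, 10],
    [4, 6, 10, 30],
]
degraded = pixelate(toy, block=2)
```
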

Synthetic Population Generation

Data Synthesis, Population Modeling, IPF

Collaborated with Dr. Debayan Gupta to study Covid-19 spread. Employed the DataSynthesizer model on NSS data to upscale the dataset 15 times, generating around 15 million observations. Presented an Iterative Proportional Fitting (IPF) approach for integrating census datasets to produce a synthetic population.
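
The Iterative Proportional Fitting idea mentioned above can be sketched for the two-dimensional case: starting from a seed table (for example, a survey cross-tabulation), rows and columns are rescaled in turn until the table matches known marginal totals. The function name `ipf`, the seed values, and the targets below are made up for illustration and do not come from the NSS or census data.

```python
def ipf(seed, row_targets, col_targets, iterations=100, tol=1e-9):
    """Fit a 2-D table to target row and column totals.

    Assumes sum(row_targets) == sum(col_targets) and a seed with no
    all-zero row or column. Returns a new table; the seed is not modified.
    """
    table = [row[:] for row in seed]
    for _ in range(iterations):
        # Step 1: scale each row to match its target total.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [v * target / total for v in table[i]]
        # Step 2: scale each column to match its target total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
        # Columns are exact after step 2, so only rows need checking.
        if all(abs(sum(table[i]) - t) < tol for i, t in enumerate(row_targets)):
            break
    return table


# Hypothetical seed and marginals, for illustration only.
seed = [[1.0, 2.0], [3.0, 4.0]]
fitted = ipf(seed, row_targets=[40.0, 60.0], col_targets=[30.0, 70.0])
```

Because IPF only multiplies rows and columns by constants, the fitted table keeps the association structure (the cross-product ratios) of the seed while matching the target totals.
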

Hand Gesture Recognition

CNN, Image Classification

Trained a Convolutional Neural Network (CNN) to identify hand gestures. Achieved approximately 98% classification accuracy across 11 gesture classes using the VPLU dataset of 1,100 images.

Stocksnapshot

Web Scraping, UI Development, Cloud Deployment

Developed a stock analysis web app with over 20 years of financial data. The app previously served over 100 monthly active users and includes a built-in DCF calculator for quick intrinsic value calculations. Hosted on Heroku.
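
A discounted cash flow (DCF) calculation of the kind mentioned above can be sketched as follows. This is a generic textbook formulation with a Gordon-growth terminal value, not the app's actual code; the function name `dcf_value`, its parameters, and the numbers in the example are assumptions for illustration.

```python
def dcf_value(cash_flows, discount_rate, terminal_growth):
    """Intrinsic value from projected cash flows.

    cash_flows: projected free cash flows for years 1..n
    discount_rate: required annual return, e.g. 0.10 for 10%
    terminal_growth: perpetual growth rate assumed after year n
    """
    if not cash_flows:
        raise ValueError("at least one projected cash flow is required")
    if discount_rate <= terminal_growth:
        raise ValueError("discount_rate must exceed terminal_growth")
    # Present value of each explicitly projected year.
    pv_flows = sum(
        cf / (1 + discount_rate) ** year
        for year, cf in enumerate(cash_flows, start=1)
    )
    # Gordon-growth terminal value at year n, discounted back to today.
    terminal = cash_flows[-1] * (1 + terminal_growth) / (discount_rate - terminal_growth)
    pv_terminal = terminal / (1 + discount_rate) ** len(cash_flows)
    return pv_flows + pv_terminal


# Hypothetical five-year projection, for illustration only.
value = dcf_value([100.0, 110.0, 121.0, 133.0, 146.0],
                  discount_rate=0.10, terminal_growth=0.02)
```

Dividing the result by the number of shares outstanding gives a per-share estimate that can be compared against the market price. The output is highly sensitive to the discount rate and terminal growth assumptions.
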