Hello, it's me

Sai Prasanna Kumar

I'm a dedicated Data Scientist driven by curiosity and a deep understanding of analytics, specializing in developing scalable AI solutions using advanced technologies and cloud services. My expertise grows through hands-on projects and innovative problem-solving.

skills

education

experience

Data Scientist II
Republic Services Inc.
  • Developed customer service chatbots using Retrieval-Augmented Generation (RAG) architecture with AWS infrastructure, integrating models like Claude, Mistral, and LLaMA via AWS Bedrock, and utilizing GraphRAG.
  • Deployed and monitored an end-to-end pipeline using AWS services such as Knowledge Bases, Lambda, Step Functions, DynamoDB, OpenSearch, Grafana, CloudWatch, and ECR, optimizing document management and data retrieval.
  • Built an AI-driven 'Next Best Actions' solution, boosting customer engagement by 30% and reducing response time by 25%, enhancing satisfaction and retention.
  • Cut latency in retrieving customer service answers by 50%, reducing call representative workload by 20% monthly.
  • Evaluated pipeline performance with RAGAS, enhancing model efficiency by 10%.
  • Scaled chatbot applications across various operational areas at Republic Services, influencing a $20M budget.
Senior Data Scientist
BobiHealth
  • Developed an end-to-end NLP-driven chatbot leveraging Hugging Face Transformers, LangChain, GPT-4, RAG, RLHF, ChromaDB, and FastAPI, boosting user engagement with 90% retrieval accuracy.
  • Designed a Random Forest model for pregnancy risk analysis (90% accuracy) on data collected from public health datasets. Engineered data pipelines and provided predictive analytics with Azure Databricks.
  • Orchestrated Airflow workflows for model training and deployment; optimized CI/CD pipelines and Python data modeling to AWS S3, increasing agility by 15%. Incorporated GDPR principles through FMEA, ensuring compliance and data protection.
  • Used R and dplyr for advanced analytics on pregnancy risk datasets, ran large-scale ad-hoc SQL queries, and built interactive visualizations in Power BI and Tableau to extract KPI metrics.
Teaching Assistant
San Jose State University
  • Assisted in teaching a Machine Learning course to 40 undergraduate students, delivering hands-on coding labs and practical assignments to solidify core ML concepts.
  • Provided one-on-one support and clarified doubts, fostering a strong learning environment that contributed to a pass rate of over 95%.
  • Guided students in exploring ML and Data Science careers, nurturing their skills and interest in the field through real-world projects and problem-solving sessions.
Data Scientist
ZS Associates
  • Led the redesign of the customer engagement strategy, implementing Machine Learning models such as LSTM and XGBoost, improving sales forecasting and driving a $20M revenue increase while improving lead qualification by 15%.
  • Developed a Writer lookalike model with 97% accuracy using Positive-Unlabeled learning, containerized in Docker for efficient scaling and orchestrated with Kubernetes; extracted key insights using Shapley values (SHAP) for explainable AI.
  • Re-engineered the Resource Optimization algorithm for call allocation, increasing operational efficiency 3x.
  • Conducted A/B testing to optimize marketing strategies, increasing conversion rates by 20%.
  • Used Hadoop, PySpark, and MapReduce to process large datasets, reducing data processing time by 30%.
  • Collaborated with cross-functional teams to merge ETL pipelines with Machine Learning solutions, using Excel and Matplotlib visualizations to communicate complex data, insights, and SLA adherence to key stakeholders.
Data Scientist
Accenture
  • Developed a recommendation system employing Neural Collaborative Filtering, Autoencoders, and SVD + Neural Network, achieving optimal performance with an 18% drop in RMSE and MAE. Elevated cloud operations using AWS S3 and EC2.
  • Developed a scalable Big Data image search engine, leveraging Java, PySpark, and Kafka for data processing; integrated Elasticsearch with OpenAI's CLIP model via MLflow for efficient image retrieval and product analytics.
  • Deployed YOLOv4 on AWS with Redis, achieving 95% accuracy for real-time mobile usage detection and a 25% efficiency boost; utilized R to enhance bus battery efficiency by 15% and reduce costs by 10%.
Data Scientist
Freelancing
  • Led a photorealistic face generation project using GANs (DCGAN, StyleGAN), enhancing diversity in synthetic image datasets.
  • Curated a dataset of 20,000 Indian facial images via web scraping, ensuring diverse feature representation.
  • Trained GAN models on NVIDIA A100 GPUs on AWS EC2, achieving high-fidelity image generation.
Data Science Intern
JotArthur Web Services
  • Developed a predictive model for financial risk analysis using Random Forest and Logistic Regression, improving risk prediction accuracy by 15%.
  • Conducted data cleaning and preprocessing on large financial datasets, enhancing data quality for more accurate analysis.
  • Implemented a customer segmentation model using K-means clustering, enabling targeted marketing strategies and increasing customer engagement by 20%.

publications

Knowledge Graph Relation Extraction using LLMs

Enhanced biomedical relation extraction capabilities using Flask, ChatGPT API, and Weaviate with a QLoRA-fine-tuned LLaMA-2 model. The system achieves an F1 score that exceeds state-of-the-art benchmarks by 21%. The results of this study have been submitted to BigDataService 2024, the 10th IEEE International Conference on Big Data Computing Services.

#Flask

#ChatGPT API

#Weaviate

#QLoRA

#LLaMA-2

#Python

#Knowledge Graphs

#NLP

UC Berkeley AI Hackathon using LLMs

Developed "HireMeAI" at the UC Berkeley AI Hackathon, a platform built with React, Flask, and MongoDB that uses LLMs (OpenAI API, Anthropic Claude 3) for real-time interview scheduling for hiring managers, resume building, and personalized feedback for candidates, demonstrating scalability and potential for expansion.

#React

#Flask

#MongoDB

#OpenAI API

#Anthropic Claude 3

#LMNT

#Python

#AI

#LLMs

projects

Image Search Engine

This project contains an end-to-end implementation of an image search engine application. It is a rough simulation of a real-world implementation and consists of four modules: Image Producer, Image Consumer, FastAPI server, and React Web App.

#Python

#FastAPI

#Confluent Kafka

#Elasticsearch

#PySpark Streaming

#Hugging Face Transformers

#ReactJS

#MLflow

#Docker

#Apache Kafka

#Apache Spark

#Kubernetes

#AWS

#GCP

#Azure

#S3

#Cloud Functions

#TensorFlow

#PyTorch

#Node.js

#Webpack

#Babel

#ESLint

#Jest

PDF Question Answering Bot

This repository contains a PDF Question-Answering chat application that extracts information from uploaded PDF files and answers user queries based on the document content. It uses LlamaIndex, ChromaDB, and OpenAI's GPT-4 to provide accurate answers to questions related to the uploaded documents.

#Python

#Flask

#PyMuPDF

#OpenAI GPT-4

#LlamaIndex

#ChromaDB

#NLTK

#Rouge-score

#OpenAI API

#HTML

#CSS

#JavaScript

Biomedical Relation Extraction Using LLMs and Knowledge Graphs

This project enhances biomedical NLP by comparing the performance of established models like BioBERT against newer models such as Gemma-2b, Gemma-7b, and Llama2-7b, on benchmark datasets. It aims to improve binary relation classification and integrates findings into knowledge graphs to map complex relationships.

#Python

#BioBERT

#Gemma-2b

#Gemma-7b

#Llama2-7b

#Neo4j

#Hugging Face Transformers

#spaCy

#PyTorch

#TensorFlow

#GCP

#AWS

#Docker

#Kubernetes

certifications

Applied AI with Deep Learning

Coursera

Deep Learning | Neural Networks | NLP | AI

Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization

Coursera

Deep Learning | Neural Networks | NLP | AI

IBM Data Science Professional Certificate

IBM

Data Science | Machine Learning | AI

Open Source Tools for Data Science

IBM

R | Python | Data Science | Deep Learning

Associate Cloud Engineer

Google Cloud

Deep Learning | Neural Networks | NLP | AI
