Stats & AI/ML Intern @ Pfizer | M.S. Data Science @ NYU

Building LLM-based and agentic AI systems.

Final-year M.S. Data Science student at NYU with industry experience building LLM-based and agentic AI systems. My research spans ML, transformers, and multimodal models, with interests in retrieval, interpretability, and tokenization.

LLM systems, applied ML, multimodal research | New York, NY | dm6262@nyu.edu
Portrait of Deepanshu Mody

Building LLM agentic workflows and retrieval systems.

Focus: Agentic LLM systems
Domains: Healthcare + enterprise data
Agentic AI, RAG systems, Knowledge graphs, Multimodal ML, Model evaluation, Tokenization
Core focus: LLMs, applied ML, multimodal AI research
Tooling: Python, PyTorch, Neo4j, LangGraph, LangChain
Collaboration: Open to research, internships, and applied ML work

Research

Academic and capstone work spanning tokenization, imaging, and GNNs.

Sep 2025 - Present

Capstone Project (Advisor: Dr. Chris Tanner)

Kensho | Remote
  • Designed and implemented Markov chain Monte Carlo and reinforcement learning approaches for globally optimizing BPE tokenization (entropy and compression objectives), trained on the MiniPile corpus.
Apr 2025 - Jun 2025

Graduate Research Assistant (Advisor: Dr. Yiqiu Shen)

NYU Grossman School of Medicine | New York, NY
  • Curated a longitudinal imaging cohort of ~2k abdominal CT scans from ~80k patients with acute pancreatitis, linked to 3-year follow-up data; built a DICOM-to-NIfTI pipeline with automated PHI stripping.
  • Prototyped a deep-survival model on 80k MRIs that fuses 3-D ResNet radiomics with clinical labs (CRP, BUN) in a Cox/DeepSurv head to predict time-to-chronic progression; pilot C-index 0.81 vs 0.72 for Cox on 5-fold patient splits.
Jun 2022 - Dec 2022

Research Intern (Advisor: Dr. Daisuke Kihara)

Purdue University | West Lafayette, IN
  • Developed two GNNs (GCN, GNN-DTI) for RNA metal-ion binding, gaining +6.2 pp ROC-AUC over a CNN on 6.4k PDB structures.
  • Built a GPU-accelerated PyG stack on SLURM and a DGL graph-builder that cut preprocessing 5x and streamed 1.1M edges/s, enabling 128-config sweeps overnight.

Experience

Industry roles spanning applied ML, retrieval systems, and compiler/hardware work.

Jun 2025 - Aug 2025

Statistics & AI/ML Intern

Pfizer | Boston, MA
  • Built a LangGraph multi-agent workflow (Gemini Flash, DeepSeek-R1) with end-to-end reasoning and tool use, testing latency, usage, and system reliability.
  • Extended and productionized the system with a Neo4j knowledge graph for drug-target-indication analytics, modeling evidence-weighted relationships (PageRank, community clustering).
Jul 2023 - Jul 2024

Data Scientist | Software Engineer - Data & AI

Incedo Inc. | Gurugram, India
  • Built LangChain RAG document QA over 1,200 technical manuals; 92.0 token-level F1 on a held-out test set; p50 <200 ms E2E on a single NVIDIA L4 (24 GB) with INT8 inference; hybrid BM25 + vector + cross-encoder rerank.
  • Raised Exact Match from 41% to 84% (+43 pp) on a 500-question benchmark via LoRA fine-tuning, RAFT, and hybrid retrieval with an embedding-based reranker.
Jan 2023 - Jun 2023

Software Engineering Intern

Kinara AI | Hyderabad, India
  • Prototyped a RISC-V vector extension and LLVM backend, implementing scatter/gather intrinsics to deliver 1.7x GEMM throughput and 34% lower ResNet-50 latency in cycle-accurate simulations.

Publications & talks

Selected publications and workshop presentations.

2023

Auto Encoders for Communication-Efficient Distributed Learning

AAAI Deployable AI Workshop

Proposed an autoencoder-based method for communication-efficient distributed learning; presented at the workshop.

2025

Validity of Machine Learning-Based COVID-19 Prediction

PLOS ONE

Benchmarked 7 hematology-based prognostic models on 195k patient records across Brazil, Italy, and Western Europe; uncovered a ~20% AUROC drop in cross-continental transfer and released an open-source validation toolkit.

Education

Academic training in data science, computer science, and biology.

Aug 2024 - May 2026

New York University

M.S., Data Science | New York, NY
  • 2nd position - AI4Purpose Hackathon
  • Research Mentor - Roaring Cubs Collective
Aug 2018 - Jun 2023

Birla Institute of Technology and Science, Pilani

B.E., Computer Science; M.S., Biological Sciences (Integrated) | Pilani, India

Skills

Technical stack spanning ML, systems, and deployment.

Python, SQL, PyTorch, Neo4j, LangChain, LangGraph, C++, C, Java, Keras, Django, Flask, AWS, Azure, Git, LaTeX

Contact

Let's discuss research, collaboration, or ML engineering roles.

Research interests

  • LLM systems, agentic workflows, and evaluation
  • Retrieval, knowledge graphs, and data products
  • Multimodal ML in healthcare and interpretability