Hello! I'm Sanja, a full-stack Data Scientist with experience delivering end-to-end
ML/AI solutions at both large and small companies, in research and in industry.
I'm especially interested in recommendation systems, NLP, and two-sided marketplaces.
I enjoy exploring new developments in LLMs and generative AI, while thinking critically about how to apply them in scalable, thoughtful, and safe ways.
Leading development of AI solutions for structured metadata generation and annotation, and for improved global matching in Indeed's two-sided marketplace
Leading and delivering production-level AI solutions in the Metadata and Taxonomy team powering core Indeed features globally
Developed ML models, tooling, and workflows to improve and expand Indeed's occupation metadata on jobs and resumes in 10+ markets
Developed a prototype of an intelligent component for the new Fluid collaborative technology in the Office 365 suite
Deployed tooling for automatic triage of build and test failures for the Secure Systems Group, reducing triaging and debugging time to seconds
Delivered an end-to-end Text Summarization pipeline combining extractive algorithms and abstractive Deep Learning models, used by strategy teams
Backend developer and data engineer in an early-stage start-up for street-level threat assessment; implemented core app features and infrastructure (databases, APIs, etc.)
Smart Digital Government - developed and deployed a fully functional assistant chatbot using NLP techniques in a start-up environment
Teaching Assistant (TA) for a popular Introduction to Machine Learning course (6.036)
Human-aligned features in Deep Learning (Computer Vision) - supervised by Prof. Pulkit Agrawal
Development and improvement of perception and inference modules for Machine Commonsense Reasoning, visual intuitive physics
Tree-structured code representation and malware classification - supervised by Prof. Una-May O'Reilly
Lab Assistant (LA) for a popular Introduction to Machine Learning course (6.036)
Paper accepted at the RecSys 2025 conference (in press). The paper introduces an automated framework that uses large language models and embedding-based similarity to normalize inconsistent job titles for improved autocomplete suggestions on job search platforms, resulting in higher accuracy and user engagement.
HackMIT 2019 Grand prize winner and Track prize winner. Smart ML data labeling tool that considers consistency between labelers as well as each labeler's consistency with their own earlier labels.
An interactive app where the user's motion activates real-time animations and sounds. It is meant for creative and entertainment purposes.
A web app that allows the citizens of Cambridge (MA) to stay up to date on local crimes and engage in community discussion on crimes in their neighborhoods.
Research-oriented project studying the robustness of models of code against obfuscations and adversarial inputs in two different settings.