(Preprint) Abhishek Divekar, Mudit Agarwal, Srujana Merugu, and Nikhil Rasiwasia. "Unsupervised text augmentation using Pre-trained Paraphrase Generation".
Curriculum vitae (CV)
Cover Letter
I work at Amazon International Machine Learning, where I research and train models for AutoML and NLP projects. I helped build, and currently lead science efforts for, EPS (Entity Prediction Service), an AutoML platform. Internal teams have used EPS to deploy 1,000+ models across 14 Amazon websites, driving over $550 million in revenue for Amazon businesses such as Subscribe and Save. My science work has been published internally at the Amazon Machine Learning Conference (AMLC) and adopted in external features such as AutoGluon’s infer_limit feature. Since early 2022, I have been working on NLP modeling for a task-oriented chatbot that answers factual product-related questions on amazon.in.
Education
- B.Tech. in Information Technology, Veermata Jijabai Technological Institute, 2017
- Thesis: Machine Learning for Anomaly-based Network Intrusion Detection, supervised by Dr. Mahesh Shirole.
- GPA: 8.74 on 10 (Transcript)
- Coursework (“A” grades): Data Structures and Algorithms, Discrete Mathematics, Operating Systems, Data Mining, Embedded Systems.
- M.S. in Computer Science, The University of Texas at Austin (expected Dec 2023)
- GPA: 4.0 (Latest Transcript)
- Coursework (“A” grades): Deep Learning, Advanced Linear Algebra for Computing, Natural Language Processing, Machine Learning, Online Learning and Optimization, Case Studies in Machine Learning.
Work experience
- Dec 2021 - Present: Applied Scientist II, Amazon.
- ML modeling for a task-oriented chatbot that answers factual questions about products on amazon.in. Responsibilities: intent detection, information retrieval, and knowledge-base curation models.
- Led product design, implementation, and roadmap planning for the production launch of a new AutoML platform (Litmus), aimed at accelerating AutoML experimentation for scientists and engineers (Talk at AIMLSys 2022).
- Designed and implemented APIs for Unified Task Framework, an extensible software framework that enables interoperability and reuse at all levels: model artifacts, algorithm training/inference code, and entire ML pipelines.
- Developed Litmus Scalable Dataframe (LitSDF), an efficient drop-in replacement for the Pandas DataFrame that uses heterogeneous underlying data structures (list of dicts, Pandas, Dask, cuDF, etc.). Compared to native Pandas, LitSDF speeds up data processing by 15.2x during real-time ML deployments and by 27.7x during batch deployments.
- Oct 2020 - Nov 2021: Research Engineer II, Amazon.
- Used AmaBERT (BERT-Base pre-finetuned on Amazon product text) to classify products into 10,000+ browsable categories within the Amazon product taxonomy. Increased classification accuracy over the existing multi-task FastText model by ~7 percentage points (62% to 69%).
- Co-authored internal papers:
- “Squeezing the last DRiP: AutoML for cost-constrained product classification” (Poster at AMLC 2021) (Talk at Amazon Research Days 2021).
- “CPP Multimodal AutoML Corpus and Benchmark” (Oral at AMLC 2021 Multi-modal workshop).
- “LEAP: LEAf node Predictions in the wild” (2nd ASCS Applied Science Workshop).
- Oct 2019 - Sep 2020: Research Engineer, Amazon.
- Designed and implemented data-processing nodes for an internal AutoML platform, contributing 160K lines of Python code.
- Developed Docker containers to predict UNSPSC codes for ~500MM products on Amazon.com. Developed depth-first preprocessing, which reduced latency by 30-50%.
- Co-authored “Entity Prediction Service: a configurable, end-to-end AutoML system” (Poster at AMLC 2020 AutoML workshop).
- Aug 2017 - Sep 2019: Software Development Engineer, Amazon.
- Designed and implemented the multi-device purchase flow used by all Kindle devices in Europe. Launched a secure handoff from Kindle to another device using SMS/email notifications, CSRF tokens, and server-side caching, protecting critical customer data.
- Developed new REST APIs and an integration-test framework for a worldwide Tier-1 payments service used by internal businesses, including Kindle, Alexa, and Amazon Prime, to carry out customer payment flows. Parallelized calls to downstream services using Java’s ThreadPoolExecutor, reducing API latency by 25%.
- Summer 2017: Research Assistant, Veermata Jijabai Technological Institute
- Supervisor: Dr. Mahesh Shirole.
- Discovered vulnerabilities in ML models trained on the popular KDDCUP99 network-intrusion dataset, and proposed a hybrid dataset that mitigated these vulnerabilities.
- First author of a full-length paper that appeared at the IEEE ICCCS 2018 conference: https://arxiv.org/abs/1811.05372
- May - July 2016: Software Development Intern, Barclays
- Developed a prototype that reduced foreign-exchange transaction time from T+2 days to T+120 seconds (99.93% reduction), using the Ripple blockchain and Node.js.
Skills
- Programming Languages
- Proficient (100,000+ lines of code in production): Python, Java
- Familiar (used at work): SQL, C++, JavaScript/TypeScript
- Tools
- Data Science: PyTorch, Keras, Ray, Pandas, NumPy, Dask, Apache Spark, HuggingFace Transformers.
- Software Development: Git, Docker, Conda, Streamlit, Flask, JUnit, unittest.
- Academic: LaTeX, ReadCube Papers, ResearchRabbit.ai.
- Cloud: Amazon Web Services (SageMaker, Step Functions, Elastic MapReduce, Batch, DynamoDB, EC2, S3).
Publications
Abhishek Divekar*, Gaurav Manchanda*, Prit Raj, Abhishek Das, Karan Tanwar, Akshay Jagatap, Vinayak Puranik, Jagannathan Srinivasan, Ramakrishna Nalam, and Nikhil Rasiwasia. "Squeezing the last DRiP: AutoML for cost-constrained product classification". 9th conference of Amazon Machine Learning (AMLC 2021) (Poster) (acceptance-rate: 30%) (Talk at Amazon Research Days 2021)
Andrew Borthwick, Abhishek Divekar, Nick Erickson, Fayaz Ahmed Farooque, Oleg Kim, Nikhil Rasiwasia, and Ethan Xu. "The CPP Multimodal AutoML Corpus and Benchmark". 1st AMLC Workshop on MultiModal Learning and Fusion at the 9th conference of Amazon Machine Learning (AMLC 2021) (Oral) (Internal venue)
Abhishek Divekar, Vinayak Puranik, Zhenyu Shi, Jinmiao Fu, and Nikhil Rasiwasia. "LEAP: LEAf node Predictions in the wild". 2nd ASCS Applied Science Workshop, 2021 (Oral) (Internal venue)
Gaurav Manchanda*, Abhishek Divekar*, Akshay Jagatap, Prit Raj, Vinayak Puranik, Nikhil Rasiwasia, Ramakrishna Nalam, and Jagannathan Srinivasan. "Entity Prediction Service: a configurable, end-to-end AutoML system". Workshop on Automated Machine Learning at the 8th conference of Amazon Machine Learning (AMLC 2020) (Poster) (Internal venue)
Abhishek Divekar, Meet Parekh, Vaibhav Savla, Rudra Mishra, and Mahesh Shirole. "Benchmarking datasets for Anomaly-based Network Intrusion Detection: KDD CUP 99 alternatives". IEEE International Conference on Computing, Communication and Security (ICCCS 2018) (Oral) (https://arxiv.org/abs/1811.05372)
Talks and presentations
Flexible AutoML: Accelerating AutoML adoption across Amazon
Talk at 2nd International Conference on AI-ML Systems (AIMLSys 2022), Virtual
[Slides]
Supervised Learning (Decision Trees, Bagging and Boosting algorithms)
Lecture at Amazon ML Summer School 2022, Virtual
[Slides]
Squeezing the last DRiP: AutoML for Cost-constrained Product Classification
Talk at Amazon Research Days 2021 conference, Virtual
[Slides]
Projects
Extending Whisper, OpenAI’s Speech-to-Text Model
Abhishek Divekar, Yosub Jung, Roshni Tayal
Asking the Right Questions: Question Paraphrasing Using Cross-Domain Abstractive Summarization and Backtranslation
Abhishek Divekar, Alex Stoken
Autonomous agents for realtime multiplayer ice-hockey
Abhishek Divekar, Jason Housman, Ankita Sinha, Alex Stoken
SearchDistribute: webscraping search results on an academic budget
Abhishek Divekar
[Code]
Service
- Reviewer for Amazon Machine Learning Conference (AMLC), 2022 (NLP and ML Tools tracks).
- Reviewer for the 2nd International Conference on AI-ML Systems (AIMLSys 2022).
- Mentored Applied Science Interns: Mudit Agarwal (2021) and Kush Gupta (2022).
- Conducted ~90 interviews for Applied Scientist and Software Development Engineer roles.
- Instructor at Amazon ML Summer School 2022 (Press coverage).
- Amazon ML Summer School 2022 was an initiative that enrolled ~3,000 Indian undergraduate students, taught them key ML technologies through lectures by Amazon scientists, and prepared them for industry careers in science. I taught a module on Supervised Learning (Decision Trees, Bagging and Boosting).