Resume ⇔ Job Matching Engine

During my time as a Junior Python Developer at Smartbytes Ltd. (Bangladesh), I developed this project to tackle a common challenge: making the initial screening of resumes faster and more effective. My goal was to automate the tedious manual keyword searching and provide recruiters with a ranked list of the most suitable candidates.

Key Contributions & What I Built:

  • Comprehensive End-to-End Data Pipeline:
    • I engineered a system to scrape and normalize over 15,000+ public resumes and job advertisements using Selenium and BeautifulSoup.
    • This structured text data was then stored efficiently in PostgreSQL via SQLAlchemy, creating a reliable foundation for repeatable experiments and model training.
  • Sophisticated Feature Engineering for Deeper Insights:
    • Leveraging spaCy and NLTK, I implemented Named Entity Recognition (NER) to extract key information like skills, educational background, locations, and years of experience from the text.
    • I then generated TF-IDF vectors and engineered custom features such as “skill-overlap” and “seniority-gap” scores to capture more nuanced relationships between resumes and job descriptions.
  • High-Performance Multi-Class Matching Model:
    • I rigorously benchmarked several models, including Logistic Regression and Random Forest. Ultimately, a LinearSVC model proved most effective, boosting the macro-F1 score by approximately 40% compared to the baseline.
    • The model achieved an impressive top-5 match accuracy of 92% when tested on a held-out set with 20 distinct job categories.
  • Efficient FastAPI Microservice for Real-Time Matching:
    • I developed a REST API using FastAPI, featuring a /match/ endpoint that accepts a resume and job description, returning a ranked list of candidates with corresponding confidence scores.
    • To support the frontend team of six developers, I ensured Swagger documentation was automatically generated for easy integration.
  • Streamlined Containerized Deployment:
    • I containerized the application using Docker and set up a CI/CD pipeline with GitHub Actions to automatically push images to AWS ECR. The application was deployed as an AWS ECS service, configured to auto-scale based on CPU load.

What This Project Highlights About My Abilities:

  • Practical NLP Application: Successfully applying techniques like NER, TF-IDF, and text classification (LinearSVC) to solve a real-world business problem.
  • Full-Cycle Machine Learning Development: From data collection and preprocessing to feature engineering, model training, evaluation, and deployment.
  • Backend & API Development: Proficiency in building robust and scalable microservices with Python and FastAPI.
  • Data Engineering & Management: Experience with web scraping (Selenium, BeautifulSoup), data storage (PostgreSQL, SQLAlchemy), and building data pipelines.
  • DevOps & Cloud Deployment: Skills in containerization (Docker) and CI/CD (GitHub Actions) for deployment on AWS (ECR, ECS).
  • Collaboration & Impact: Delivering a tool that directly supported a development team and aimed to improve recruiter efficiency.

Key Technologies I Employed: Python, spaCy, NLTK, scikit-learn, FastAPI, SQLAlchemy, TF-IDF, LinearSVC, BeautifulSoup, Selenium, PostgreSQL, Docker, AWS ECS, GitHub Actions.