Hello, I'm

KPATOUKPA

Kpodjro

Data Scientist / Machine Learning Engineer

About me

Machine Learning Engineer with two internship experiences developing and optimizing large language models (LLMs). Skilled in data mining, question-answering systems, and language model fine-tuning; improved prediction scores by 23% during a recent internship. Able to turn complex analyses into practical solutions.
Ready to apply NLP and machine learning skills to business needs. I am currently pursuing a master's degree in Data Science at Paris Cité University in Paris, specializing in machine learning algorithm development, data analysis, and decision support systems.

End-of-studies internship

ATTIJARIWAFA BANK

March 2024 - August 2024

  • Data extraction from PDF and text files.
  • Embedding of text in a ChromaDB database for semantic retrieval.
  • Development of an enhanced RAG system for handling customer queries.
  • Fine-tuning of the Mistral model and comparison of its results with those of the RAG system.
  • Implementation of two user interfaces: one for chatbot use, one for real-time supervision based on 4 KPIs.

Technologies:

Mistral 7B, LangChain, PyTorch, Streamlit, VS Code, Hugging Face, MongoDB (NoSQL), Jira

Assistant Data Scientist Internship

MLView Consulting

August 2023 - September 2023

  • Exploratory analysis of marketing data in collaboration with the marketing team.
  • Training of a deep learning model to predict customer interest in specific campaigns or products.
  • Fine-tuning of the Llama 2 large language model (LLM) for personalized teaser generation.

Technologies:

LangChain, Hugging Face, Llama 2, TensorFlow, VS Code, Colab, Git & GitHub, SQL

Education

Since September 2024:

MSc in Machine Learning for Data Science, Paris Cité University, Paris, France

2021 - 2024:

Engineering degree in Software and Intelligent Systems, Abdelmalek Essaadi University, Tangier, Morocco

2019 - 2021:

Associate's degree in Mathematics, Computer Science and Physics, Hassan 1st University, Settat, Morocco

Discover my

Research Papers

LLM with Graph Augmentation for Recommendations

Research project (in a team of 4 students):

10/2024 - 05/2025

Abstract: We enhance movie recommendations by using LLMs (Gemini-1.5, Mistral) to enrich user and item profiles. The enriched profiles significantly improve the accuracy of LightGCN, MLP, and Matrix Factorization models by addressing data sparsity and enabling more nuanced personalization. The work focuses on responsible integration and acknowledges challenges such as bias and cost.

Main tasks:

  • Get familiar with the dataset and understand the problem to be solved;
  • Study and build a baseline with models suited to recommender systems (here: LightGCN, MLP, and Matrix Factorization);
  • Do prompt engineering and enrich the dataset by generating meaningful attributes;
  • Perform MLOps on the new dataset (objective: predict the rating);
  • Study the impact and make recommendations regarding the approach.

Improving Part-of-Speech Tagging in English with TreeTagger

Research project (in a team of 4 students):

01/2025 - 02/2025

Abstract: This paper explores the use of TreeTagger to accurately identify the different functions of the word "that" in English, such as conjunction, relative pronoun, determiner, or adverb. We first evaluate pre-trained models from the BNC and Penn corpora on a test dataset, then re-train TreeTagger with specific labels derived from the Brown corpus to enhance accuracy. Comparisons with other tools such as Stanza and UDPipe are also presented. The main findings show that re-training with the Brown corpus significantly improves the tool's performance and its ability to distinguish among the various uses of "that".

Main Tasks:

  • Data collection and preparation, including annotating the Brown corpus with specific labels.
  • Initial evaluation of BNC and Penn models for categorizing "that".
  • Re-training TreeTagger with a tailored label set.
  • Comparison of performance with other tools (Stanza and UDPipe).
  • Analyzing the impact of training data size on tagging precision.
  • Proposing methods to better detect and categorize "that" in various linguistic contexts.

Explore My

Skills

Data Science Skills

Languages

Python, Java, R, C/C++, PL/SQL

IDE

Anaconda, Eclipse, VS Code, RStudio

Data Processing

Pandas, NumPy, statsmodels, scikit-learn, PySpark

Modeling

scikit-learn, TensorFlow, Keras, PyTorch

Machine Learning

Supervised, Unsupervised, Reinforcement, Ensemble Learning

Deep Learning

CNN, RNN, LSTM, ANN, GNN, TensorFlow, Keras

Computer Vision

OpenCV, Tesseract, KerasCV, Pillow

NLP

BERT, KerasNLP, Llama 2, Mistral

R & R-Shiny

Statistical Modeling, Dashboard Development

Databases

MySQL, PostgreSQL, MongoDB, Oracle, Hive

Web and Other Skills

Frontend Development

HTML5 & CSS3, Streamlit, Flask, FastAPI, Shiny, Angular (Beginner)

Scraping

Scrapy, BeautifulSoup, Selenium, pytrends

Versioning and Collaboration

Git & GitHub

Containerization and Deployment

Docker

Workflow Automation & Cloud

Airflow, cron (Linux), AWS, GCP, Vertex AI

Project Management

GanttProject, Jira

LaTeX

Scientific Document Preparation

Power BI & Tableau

Data Visualization, Dashboard Development

Browse my

Projects

Implementation of a supermarket supply anticipation system based on sales

scikit-learn, TensorFlow, Keras, Matplotlib, SQL

Calculation of the probability of credit repayment

scikit-learn, seaborn, XGBoost, LightGBM

Real-time customer unsubscription prediction

Kafka Streams, PySpark, scikit-learn, Flask, Angular, Docker, SQL

Implementation of an Image Captioning platform

scikit-learn, OpenCV, Flask

Advanced RAG system with Llama 2

LangChain, Streamlit, FAISS, Llama 2

Image search by content based on color and shape

My

Licenses and Certifications

Certification 1

Advanced Machine Learning

Date of issue: 09/2022

Issuer: Huawei

Certification 2

TensorFlow for Computer Vision

Date of issue: 10/2023

Issuer: OpenCV University

Certification 3

GenAI for Geospatial Data

Date of issue: 10/2023

Issuer: NASA Space Challenge

Certification 4

Deep Learning on Time Series

Date of issue: 09/2023

Issuer: Kaggle

Certification 5

Introduction to Deep Learning

Date of issue: 10/2023

Issuer: Kaggle

Get in touch

Contact Me