Projects

Tokenization in Turkish and Finnish

Scraped Turkish and Finnish text to study tokenization in agglutinative languages. Evaluated the results using Word2Vec and Named Entity Recognition datasets.


Generating Language Families using Word Similarity

2024 summer project for the Tufts Engineering with AI Camp: compared translations of the Universal Declaration of Human Rights (UDHR) using Levenshtein distance to "classify" languages into families.
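
As a rough illustration of the idea, here is a minimal sketch of a Levenshtein-based comparison. It is not the project's actual code: the function names, the length normalization, and the nearest-neighbor grouping step are all assumptions made for the example, and the input is assumed to be one text sample per language.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Scale the distance to [0, 1] so texts of different lengths are comparable."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def closest_language(samples: dict[str, str], target: str) -> str:
    """Hypothetical helper: name of the language whose sample is nearest to target's."""
    return min(
        (name for name in samples if name != target),
        key=lambda name: normalized_distance(samples[target], samples[name]),
    )
```

Languages whose samples sit close together under this measure would be candidates for the same family. The scare quotes around "classify" are deserved: surface string similarity also picks up shared orthography and loanwords, not only common ancestry.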