
Tokenization in Turkish and Finnish
Scraped Turkish and Finnish text to study tokenization in agglutinative languages. Evaluated the tokenization using Word2Vec and Named Entity Recognition datasets.

Generating Language Families using Word Similarity
2024 summer project for the Tufts Engineering with AI Camp: compared translations of the Universal Declaration of Human Rights (UDHR) using Levenshtein distance to "classify" languages into families.
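
A minimal sketch of the kind of comparison described above, not the project's actual code. It assumes each language is represented by a single string and that "classifying" means pairing each language with its closest neighbor by length-normalized Levenshtein distance. The function names and the four sample words (the word for "night", used here in place of UDHR text) are illustrative assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Scale the distance to [0, 1] so texts of different lengths compare fairly."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def nearest_language(samples: dict[str, str]) -> dict[str, str]:
    """Map each language to the other language whose sample is closest."""
    nearest = {}
    for name, text in samples.items():
        others = {
            other: normalized_distance(text, other_text)
            for other, other_text in samples.items()
            if other != name
        }
        nearest[name] = min(others, key=others.get)
    return nearest


# Illustrative stand-in data (assumption): one word per language, not UDHR text.
samples = {
    "English": "night",
    "German": "nacht",
    "Spanish": "noche",
    "Italian": "notte",
}

for language, neighbor in nearest_language(samples).items():
    print(f"{language} is closest to {neighbor}")
```

With full UDHR translations, the same functions could be applied to whole articles instead of single words. This dynamic-programming version takes time proportional to the product of the two text lengths, so long texts may need to be compared article by article. Ties between equally distant languages are broken by dictionary order here, which is one reason the result is a rough grouping rather than a true classification.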