Comparison of ModernBERT-large and ModernBERT-base for Turkish Sentiment Analysis

Bayram EKER
4 min read · Dec 21, 2024


In this article, we will compare two variants of the ModernBERT model for Turkish Sentiment Analysis: ModernBERT-base and ModernBERT-large. We will explore their training processes, the impact of model size on performance, and the final results obtained from fine-tuning these models on a Turkish sentiment dataset.

Introduction

Sentiment analysis is an essential task in Natural Language Processing (NLP), where the goal is to classify text by its sentiment (positive, negative, or neutral). Modern transformer models, such as BERT and its variants, have greatly advanced performance on such tasks. ModernBERT is an encoder-only model that modernizes the original BERT architecture for better performance and efficiency. We compare two versions of ModernBERT, ModernBERT-base and ModernBERT-large, on a Turkish sentiment analysis dataset.

Both models were fine-tuned on the winvoker/turkish-sentiment-analysis-dataset, a dataset containing labeled Turkish text with sentiment labels.

Models Used

1. ModernBERT-base:

  • Parameters: ~149 million
  • Architecture: The smaller ModernBERT variant, an encoder-only Transformer that modernizes the original BERT design.
  • Purpose: Suitable for applications where computational resources are limited, offering a faster and more efficient model.

2. ModernBERT-large:

  • Parameters: ~395 million
  • Architecture: A larger variant, designed to capture more complex patterns in the data.
  • Purpose: Ideal for tasks requiring higher model capacity and more compute resources.

Dataset

We used the winvoker/turkish-sentiment-analysis-dataset, which consists of labeled Turkish text with sentiment labels (positive, negative, and neutral). The dataset is designed for sentiment classification in the Turkish language.

  • Dataset Size: ~15,000 samples
  • Sentiment Classes: Positive, Negative, Neutral
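
As a minimal sketch of how the data can be loaded with the Hugging Face datasets library (the fields mentioned in the comment are assumptions; check the dataset card for the exact schema):

from datasets import load_dataset

# Load the Turkish sentiment dataset from the Hugging Face Hub
dataset = load_dataset("winvoker/turkish-sentiment-analysis-dataset")

# Inspect the available splits and one example row
print(dataset)
print(dataset["train"][0])  # expected to contain a text field and a sentiment label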

Training Setup

Both models were fine-tuned with the following training parameters:

  • Batch Size: 32 (due to memory constraints)
  • Learning Rate: 8e-5
  • Epochs: 2
  • Optimizer: AdamW with beta1=0.9, beta2=0.98
  • Mixed Precision: For ModernBERT-large, we used bfloat16 precision for efficient training on NVIDIA A100 GPUs.
  • Evaluation Metrics: Accuracy and F1-score were used to evaluate model performance (a metric helper is sketched below).
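
For reference, the accuracy and F1 computation can be wired up as a small helper for the Hugging Face Trainer. This is a minimal sketch using scikit-learn with macro-averaged F1; the exact metric configuration in our runs may differ:

import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    # The Trainer passes a (logits, labels) tuple at evaluation time
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, predictions),
        "f1": f1_score(labels, predictions, average="macro"),
    }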

Training Code Example

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Label mappings for the three sentiment classes (index order is illustrative)
id2label = {0: "Negative", 1: "Neutral", 2: "Positive"}
label2id = {label: idx for idx, label in id2label.items()}

# Load the pre-trained model and tokenizer
checkpoint = "answerdotai/ModernBERT-large"  # or "answerdotai/ModernBERT-base" for the smaller model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=len(id2label), id2label=id2label,
    label2id=label2id, torch_dtype=torch.bfloat16,
)

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Tokenize input data
texts = ["Bu ürün harika!", "Hiç beğenmedim."]  # example Turkish sentences
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
inputs = inputs.to(device)

# Training and evaluation loop
model.train()
# Add code for training...
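
The "# Add code for training..." placeholder can be completed with the Hugging Face Trainer. The sketch below wires in the hyperparameters listed above (batch size 32, learning rate 8e-5, 2 epochs, AdamW betas, bfloat16) and the compute_metrics helper; tokenized_train and tokenized_eval are placeholders for the tokenized dataset splits, which are not shown here:

from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="modernbert-turkish-sentiment",  # placeholder output directory
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    learning_rate=8e-5,
    num_train_epochs=2,
    adam_beta1=0.9,
    adam_beta2=0.98,
    bf16=True,  # bfloat16 mixed precision (e.g., on an NVIDIA A100)
    eval_strategy="epoch",
    logging_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,  # placeholder: tokenized training split
    eval_dataset=tokenized_eval,    # placeholder: tokenized validation split
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

trainer.train()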

Results

Below are the performance metrics from fine-tuning ModernBERT-base and ModernBERT-large on the Turkish sentiment dataset:

ModernBERT-base: per-epoch training loss, accuracy, and F1 score (results table)

ModernBERT-large: per-epoch training loss, accuracy, and F1 score (results table)

Key Observations:

  • Training Loss: The ModernBERT-large model initially performs better in reducing training loss, but as the epochs progress, both models converge towards similar values.
  • Accuracy: ModernBERT-large consistently outperforms ModernBERT-base in accuracy throughout all epochs.
  • F1 Score: ModernBERT-large also shows superior performance in F1 score, indicating better precision and recall balance compared to ModernBERT-base.

Comparison and Analysis

Performance

  • ModernBERT-large shows a clear advantage in terms of training loss, accuracy, and F1 score compared to the smaller ModernBERT-base. This indicates that the larger model can capture more complex patterns and provide better performance in sentiment classification tasks.
  • ModernBERT-base still performs well, achieving good results, but its smaller capacity limits its ability to model more complex data patterns compared to ModernBERT-large.

Resource Efficiency

  • ModernBERT-base is more resource-efficient and faster to train compared to ModernBERT-large. If you have limited GPU resources, ModernBERT-base would be a better option, as it requires less GPU memory and training time.
  • ModernBERT-large, while more powerful, demands more resources, including GPU memory and computational power. It is ideal for larger-scale tasks where performance is critical and computational resources are available.

Use Case and Application

  • ModernBERT-base is a great choice for applications with limited resources or real-time systems that need fast predictions with reasonable accuracy (see the inference sketch after this list).
  • ModernBERT-large is ideal for tasks that require state-of-the-art performance and can handle the additional computational cost, such as enterprise-level applications or projects where the best possible results are needed.
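
As an illustration of the fast-prediction scenario mentioned above, a fine-tuned checkpoint can be served through a simple text-classification pipeline. This is a minimal sketch; the model path is a placeholder for wherever your fine-tuned model was saved:

from transformers import pipeline

# Load the fine-tuned model (placeholder path) into a text-classification pipeline
classifier = pipeline(
    "text-classification",
    model="modernbert-turkish-sentiment",  # placeholder: path or Hub ID of the fine-tuned model
    device=0,  # set to -1 to run on CPU
)

print(classifier("Kargo çok hızlı geldi, ürün harika!"))  # returns a label and confidence score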

Conclusion

Both ModernBERT-base and ModernBERT-large perform exceptionally well for Turkish sentiment analysis. While ModernBERT-base is more efficient in terms of memory and training time, ModernBERT-large offers better performance in terms of accuracy and F1 score.

For scenarios where high performance is a must, ModernBERT-large is the better option, especially if you have access to powerful hardware like an NVIDIA A100 GPU. However, for more resource-constrained environments, ModernBERT-base offers a great balance between performance and efficiency.

Future Work

  • Extend the models to more fine-grained sentiment or emotion classes (e.g., happy, sad, angry) beyond the current three-way classification.
  • Fine-tune the models on larger and more diverse datasets for improved generalization across various domains.

Final Thoughts

Choosing between ModernBERT-base and ModernBERT-large depends on your application’s requirements. Both models perform well, but if you have the computational power and need higher performance, ModernBERT-large is a better fit. Otherwise, ModernBERT-base offers a solid alternative with reduced resource consumption.
