ModernBERT-TR: A Modern Encoder Foundation Model for Turkish

150M

Parameters

144.4B

Tokens

Overview

ModernBERT-TR is a 150M-parameter encoder pretrained from scratch on 144.4 billion Turkish tokens. The model uses the ModernBERT architecture: rotary position embeddings, gated linear units, alternating local-global attention, and sequence-packed training.

Under frozen-encoder linear probing, ModernBERT-TR achieves a 60.2% average across 11 Turkish NLP tasks, surpassing the next-best model by 13.1% relative. Under full fine-tuning on the 28-task TabiBench benchmark, it scores 77.28, matching TabiBERT within 0.30 points and leading in 5 of 8 categories despite training on 7x fewer tokens.

Benchmark Results

Frozen-encoder linear probing across 11 Turkish NLP tasks spanning classification, regression, and token classification.

Per-Task Breakdown

Best scores per task shown in bold. ModernBERT-TR leads on 5 of 11 tasks.

TabiBench Fine-Tuning

Full fine-tuning evaluation on the 28-task TabiBench benchmark with grid search over learning rate, weight decay, and batch size.

ModernBERT-TR (77.28) matches TabiBERT (77.58) and leads in 5 of 8 categories.

Training Data and Tokenizer

We combine FineWeb-2 Turkish (41.2B tokens) with BertTurk Corpus upsampled 5x (31.0B tokens), yielding 72.2B tokens per epoch trained for 2 full epochs. A custom 50K WordPiece tokenizer is trained on Turkish data, achieving strong compression with efficient vocabulary usage.

Turkish compression ratio across tokenizer configurations.

Methodology

ModernBERT-TR follows the ModernBERT-base architecture (22 layers, 768 hidden, 150M parameters) and was pretrained with systematic ablations to select the final configuration.

Tokenizer Ablation Six configurations across vocabulary sizes (32K to 128K) and training corpora evaluated on compression efficiency.
Hyperparameter Search Learning rate, batch size, schedule parameters, and data mixing ratios varied. Only LR and data mixture significantly affect quality.
Data Curation Mixing ratio of BertTurk Corpus (5x upsampling) identified as a primary driver of downstream performance.
Evaluation Protocol Frozen-encoder linear probing isolates representation quality from fine-tuning optimization choices.

Acknowledgments

We thank Answer.AI for the ModernBERT codebase, HuggingFace for the FineWeb-2 dataset, and Schweter for the BertTurk Corpus. This work was supported by Yildiz Technical University Scientific Research Projects Coordination Unit under project number FDK-2024-6070. This study has also been supported by the Scientific and Technological Research Council of Turkey (TUBITAK) Grant No: 124E055.

Citation

@article{alkurdi2025modernberttr, title={ModernBERT-TR: A Modern Encoder Foundation Model for Turkish}, author={Alkurdi, Besher and Kesgin, Himmet Toprak and Yuce, Muzaffer Kaan and Amasyali, Mehmet Fatih}, year={2025} }