Part-of-Speech Tagging

Part-of-Speech (POS) tagging assigns a grammatical tag to each word in a sentence. The POSTagger class in Shekar uses a transformer-based model (default: ALBERT) to generate POS tags based on the Universal Dependencies (UD) standard.

Each token, including punctuation, receives a single tag such as NOUN, VERB, or ADJ. These tags feed downstream tasks like syntactic parsing, chunking, and information extraction.
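As a rough illustration of how tags can feed a downstream step, the sketch below keeps only content words from a list of (word, tag) pairs. The pairs are hard-coded in the shape the usage example on this page iterates over, so the snippet runs without loading the model. `CONTENT_TAGS` and `content_words` are illustrative names and are not part of Shekar.

```python
# Hypothetical helper, not part of Shekar: filter (word, tag) pairs by UD tag.
CONTENT_TAGS = {"NOUN", "PROPN", "VERB", "ADJ"}


def content_words(tagged, keep=CONTENT_TAGS):
    """Return the words whose tag is in `keep`, preserving input order."""
    return [word for word, tag in tagged if tag in keep]


# A few hard-coded (word, tag) pairs standing in for tagger output.
tagged = [
    ("نوروز", "PROPN"),
    ("،", "PUNCT"),
    ("جشن", "NOUN"),
    ("سال", "NOUN"),
    ("نو", "ADJ"),
    ("ایرانی", "ADJ"),
    ("،", "PUNCT"),
]

print(content_words(tagged))
```

Passing a different `keep` set (for example `{"NOUN", "PROPN"}`) narrows the filter to nominal tokens only.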

Features

  • Transformer-based tagging model (ALBERT by default)
  • Universal POS tags following the UD standard
  • Easy-to-use Python interface

Example Usage

from shekar import POSTagger

# Initialize the POS tagger
pos_tagger = POSTagger()

text = "نوروز، جشن سال نو ایرانی، بیش از سه هزار سال قدمت دارد و در کشورهای مختلف جشن گرفته می‌شود."

# Get POS tags
result = pos_tagger(text)

# Print each word with its tag
for word, tag in result:
    print(f"{word}: {tag}")

Output:

نوروز: PROPN
،: PUNCT
جشن: NOUN
سال: NOUN
نو: ADJ
ایرانی: ADJ
،: PUNCT
بیش: ADJ
از: ADP
سه: NUM
هزار: NUM
سال: NOUN
قدمت: NOUN
دارد: VERB
و: CCONJ
در: ADP
کشورهای: NOUN
مختلف: ADJ
جشن: NOUN
گرفته: VERB
می‌شود: VERB
.: PUNCT
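One simple way to summarize a tagged sentence is to count how often each tag occurs. The following is a minimal sketch using only the standard library, with the pairs copied from the output above so that it does not need the model to run. `tag_counts` is a hypothetical helper name, not a Shekar function.

```python
from collections import Counter

# (word, tag) pairs copied from the example output above.
tagged = [
    ("نوروز", "PROPN"), ("،", "PUNCT"), ("جشن", "NOUN"), ("سال", "NOUN"),
    ("نو", "ADJ"), ("ایرانی", "ADJ"), ("،", "PUNCT"), ("بیش", "ADJ"),
    ("از", "ADP"), ("سه", "NUM"), ("هزار", "NUM"), ("سال", "NOUN"),
    ("قدمت", "NOUN"), ("دارد", "VERB"), ("و", "CCONJ"), ("در", "ADP"),
    ("کشورهای", "NOUN"), ("مختلف", "ADJ"), ("جشن", "NOUN"),
    ("گرفته", "VERB"), ("می‌شود", "VERB"), (".", "PUNCT"),
]


def tag_counts(pairs):
    """Count occurrences of each POS tag in a list of (word, tag) pairs."""
    return Counter(tag for _, tag in pairs)


counts = tag_counts(tagged)

# Print tags from most to least frequent.
for tag, n in counts.most_common():
    print(f"{tag}: {n}")
```

In real use, the hard-coded list would be replaced by the result of calling the tagger on a text, assuming it yields (word, tag) pairs as in the example.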