Part-of-Speech Tagging
Part-of-Speech (POS) tagging assigns a grammatical tag to each word in a sentence. The POSTagger class in Shekar uses a transformer-based model (ALBERT by default) to generate POS tags following the Universal Dependencies (UD) standard. Each word is assigned a single tag, such as NOUN, VERB, or ADJ, which supports downstream tasks like syntactic parsing, chunking, and information extraction.
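The UD standard defines a closed set of 17 universal POS tags (UPOS). The snippet below is a minimal sketch in plain Python and is independent of Shekar's API. It shows how that tag set can be used to check that a list of (word, tag) pairs contains only UD tags. The names UD_UPOS_TAGS and find_non_ud_tags are hypothetical helpers, and the sample pairs are made up for illustration.

```python
# The 17 universal POS tags (UPOS) defined by Universal Dependencies v2.
UD_UPOS_TAGS = frozenset({
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
})


def find_non_ud_tags(tagged):
    """Return a sorted list of tags in `tagged` that are not UD UPOS tags.

    Hypothetical helper: `tagged` is assumed to be an iterable of
    (word, tag) pairs.
    """
    return sorted({tag for _, tag in tagged if tag not in UD_UPOS_TAGS})


# Hypothetical tagged input, not real tagger output.
sample = [("cats", "NOUN"), ("sleep", "VERB"), (".", "PUNCT"), ("foo", "NN")]
print(find_non_ud_tags(sample))  # → ['NN']
```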
Features
- Transformer-based model for high accuracy
- Universal POS tags following the UD standard
- Easy-to-use Python interface
Example Usage
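```python
from shekar import POSTagger

# Initialize the POS tagger
pos_tagger = POSTagger()

# "Nowruz, the Iranian New Year celebration, has a history of more than
# three thousand years and is celebrated in various countries."
text = "نوروز، جشن سال نو ایرانی، بیش از سه هزار سال قدمت دارد و در کشورهای مختلف جشن گرفته میشود."

# Get POS tags as (word, tag) pairs
result = pos_tagger(text)

# Print each word with its tag
for word, tag in result:
    print(f"{word}: {tag}")
```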
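The loop above unpacks each item of result into (word, tag), so the output can be post-processed with ordinary Python. The following sketch shows naive, rule-based chunking and assumes that result is an iterable of (word, tag) pairs. It groups maximal runs of NOUN, PROPN, and ADJ tokens into a single chunk. The helper naive_noun_chunks and the CHUNK_TAGS set are hypothetical. The sample pairs are invented and are not real POSTagger output.

```python
from typing import Iterable, List, Tuple

# Tags treated as part of a nominal chunk (an assumption of this sketch).
CHUNK_TAGS = {"NOUN", "PROPN", "ADJ"}


def naive_noun_chunks(tagged: Iterable[Tuple[str, str]]) -> List[str]:
    """Group maximal runs of NOUN/PROPN/ADJ tokens into space-joined chunks."""
    chunks: List[str] = []
    current: List[str] = []
    for word, tag in tagged:
        if tag in CHUNK_TAGS:
            current.append(word)  # extend the running chunk
        elif current:
            chunks.append(" ".join(current))  # close the chunk at a non-chunk tag
            current = []
    if current:
        chunks.append(" ".join(current))  # flush a chunk that ends the input
    return chunks


# Hypothetical (word, tag) pairs; not actual POSTagger output.
sample = [
    ("Nowruz", "PROPN"), (",", "PUNCT"), ("new", "ADJ"), ("year", "NOUN"),
    ("festival", "NOUN"), ("is", "AUX"), ("old", "ADJ"), (".", "PUNCT"),
]
print(naive_noun_chunks(sample))  # → ['Nowruz', 'new year festival', 'old']
```

With the tagger from the example, this would be called as naive_noun_chunks(pos_tagger(text)). The rule is deliberately simple. For example, the predicative adjective "old" in the sample also becomes its own chunk, and a more careful chunker would likely filter it out.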