Sunday, May 3, 2015

experiments with twitter nlp

I wanted a good tool for named entity extraction on tweets, and I came across this one: https://github.com/aritter/twitter_nlp.

Here is how I installed it:
1. download the master branch of the repository as a zip file
2. unzip master
3. cd master
4. yum install glibc-static
5. sh build.sh

To run it, go to the directory containing the python folder and run:
1. export TWITTER_NLP=./ 
2. cat test.1k.txt | python python/ner/extractEntities2.py

I made another sample file, test2, containing these two tweets:

I live in Jodhpur.
usgs reports a m0.46 #earthquake 13km nw of jodhpur, rajasthan on 5/1/15 @ 15:50:46 utc http://t.co/mqgsvgnkbo #quake

and then ran:
cat test2 | python python/ner/extractEntities2.py --classify --pos --event

Results (each token is printed as word/entity/POS/event):
I/O/PRP/O live/O/VBP/B-EVENT in/O/IN/O Jodhpur/B-person/NNP/O ./O/./O
usgs/O/NNP/O reports/O/VBZ/B-EVENT a/O/DT/O m/O/NN/O 0.46/O/HT/O #earthquake/O/HT/O 13km/O/HT/O nw/O/NN/O of/O/IN/O jodhpur/O/NN/O ,/O/,/O rajasthan/O/VBN/O on/O/IN/O 5/1/15/O/CD/O @/O/IN/O 15:50/O/CD/O :/O/:/O 46/O/CD/O utc/O/:/O http://t.co/mqgsvgnkbo/O/URL/O #quake/O/HT/O
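Reading the output by eye gets tedious, so here is a minimal Python sketch for pulling the entities out of it. It assumes every token has exactly four '/'-separated fields in the order word/entity/POS/event, as in Jodhpur/B-person/NNP/O above, and that entity tags follow the B-/I-/O scheme. The helper names parse_line and extract_entities are hypothetical and not part of twitter_nlp.

```python
from typing import List, Tuple

# (word, entity_tag, pos_tag, event_tag), in the order the tagger prints them
Token = Tuple[str, str, str, str]


def parse_line(line: str) -> List[Token]:
    """Split one line of tagger output into 4-field tokens.

    Words such as 5/1/15 or URLs contain '/', so each token is split
    from the right: the last three fields are the tags, the rest is the word.
    """
    tokens: List[Token] = []
    for item in line.split():
        word, entity, pos, event = item.rsplit("/", 3)
        tokens.append((word, entity, pos, event))
    return tokens


def extract_entities(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Group BIO entity tags (B-type, I-type, O) into (text, type) spans."""
    entities: List[Tuple[str, str]] = []
    words: List[str] = []
    etype = ""

    for word, entity, _pos, _event in tokens:
        if entity == "O":
            # Outside any entity: close the open span, if there is one
            if words:
                entities.append((" ".join(words), etype))
            words, etype = [], ""
            continue
        prefix, label = entity.split("-", 1)  # "B-person" -> ("B", "person")
        if prefix == "I" and label == etype:
            words.append(word)  # continue the current span
        else:
            # A B- tag, or an I- tag that doesn't continue the span, starts a new one
            if words:
                entities.append((" ".join(words), etype))
            words, etype = [word], label

    if words:
        entities.append((" ".join(words), etype))
    return entities


if __name__ == "__main__":
    line = "I/O/PRP/O live/O/VBP/B-EVENT in/O/IN/O Jodhpur/B-person/NNP/O ./O/./O"
    print(extract_entities(parse_line(line)))  # [('Jodhpur', 'person')]
```

Splitting from the right with rsplit("/", 3) is what keeps tokens like 5/1/15 and http://t.co/mqgsvgnkbo intact.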

So Jodhpur is tagged as a person (B-person) even though it should be a location. In the second tweet, neither jodhpur nor rajasthan is tagged as an entity at all.

I don't know what I am doing wrong here.
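One possible workaround, which is not part of twitter_nlp and does not explain why the model picks person, is to post-process the tags with a small list of known place names (a gazetteer). The sketch below assumes a hypothetical GAZETTEER dictionary and a hypothetical apply_gazetteer helper; the label string "location" is also an assumption, so substitute whatever label the tagger actually uses for places. It would relabel Jodhpur in the first tweet and also catch the lowercase jodhpur and rajasthan in the second tweet, which the tagger left as O.

```python
from typing import Dict, List, Tuple

# Hypothetical gazetteer: lowercase place name -> entity label to force.
# "location" is an assumed label; substitute the label your tagger uses.
GAZETTEER: Dict[str, str] = {
    "jodhpur": "location",
    "rajasthan": "location",
}


def apply_gazetteer(tagged: List[Tuple[str, str]],
                    gazetteer: Dict[str, str]) -> List[Tuple[str, str]]:
    """Override the entity tag of any single token found in the gazetteer.

    `tagged` holds (word, entity_tag) pairs, e.g. the first two
    '/'-separated fields of each token in the tagger output. Matching is
    case-insensitive and only covers one-word names.
    """
    result = []
    for word, tag in tagged:
        label = gazetteer.get(word.lower())
        # Force a single-token entity for gazetteer hits, keep the model's tag otherwise
        result.append((word, f"B-{label}" if label else tag))
    return result


if __name__ == "__main__":
    first = [("I", "O"), ("live", "O"), ("in", "O"), ("Jodhpur", "B-person"), (".", "O")]
    print(apply_gazetteer(first, GAZETTEER))
    # [('I', 'O'), ('live', 'O'), ('in', 'O'), ('Jodhpur', 'B-location'), ('.', 'O')]
```

A lookup like this ignores context: any token that matches a place name gets relabeled, including, say, a person who happens to share a city's name.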
