Friday, 25 December 2015
Friday, 31 July 2015
Run a User Recommendation on Mahout
For a beginner like me, the following steps were useful for running Mahout's user recommendation algorithm.
1- I downloaded the dataset from http://grouplens.org/datasets/movielens/. The directory contains many files; the one I thought would be useful was ratings.data (568.3 MB), containing the fields userId, movieId, rating and timestamp. Mahout's recommenders expect interactions between users and items as input. Every line of the input file must have the format userID,itemID,value. Here userID and itemID refer to a particular user and a particular item, and value denotes the strength of the interaction (e.g. the rating given to a movie).
(Perform the following steps after logging in as hduser, the user for the Hadoop cluster.)
2- I removed the last field (the timestamp) from the file, as it was not required for this recommendation, and saved the result as a .csv file. The following command removes the 4th column:
cut --complement -f 4 -d, ratings.data > ratings.csv
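To see what the cut command does before running it on the full file, here is a small sketch on a three-row sample. The sample rows and the file names sample.data and sample.csv are made up for illustration, and it assumes GNU cut, since --complement is a GNU extension.

```shell
# Hypothetical three-row sample in the userId,movieId,rating,timestamp layout.
workdir=$(mktemp -d)
printf '%s\n' '1,31,2.5,1260759144' '1,1029,3.0,1260759179' '2,10,4.0,835355493' > "$workdir/sample.data"

# Drop the 4th comma-separated field (the timestamp); requires GNU cut.
cut --complement -f 4 -d, "$workdir/sample.data" > "$workdir/sample.csv"

# Each remaining line is now userID,itemID,value.
cat "$workdir/sample.csv"
```

If your copy of the file starts with a header row, that row is not in the userID,itemID,value format either; piping the file through tail -n +2 first would skip it.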
3- Next I created a directory in the Hadoop file system to store the ratings file, using the following command:
hadoop fs -mkdir /mahout_data/
4- Then I copied the ratings file to HDFS using the following command:
hadoop fs -put /home/hduser/mydata/ml-latest/ratings.csv /mahout_data/
5- Go to the Mahout directory with cd /usr/local/mahout/bin/ and issue the following command to run the job. (The output path must be unique, i.e. it must not already exist, and JAVA_HOME must be set correctly.)
./mahout recommenditembased -s SIMILARITY_LOGLIKELIHOOD -i hdfs://localhost:9000/mahout_data/ratings.csv -o hdfs://localhost:9000/ratings_test/ --numRecommendations 25
-i hdfs://localhost:9000/mahout_data/ratings.csv - denotes the input file.
-o hdfs://localhost:9000/ratings_test/ - denotes the output directory.
recommenditembased - means we are creating item-based recommendations, not user-based recommendations. There is a difference between the two: user-based recommendation finds similar users and sees what they like, while item-based recommendation sees what the user likes and finds similar items. Mahout's item-based recommendation algorithm takes customer preferences by item as input and generates an output recommending similar items, with a score indicating whether a customer will "like" the recommended item.
Choosing a similarity measure for use in a production environment requires careful testing, evaluation and research. For this example I used the Mahout similarity class name SIMILARITY_LOGLIKELIHOOD.
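Since a Hadoop job refuses to write into an output directory that already exists, one way to keep the output path unique across runs is to append a timestamp. The sketch below only prints the command so it can be checked first; the helper name unique_output_dir and the ratings_test_ prefix are illustrative choices, not part of Mahout.

```shell
# Build an output path that is unlikely to clash with a previous run.
# The helper name and the timestamp suffix are assumptions for illustration.
unique_output_dir() {
    printf 'hdfs://localhost:9000/ratings_test_%s/\n' "$(date +%Y%m%d_%H%M%S)"
}

out=$(unique_output_dir)

# Dry run: print the command instead of executing it.
echo ./mahout recommenditembased -s SIMILARITY_LOGLIKELIHOOD \
    -i hdfs://localhost:9000/mahout_data/ratings.csv \
    -o "$out" --numRecommendations 25
```

Removing the leading echo would run the job against the generated path.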
6- The job will run for a couple of minutes. You can see the output from the web interface as well.
7- You can check the output file; it will contain two columns: the userID and an array of itemIDs and scores.
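To work with the recommendations further, it can help to flatten each output line into one row per recommended item. The sketch below assumes each line looks like userID, a tab, then [itemID:score,itemID:score,...]; the sample line and the function name flatten_recs are made up for illustration, so check the layout against your own output first.

```shell
# Turn "userID<TAB>[item:score,item:score]" into one "user item score" row per item.
flatten_recs() {
    awk -F'\t' '{
        recs_str = $2
        gsub(/\[/, "", recs_str)   # drop the opening bracket
        gsub(/\]/, "", recs_str)   # drop the closing bracket
        n = split(recs_str, recs, ",")
        for (i = 1; i <= n; i++) {
            split(recs[i], pair, ":")
            print $1, pair[1], pair[2]
        }
    }'
}

# Hypothetical output line for user 3 with three recommended items.
flat=$(printf '3\t[318:4.9,296:4.7,593:4.5]\n' | flatten_recs)
echo "$flat"
```

On a real run the input would come from the job output instead, for example by piping hadoop fs -cat on the part files under the output directory into flatten_recs.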
References :
http://mahout.apache.org/users/recommender/intro-itembased-hadoop.html
http://grouplens.org/datasets/movielens/
http://info.mapr.com/rs/mapr/images/PracticalMachineLearning.pdf
Mahout Installation
I wanted to try data mining on big data, so I installed Mahout. Here are the steps I followed for a successful installation.
The prerequisites for installing Mahout are a JDK, Maven and a Hadoop cluster.
- sudo apt-get install maven
- Download the latest distribution of Mahout from http://www.apache.org/dyn/closer.cgi/lucene/mahout/
- Unzip it and copy it to the desired location
- Issue ls to check the packages inside it
- cd /usr/local/mahout/distribution
- sudo mvn install (install Maven 3.0.1 or above for the Mahout .20 distribution, otherwise it will throw an error)
- Your installation is complete if you see the following screen
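Because the build depends on the Maven version, it may be worth checking it before running mvn install. This is a minimal sketch: the helper name version_ge is made up, it relies on sort -V from GNU coreutils, and it assumes the first line of mvn -version has the form "Apache Maven x.y.z ...".

```shell
# Succeeds when version $1 is greater than or equal to version $2.
# Uses sort -V (version sort), a GNU coreutils extension.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v mvn >/dev/null 2>&1; then
    # Third word of the first line is assumed to be the version number.
    installed=$(mvn -version 2>/dev/null | awk 'NR == 1 { print $3 }')
    if version_ge "$installed" "3.0.1"; then
        echo "Maven $installed is new enough"
    else
        echo "Maven $installed is older than 3.0.1"
    fi
else
    echo "mvn not found on PATH"
fi
```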
Tuesday, 7 July 2015
Resolve the installation problem of "rmongodb"
Being a new user of both R and MongoDB, I wanted to connect to a MongoDB database from R, but had to struggle before I could successfully establish the connection.
The version of R I was using was 3.0.1. Because it was an old version, I got an error whenever I ran install.packages("rmongodb"). Ultimately I had to upgrade R using the following steps, after which I was able to install the rmongodb package.
- sudo gedit /etc/apt/sources.list
- add the following line, as I am using Ubuntu 14.04.2 (use the command lsb_release -c to see the codename)
- deb http://cran.cnr.berkeley.edu/bin/linux/ubuntu trusty/
- gpg --keyserver keyserver.ubuntu.com --recv-key E084DAB9
- sudo apt-get update
- sudo apt-get upgrade
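The repository line depends on the Ubuntu codename, so on a different release it can be generated rather than typed by hand. A small sketch, assuming the same mirror as above; the function name cran_repo_line is made up for illustration.

```shell
# Build the sources.list entry for a given Ubuntu codename.
cran_repo_line() {
    printf 'deb http://cran.cnr.berkeley.edu/bin/linux/ubuntu %s/\n' "$1"
}

# lsb_release -cs prints only the codename (for example "trusty").
if command -v lsb_release >/dev/null 2>&1; then
    cran_repo_line "$(lsb_release -cs)"
else
    cran_repo_line trusty   # fallback used purely for illustration
fi
```

The printed line is what would be added to /etc/apt/sources.list.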
Now use the following commands in R to install the rmongodb package and connect to MongoDB
- install.packages("rmongodb")
- library(rmongodb)
To connect to the local MongoDB:
- mongo <- mongo.create()
Will keep on posting as I proceed towards using R, Shiny and MongoDB.
Thursday, 21 May 2015
Resolve the connection problem in mongoDB
I was trying to set up an admin password in MongoDB using the admin database, but something went wrong and I started getting the error "couldn't connect to server 127.0.0.1:27017 at src/mongo/shell/mongo.js:145 exception: connect failed".
After a few attempts I fixed it using the following commands:
Step 1: Remove lock file.
sudo rm /var/lib/mongodb/mongod.lock
Step 2: Repair mongodb.
sudo mongod --repair
Step 3: start mongodb.
sudo start mongodb
or
sudo service mongodb start
Step 4: Check status of mongodb.
sudo status mongodb
or
sudo service mongodb status
Step 5: Start mongo console.
mongo
Reference : http://stackoverflow.com/questions/12831939/couldnt-connect-to-server-127-0-0-127017/17793856#17793856
Tuesday, 10 March 2015
Install Microsoft Office on Ubuntu
I was trying to install Microsoft Office 2007 on Ubuntu 14.10. Last time, on Ubuntu 12, I installed it simply by using the Wine Windows program loader, but this time I got the error "newer windows needed". So I had no option but to try something else.
I installed it using PlayOnLinux:
sudo apt-get install playonlinux
sudo apt-get install winbind
Start PlayOnLinux.
You will see the following screen
- Click on Install
- Choose the Microsoft Office version you want to install
- Choose the location / correct path
- Enter the licence key and the installation will start
Thursday, 15 January 2015