Error:
You get an error that “remote host identification has changed” when you try to ssh to localhost.

Possible Problem:
You have moved your single-node cluster from one machine in the Berry Patch to another. The name localhost now points to a different machine with a different host key, so your ssh client suspects a man-in-the-middle attack.

Possible Solution:
You can tell ssh to skip host key checking for localhost by setting NoHostAuthenticationForLocalhost to yes in ~/.ssh/config. The following command appends the setting:

echo "NoHostAuthenticationForLocalhost yes" >> ~/.ssh/config
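Running the echo command more than once leaves duplicate lines in the config file. A minimal sketch of an idempotent variant follows; the function name add_localhost_option is hypothetical, and the optional path argument is there only so you can try it on a scratch file first.

```shell
# Hypothetical helper: append the option only if the file does not set it yet.
# $1 is the config file to edit; defaults to ~/.ssh/config.
add_localhost_option() {
    config_file="${1:-$HOME/.ssh/config}"
    mkdir -p "$(dirname "$config_file")"
    touch "$config_file"
    if ! grep -qi '^[[:space:]]*NoHostAuthenticationForLocalhost' "$config_file"; then
        echo 'NoHostAuthenticationForLocalhost yes' >> "$config_file"
    fi
}
```

Call add_localhost_option with no arguments to edit ~/.ssh/config. If you would rather keep host key checking on, ssh-keygen -R localhost removes the stale localhost entry from ~/.ssh/known_hosts instead.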

Error:
You get a NoRouteToHostException in your logs or in stderr output from a command.

Possible Problem:
One of your nodes cannot be reached over the network. This may be a firewall issue.

Possible Solution:
The only workaround is to pick a new node to replace the unreachable one.
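
Before replacing a node, it can help to confirm that it really is unreachable from the machine that logged the exception. A rough sketch, assuming bash and the coreutils timeout command are installed; check_port is a hypothetical helper, and the host name and port in the usage comment are placeholders for your own node and daemon port.

```shell
# Hypothetical helper: succeed if a TCP connection to host $1, port $2
# can be opened within 3 seconds. Relies on bash's /dev/tcp feature.
check_port() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage (placeholder host and port; substitute your own):
#   check_port datanode1.example.com 50010 || echo "cannot reach node"
```

If the check fails from one node but succeeds from another, a firewall rule or routing problem between the two is a more likely culprit than the target node itself.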

Error:
You get an error that your cluster is in “safe mode”.

Possible Problem:
Your cluster enters safe mode when it hasn’t been able to verify that all the data nodes necessary to replicate your data are up and responding.

Possible Solution:
First, wait a minute or two and then retry your command. If you just started your cluster, it’s possible that it isn’t fully initialized yet.
If waiting a few minutes didn’t help and you still get a “safe mode” error, check your logs to see if any of your data nodes didn’t start correctly (either they have Java exceptions in their logs or they have messages stating that they are unable to contact some other node in your cluster). If this is the case you need to resolve the configuration issue (or possibly pick some new nodes) before you can continue.
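
Instead of retrying blindly, you can ask the NameNode whether it is still in safe mode. A small sketch, assuming the hadoop command is on your PATH and that hadoop dfsadmin -safemode get prints a line containing "Safe mode is ON" or "Safe mode is OFF"; safe_mode_is_on is a hypothetical helper name.

```shell
# Hypothetical helper: read the output of `hadoop dfsadmin -safemode get`
# on stdin and succeed if it reports that safe mode is on.
safe_mode_is_on() {
    grep -q 'Safe mode is ON'
}

# Usage on a cluster node:
#   hadoop dfsadmin -safemode get | safe_mode_is_on && echo "still in safe mode"
#   hadoop dfsadmin -safemode wait   # blocks until the NameNode leaves safe mode
```

The wait form is handy in startup scripts, since it returns only once the cluster is ready to accept writes.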

Error:
Error: unable to create new native thread
Error initializing attempt_201111090003_0013_r_000000_0:
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:614)
at java.lang.UNIXProcess$1.run(UNIXProcess.java:157)
at java.security.AccessController.doPrivileged(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:119)
at java.lang.ProcessImpl.start(ProcessImpl.java:81)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:468)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
at org.apache.hadoop.util.Shell.run(Shell.java:134)
at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:329)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:750)
at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1664)
at org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:97)
at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1629)

Solution:
Errors like this when running Hadoop jobs can have a number of causes, thanks to the feeble implementation of Hadoop. One possible cause is that your MapReduce program starts more processes than your OS allows by default; for example, the default limit may be 1024 (you can check it by running ‘ulimit -u’). A typical case that uses many processes is controlling the output file name based on the key-value pair in the reduce stage. To solve the problem, raise the maximum number of processes you can use by editing /etc/security/limits.conf. Adding the following two lines to limits.conf sets the maximum number of processes for user hadoop to 100000:

hadoop soft nproc 100000
hadoop hard nproc 100000
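
Changes to limits.conf typically apply only to login sessions started after the edit, so it is worth verifying the new limit before rerunning the job. A minimal sketch; limit_is_enough is a hypothetical helper that compares a limit value (as printed by ulimit -u) against the number you need.

```shell
# Hypothetical helper: succeed if limit $1 (a number or "unlimited")
# is at least the required value $2.
limit_is_enough() {
    [ "$1" = "unlimited" ] || [ "$1" -ge "$2" ]
}

# Usage, run as the hadoop user in a fresh login session:
#   limit_is_enough "$(ulimit -u)" 100000 || echo "nproc limit still too low"
```

If the check still fails after logging in again, the Hadoop daemons were probably started from an older session and need a restart from the new one.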

Error:
WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas

Possible causes:
* Fewer DataNodes available than the replication factor of the blocks
* DataNodes do not have enough xciever threads
  * Default is 256 threads to manage connections
  * Note: yes, the configuration option is misspelled!

Possible resolutions:
* Increase dfs.datanode.max.xcievers to 4096
* Check replication factor
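
Hadoop settings such as dfs.datanode.max.xcievers are set as property elements in the site configuration files. The sketch below prints the element for you to paste; hadoop_property is a hypothetical helper, and placing the result in hdfs-site.xml assumes a standard configuration layout.

```shell
# Hypothetical helper: print a Hadoop <property> element for name $1, value $2.
hadoop_property() {
    cat <<EOF
<property>
  <name>$1</name>
  <value>$2</value>
</property>
EOF
}

# Paste the output inside <configuration> in hdfs-site.xml on each DataNode.
hadoop_property dfs.datanode.max.xcievers 4096
```

The DataNodes need a restart to pick up the new value. To check the first cause, hadoop dfsadmin -report lists the live DataNodes, which you can compare against your dfs.replication setting.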

Error:
INFO org.apache.hadoop.mapred.JobInProgress: Too many fetch-failures for output of task:

Cause:
* Reducers are failing to fetch intermediate data from a TaskTracker where a Map process ran
* Too many of these failures will cause a TaskTracker to be blacklisted

Possible resolutions:
* Increase tasktracker.http.threads
* Decrease mapred.reduce.parallel.copies
* Upgrade to CDH3u2
  * The version of Jetty (the Web server) in earlier versions of the TaskTracker was prone to fetch failures
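
To judge whether a change helped, you can count how often the message appears in the JobTracker log before and after. A small sketch; count_fetch_failures is a hypothetical helper, and the log path in the usage comment is a placeholder for wherever your JobTracker writes its log.

```shell
# Hypothetical helper: print the number of lines in log file $1
# that contain the "Too many fetch-failures" message.
count_fetch_failures() {
    grep -c 'Too many fetch-failures' "$1"
}

# Usage (placeholder path; substitute your own JobTracker log):
#   count_fetch_failures /path/to/hadoop-jobtracker.log
```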

Error:
ERROR org.apache.hadoop.mapred.JobTracker: Job initialization failed:
java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.TaskInProgress.<init>(TaskInProgress.java:122)
at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:653)
at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3965)
at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

Cause:
JobTracker has exceeded allocated memory

Possible resolutions:
* Increase JobTracker’s memory allocation
* Reduce mapred.jobtracker.completeuserjobs.maximum
  * This controls the amount of job history held in JobTracker’s RAM
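
One way to increase the JobTracker's memory allocation is through its JVM options in hadoop-env.sh. A sketch under the assumption that your distribution's startup scripts honor HADOOP_JOBTRACKER_OPTS, as the Hadoop 1.x scripts do; the 2g figure is an arbitrary example, not a recommendation.

```shell
# Sketch for conf/hadoop-env.sh: give the JobTracker JVM a larger heap.
# Assumption: the startup scripts pass HADOOP_JOBTRACKER_OPTS to the JVM.
# The 2g value is only an example; size it to your machine.
export HADOOP_JOBTRACKER_OPTS="-Xmx2g ${HADOOP_JOBTRACKER_OPTS:-}"
```

Restart the JobTracker afterwards so the new heap size takes effect.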
