| BML-S |

Last entries

  1. yarn|mapreduce2 > Task attempt retry number in Hadoop Two | Jian's Blog - [s]

    public static final String MAP_MAX_ATTEMPTS = "mapreduce.map.maxattempts";
    public static final String REDUCE_MAX_ATTEMPTS = "mapreduce.reduce.maxattempts";
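    These constants name mapred-site.xml properties; a minimal sketch of setting the retry counts cluster-wide (the values are illustrative, not from the post):

    ```xml
    <!-- mapred-site.xml (illustrative values) -->
    <property>
      <name>mapreduce.map.maxattempts</name>
      <value>4</value>
    </property>
    <property>
      <name>mapreduce.reduce.maxattempts</name>
      <value>4</value>
    </property>
    ```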

  2. 20170621.1752
  3. How to enable user impersonation for HIVE interpreter in Zeppelin - [s]

  4. 20170621.1120
  5. How to disable SPNEGO authentication for Solr - Hortonworks - [s]

  6. 20170621.1109
  7. Falcon fails to renew kerberos ticket - Hortonworks - [s]

  8. 20170620.1008
  9. Ambari Metrics (ams) > blacklist whitelist metric service / component - [s]

  10. 20170615.1224
  11. Falcon fails to renew kerberos ticket - Hortonworks - [s]

  12. 20170614.0729
  13. start stop HDP Service - [s]

  14. 20170613.1921
  15. (ranger) kms | Transparent Encryption in HDFS - [s]

  16. 20170613.1915
  17. Hadoop Dev | Migrate from Hadoop KMS to Ranger KMS - Hadoop Dev - [s]

  18. 20170613.1308
  19. Recreate Ambari's CA, used to sign SSL certificates for 2-way SSL - Hortonworks - [s]

  20. 20170612.1643
  21. NiFi Debugging Tutorial - Hortonworks - [s]

  22. 20170609.0853
  23. Import hive metadata into Atlas - Hortonworks - [s]

  24. 20170607.1420
  25. YARN > CPU/vcore allocation - [s]

    yarn.scheduler.maximum-allocation-vcores controls the maximum number of vcores that any submitted job can request. yarn.nodemanager.resource.cpu-vcores, on the other hand, controls how many vcores can be scheduled on a particular NodeManager instance.

    So yarn.nodemanager.resource.cpu-vcores can vary from host to host (NodeManager to NodeManager), while yarn.scheduler.maximum-allocation-vcores is a global property of the scheduler.
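    As a yarn-site.xml sketch (values illustrative, not from the note): the first property is read by each NodeManager and may legitimately differ per host, the second is scheduler-wide:

    ```xml
    <!-- yarn-site.xml (illustrative values) -->
    <property>
      <!-- per-host: vcores this NodeManager offers for scheduling -->
      <name>yarn.nodemanager.resource.cpu-vcores</name>
      <value>8</value>
    </property>
    <property>
      <!-- global: largest vcore count a single container request may ask for -->
      <name>yarn.scheduler.maximum-allocation-vcores</name>
      <value>4</value>
    </property>
    ```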

  26. 20170605.1054
  27. HDFS > how group owner is decided in HDFS - [s]

    When a file or directory is created, its owner is the user identity of the client process, and its group is the group of the parent directory (the BSD rule).

  28. 20170602.0814
  29. Agenda – Munich 2017 | DataWorks (hadoop) Summit - [s]

  30. 20170602.0800
  31. Ranger > Configuring Ranger Usersync with AD/LDAP for a common usecase of AD/LDAP user and group - [s]

    From HDP 2.6 onwards, Ranger Usersync supports “Incremental Sync”, and it is enabled by default. For clusters upgraded to 2.6 from an older version, “Incremental Sync” is disabled. When “Incremental Sync” is enabled, “Enable Group Sync” is set to “true” by default and the properties under “Group Configs” are mandatory.

  32. 20170602.0754
  33. hbase > Time-Delayed HBase Performance Degradation with Java 7 - [s]

    Code compilation is a tool designed to help long-lived Java applications run fast without negatively affecting the start-up time of short-lived applications. After methods are invoked, they are compiled from Java byte code into machine code and cached by the JVM. Subsequent invocations of a cached method can invoke the machine code directly instead of having to interpret Java byte code.

    export HBASE_SERVER_OPTS="$HBASE_SERVER_OPTS -XX:ReservedCodeCacheSize=256m"

  34. 20170601.0751
  35. Kafka 0.9 Configuration Best Practices - Hortonworks - [s]

    Example: in kafka-env.sh add the following settings.

    export KAFKA_HEAP_OPTS="-Xmx4g -Xms4g"
    export KAFKA_JVM_PERFORMANCE_OPTS="-XX:MetaspaceSize=96m -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:G1HeapRegionSize=16M -XX:MinMetaspaceFreeRatio=50 -XX:MaxMetaspaceFreeRatio=80"

  36. 20170531.0814
  37. [OOZIE-1978] Forkjoin validation code is ridiculously slow in some cases (performance) - [s]

    2017-05-20 10:08:04,375 ERROR CoordStatusTransitXCommand:517 - SERVER[xxxx.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] XException,
    org.apache.oozie.command.CommandException: E0606: Could not get lock [coord_status_transit_66e75090-b12c-48ca-b244-ce4677f505b0], timed out [0]ms
        at org.apache.oozie.command.XCommand.acquireLock(XCommand.java:220)
        at org.apache.oozie.command.XCommand.call(XCommand.java:264)
        at org.apache.oozie.service.StatusTransitService$StatusTransitRunnable.coordTransit(StatusTransitService.java:192)
        at org.apache.oozie.service.StatusTransitService$StatusTransitRunnable.run(StatusTransitService.java:97)
        at org.apache.oozie.service.SchedulerService$2.run(SchedulerService.java:175)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

    WORKAROUND: set oozie.validate.ForkJoin to false in the oozie-site.xml file
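    The workaround as an oozie-site.xml fragment (property name exactly as given in the note):

    ```xml
    <!-- oozie-site.xml: skip the slow fork/join validation -->
    <property>
      <name>oozie.validate.ForkJoin</name>
      <value>false</value>
    </property>
    ```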

  38. 20170530.0757
  39. Zookeeper Health Checks - Hortonworks - [s]

  40. 20170529.1015
  41. yarn/hive/mr2 | Behavior of the parameter "mapred.min.split.size" (number of mappers) - [s]

    # Note: the current property names are mapreduce.input.fileinputformat.split.minsize / split.maxsize
    split size = max(mapred.min.split.size, min(mapred.max.split.size, dfs.block.size))
    In this case: split size = max(128, min(Long.MAX_VALUE (default), 64)) = 128
    So:
    - each map will process 2 HDFS blocks (assuming each block is 64 MB): true
    - the input file (already in HDFS) will be re-divided into 128 MB HDFS blocks: false
    Making the minimum split size greater than the block size increases the split size, but at the cost of data locality.
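    The max/min formula above can be checked with a small shell sketch (the helper function names are mine, not Hadoop's; sizes in MB):

    ```shell
    # split size = max(minSize, min(maxSize, blockSize))
    min() { [ "$1" -le "$2" ] && echo "$1" || echo "$2"; }
    max() { [ "$1" -ge "$2" ] && echo "$1" || echo "$2"; }
    split_size() { max "$1" "$(min "$2" "$3")"; }  # args: minSize maxSize blockSize

    # the case from the note: minSize=128, maxSize=Long.MAX_VALUE, blockSize=64
    split_size 128 9223372036854775807 64  # prints 128
    ```
    
    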

  42. 20170529.0843
  43. HDP 2.6+ - Configuring Zeppelin for Active Directory user authentication & authorization - Hortonworks - [s]

  44. 20170529.0838
  45. Ranger > How to Configure Ranger User Sync (usersync, group) - [s]

  46. 20170529.0835
  47. [HIVE-11402] HS2 - add an option to disallow parallel query execution within a single Session - [s]

    If Beeline, HCat, or JDBC clients hang, it may be worth trying hive.server2.parallel.ops.in.session=true in hive-site.
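    As a hive-site.xml fragment (a sketch of the setting named above):

    ```xml
    <property>
      <name>hive.server2.parallel.ops.in.session</name>
      <value>true</value>
    </property>
    ```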

  1. RHEL7/centos7 AD Join - SSSD troubleshoot/debug/log - [s]

    # this didn't work
    ldap_idmap_range_min = 1000
    ldap_idmap_range_max = 2100000000
    ldap_idmap_range_size = 100000000

    ldap_child.log shows:
    (Fri Jun 23 07:12:15 2017) [[sssd[ldap_child[21507]]]] [sss_child_krb5_trace_cb] (0x4000): [21507] 1498201935.613113: Received error from KDC: -1765328378/Client not found in Kerberos database
    (Fri Jun 23 07:12:15 2017) [[sssd[ldap_child[21507]]]] [ldap_child_get_tgt_sync] (0x0010): Failed to init credentials: Client 'host/testuser-node16.localdomain@SUPPORT.COM' not found in Kerberos database

    [root@testuser-node16 ~]# hostname -f
    node16.localdomain
    [root@testuser-node16 ~]# hostname
    testuser-node16.localdomain
    [root@testuser-node16 ~]# hostnamectl set-hostname node16.localdomain
    [root@testuser-node16 ~]# hostnamectl
       Static hostname: node16.localdomain
             Icon name: computer-vm
               Chassis: vm
            Machine ID: b80edaccf20d596cb16958716152347e
               Boot ID: 8fc120ee35ca42938709df4ee533a0cb
        Virtualization: kvm
      Operating System: CentOS Linux 7 (Core)
           CPE OS Name: cpe:/o:centos:centos:7
                Kernel: Linux 3.10.0-123.9.3.el7.x86_64
          Architecture: x86-64
    [root@testuser-node16 ~]# hostname
    node16.localdomain
    [root@testuser-node16 ~]# service sssd restart
    Redirecting to /bin/systemctl restart sssd.service
    [root@testuser-node16 ~]# id test
    uid=1594001366(test) gid=1594000513(domain_users) groups=1594000513(domain_users),1594001232(security)

  2. 20170621.1151
  3. curl use cacert or client certificate to connect to https server - [s]

  4. 20170619.1731
  5. java|ssl > Embedded Jetty with client certificates - [s]

    Also: https://github.com/apache/falcon/blob/master/prism/src/main/java/org/apache/falcon/util/SecureEmbeddedServer.java

    Properties properties = StartupProperties.get();
    SslSocketConnector connector = new SslSocketConnector();
    connector.setPort(port);
    connector.setHost("");
    connector.setKeystore(properties.getProperty("keystore.file",
        System.getProperty("keystore.file", "conf/prism.keystore")));
    connector.setKeyPassword(properties.getProperty("keystore.password",
        System.getProperty("keystore.password", "falcon-prism-passwd")));
    connector.setTruststore(properties.getProperty("truststore.file",
        System.getProperty("truststore.file", "conf/prism.keystore")));
    connector.setTrustPassword(properties.getProperty("truststore.password",
        System.getProperty("truststore.password", "falcon-prism-passwd")));
    connector.setPassword(properties.getProperty("password",
        System.getProperty("password", "falcon-prism-passwd")));
    connector.setWantClientAuth(true);

  6. 20170619.1714
  7. java|keytool > how to change the keystore password by recreating it (note: srcstorepass needs to be provided too) - [s]

  8. 20170605.1908
  9. spark|zeppelin|livy > How to get Spark Interpreter working with custom python configuration - [s]

    Ambari => Spark => Custom livy-conf
      livy.spark.master=yarn-cluster
      # didn't work

    Ambari => Zeppelin Notebook => Advanced zeppelin-env
      # Pyspark (supported with Spark 1.2.1 and above)
      # To configure pyspark, set the spark distribution's path in the 'spark.home' property in the Interpreter setting screen in the Zeppelin GUI
      # Path to the python command; must be the same path on the driver (Zeppelin) and all workers.
      # export PYSPARK_PYTHON
      export PYSPARK_DRIVER_PYTHON="/opt/Python-2.7.6/python"
      export PYSPARK_PYTHON="/opt/Python-2.7.6/python"
      # didn't work

      livy.spark.executorEnv.PYSPARK_PYTHON=/opt/Python-2.7.6/python
      livy.spark.yarn.appMasterEnv.PYSPARK_PYTHON=/opt/Python-2.7.6/python
      # didn't work

      %livy.pyspark
      import os, sys
      os.environ["PYSPARK_PYTHON"] = "/opt/Python-2.7.6/python"
      os.environ["PYSPARK_DRIVER_PYTHON"] = "/opt/Python-2.7.6/python"
      print(sys.version)
      # Seems to work? At least the YARN app log sets the env variable: export PYSPARK_PYTHON="/opt/Python-2.7.6/python"

      %livy.pyspark
      #dir()
      conf.set('spark.yarn.appMasterEnv.PYSPARK_PYTHON', '/opt/Python-2.7.6/python')
      conf.set('spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON', '/opt/Python-2.7.6/python')

    Also, changing the spark-env template works (of course).

    Ref: https://issues.apache.org/jira/browse/ZEPPELIN-2195 LIVY-159 SPARK-13081 SPARK-16110
    http://spark.apache.org/docs/latest/configuration.html#available-properties

    Note: When running Spark on YARN in cluster mode, environment variables need to be set using the spark.yarn.appMasterEnv.[EnvironmentVariableName] property in your conf/spark-defaults.conf file. Environment variables that are set in spark-env.sh will not be reflected in the YARN Application Master process in cluster mode. See the YARN-related Spark properties for more information.

  10. 20170605.1706
  11. How To Set Up Python 2.7.6 and 3.3.3 on CentOS 6.4 | DigitalOcean - [s]

  12. 20170605.1011
  13. linux|rhel|centos > Unable to bind to a port even when lsof/netstat shows no usage - [s]

  14. 20170602.1520
  15. linux|centos|rhel > how to find which packages depend on a package / what a package depends on - [s]

    rpm -q --whatrequires sqlite

    repoquery --requires --resolve hive_2_6_1_0_129-1.2.1000.
    hive_2_6_1_0_129-0:1.2.1000.
    bash-0:4.1.2-48.el6.x86_64
    zookeeper_2_6_1_0_129-0:
    ranger_2_6_1_0_129-hive-plugin-0:
    hdp-select-0:
    coreutils-0:8.4-46.el6.x86_64
    tez_2_6_1_0_129-0:
    hadoop_2_6_1_0_129-client-0:
    hive_2_6_1_0_129-jdbc-0:1.2.1000.
    atlas-metadata_2_6_1_0_129-hive-plugin-0:

  16. 20170602.1051
  17. apache2 > web proxy for websocket mod_proxy_wstunnel - [s]

  18. 20170602.0752
  19. [JDK-8051955] Code cache flushing causes stop in compilation and high CPU - Java Bug System - [s]

  20. 20170531.1713
  21. java|jdb > JDI(Java Debug Interface) onthrow and launch - [s]

  22. 20170531.0803
  23. mysql > DETERMINISTIC, NO SQL, or READS SQL DATA in its declaration and binary logging is enabled - [s]

    2017-05-30 10:14:07,068 [JISQL] /usr/java/latest/bin/java -cp /usr/hdp/* org.apache.util.sql.Jisql -driver mysqlconj -cstring jdbc:mysql://lvshdc18en0009.lvs.paypalinc.com:3115/ranger -u 'rangerdba' -p '********' -noheader -trim -c \; -input /usr/hdp/current/ranger-admin/db/mysql/patches/007-updateBlankPolicyName.sql
    Error executing: CREATE FUNCTION `getTempPolicyCount`(assetId bigint, resId bigint) RETURNS int(11)
    BEGIN
      DECLARE tempPolicyCount int default 1;
      DECLARE dbResourceId bigint;
      DECLARE exitLoop int DEFAULT FALSE;
      DECLARE policyList CURSOR FOR SELECT id from x_resource where asset_id = assetId;
      DECLARE CONTINUE HANDLER FOR NOT FOUND SET exitLoop = true;
      OPEN policyList;
      readPolicy : LOOP
        FETCH policyList into dbResourceId;
        IF exitLoop THEN
          set tempPolicyCount = tempPolicyCount + 1;
          LEAVE readPolicy;
        END IF;
        IF (resId = dbResourceId) THEN
          LEAVE readPolicy;
        END IF;
        set tempPolicyCount = tempPolicyCount + 1;
      END LOOP;
      CLOSE policyList;
      RETURN tempPolicyCount;
    END
    java.sql.SQLException: This function has none of DETERMINISTIC, NO SQL, or READS SQL DATA in its declaration and binary logging is enabled (you *might* want to use the less safe log_bin_trust_function_creators variable)

    Fix:
    SET GLOBAL log_bin_trust_function_creators = 1;

  24. 20170530.0808
  25. Create a Grok Pattern - [s]

  26. 20170528.0838
  27. Java > jdb attach to a running process - [s]

    java -Xdebug -Xrunjdwp:transport=dt_socket,address=8000,server=y,suspend=n,onthrow=java.io.IOException someClass
    jdb -attach 8000

    Ref: http://docs.oracle.com/javase/7/docs/technotes/guides/jpda/conninv.html#Invocation
