
Computer (83)

elasticsearch thread pool
http://www.elastic.co/guide/en/elasticsearch/reference/1.x/modules-threadpool.html
A node holds several thread pools to manage thread memory consumption. Many of these pools have queues, so pending requests can wait instead of being discarded.
index: fixed, size = # of available processors, queue_size of 200
search: fixed, size = 3x # of available processors, queue_size of 1000
suggest: fixed, size = # of available processors, queue_size of 1000
get: fixed # of av..
2015. 5. 7.
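The defaults listed in the thread pool entry above depend only on the processor count, so they can be worked out by hand. The following is a minimal Python sketch of that arithmetic, not Elasticsearch code: the function name `default_pool_sizes` and the dictionary layout are made up for illustration, and the `get` pool is left out because the excerpt is cut off before its values.

```python
def default_pool_sizes(available_processors: int) -> dict:
    """Return the default pool settings quoted in the excerpt (hypothetical helper)."""
    n = available_processors
    return {
        # index: fixed pool, one thread per processor, queue of 200
        "index": {"type": "fixed", "size": n, "queue_size": 200},
        # search: fixed pool, three threads per processor, queue of 1000
        "search": {"type": "fixed", "size": 3 * n, "queue_size": 1000},
        # suggest: fixed pool, one thread per processor, queue of 1000
        "suggest": {"type": "fixed", "size": n, "queue_size": 1000},
    }


# Example: a node with 8 available processors (an assumed value).
sizes = default_pool_sizes(8)
for name, settings in sizes.items():
    print(name, settings)
```

On an 8-processor node this gives 8 index threads, 24 search threads, and 8 suggest threads; requests beyond that wait in the queue up to `queue_size`.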

Counting lines in an HDFS file
The method I already knew is hadoop fs -cat | wc -l, but counting this way takes a very long time, so I looked for an alternative.
1. Count the lines with MR:
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar -Dmapred.reduce.tasks=100 -input -output -mapper /bin/cat -reducer "wc -l"
This runs the count as an MR job. Each reducer writes its own partial count, so the output files are then fetched to the local machine and summed with awk.
2. Copy the computed files from HDFS to local: hadoop fs -getmerge
3. Print the final result with awk: awk '{s += $1}END { print s }'
Reference: ht..
2015. 5. 6.
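The final awk step in the HDFS entry above simply adds up the first field of every line in the merged file. As a rough Python equivalent, here is a small sketch; the function name `sum_first_column` and the sample counts are invented for illustration, and unlike awk it raises an error on a non-numeric first field instead of treating it as 0.

```python
def sum_first_column(lines):
    """Sum the first whitespace-separated field of each line, like awk '{s += $1}'."""
    total = 0
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines, which awk would count as 0
            total += int(fields[0])
    return total


# Made-up partial counts, one per reducer output file after getmerge.
sample_lines = ["120", "75", "", "305"]
total = sum_first_column(sample_lines)
print(total)  # prints 500
```

To use it on a real merged file, pass an open file object instead of the sample list, since iterating over a file yields its lines.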

doc values
http://www.elastic.co/guide/en/elasticsearch/guide/current/doc-values.html
Doc values are a way to write what would otherwise be in-memory fielddata to disk at index time. Doc values are 10~25% slower than in-memory fielddata, but they have two advantages.
1) They live on disk instead of in heap memory. This allows a smaller heap, speeds up GC, and improves consistency and node stability.
2) Doc values are built at index time.
The trade-off is a larger index size and slightly slower fielddata access. Doc values are quite efficient, so many querie..
2015. 5. 6.
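For context on the doc values entry above, this is a minimal sketch of how a field mapping enabling doc values looked in the Elasticsearch 1.x era, as far as I recall from the linked guide. The field name `tag` is hypothetical, and the exact mapping syntax is an assumption to verify against the version in use. The Python code only builds and prints the JSON body; it does not talk to a cluster.

```python
import json

# Hypothetical mapping body for a not_analyzed string field named "tag".
# "doc_values": True asks for the field's values to be written to disk
# at index time instead of being built as in-memory fielddata.
doc_values_mapping = {
    "properties": {
        "tag": {
            "type": "string",
            "index": "not_analyzed",
            "doc_values": True,
        }
    }
}

doc_values_body = json.dumps(doc_values_mapping, indent=2)
print(doc_values_body)
```

Because doc values are built at index time, a setting like this only affects documents indexed after the mapping is in place.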

eager, eager global ordinals
http://www.elastic.co/guide/en/elasticsearch/guide/current/preload-fielddata.html
By default, ES loads fielddata lazily. The first time a query needs fielddata for a particular field, ES loads that entire field into memory for every segment of the index. For small segments this takes very little time, but for a 5GB segment that needs 10GB of fielddata loaded into memory, the process can take tens of seconds. There are three ways to guard against this latency.
1) Eagerly load fielddata
2) Eagerly load global ordinals
3) Prepopulate ..
2015. 5. 6.
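The first two options in the entry above were configured per field through the fielddata `loading` setting in Elasticsearch 1.x, as far as I recall from the linked guide. The sketch below only builds the two mapping bodies in Python. The field name `tags` is hypothetical, and the setting names should be checked against the version in use, since later releases changed them.

```python
import json


def fielddata_loading_mapping(field_name: str, loading: str) -> dict:
    """Build a 1.x-style string field mapping with a fielddata loading mode (sketch)."""
    return {
        field_name: {
            "type": "string",
            "fielddata": {"loading": loading},
        }
    }


# 1) Eagerly load fielddata when a new segment becomes searchable.
eager_mapping = fielddata_loading_mapping("tags", "eager")

# 2) Eagerly load fielddata and also build global ordinals up front.
eager_ordinals_mapping = fielddata_loading_mapping("tags", "eager_global_ordinals")

print(json.dumps(eager_mapping, indent=2))
print(json.dumps(eager_ordinals_mapping, indent=2))
```

Either way the loading cost is not removed; it moves from the first query to the moment a segment is refreshed.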

