2019년 9월 28일 토요일

Logstash의 데이터 처리

로그스태시는 멀티 프로세싱을 지원한다. CPU 코어를 풀로 돌릴 수 있다는 얘기. 코어 개수를 정할 수도 있음. 이때 둘 이상의 코어를 사용하면 사소한(?) 문제가 발생한다.


다음은 테스트 로그.
[1] 2011-01-10 00:00:56 W3SVC1 192.168.48.11 GET /view.asp cate_id=2&vod_id=614 80 - 192.168.90.226 - 200 0 0
[2] 2011-01-11 10:00:56 W3SVC1 192.168.48.11 GET / - 80 - 192.168.90.226 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+InfoPath.2;+MS-RTC+EA+2;+.NET+CLR+2.0.50727;+.NET+CLR+3.0.4506.2152;+.NET+CLR+3.5.30729) 200 0 0
[3] 2011-01-11 23:21:42 W3SVC1 192.168.48.11 GET /path/view.asp cate_id='2&vod_id='783|74|800a000d|.:_'sCATE_ID' 80 - 192.168.181.6 - 500 0 0
[4] 2011-01-12 23:59:10 W3SVC1 192.168.48.11 POST /view.asp |107|80040e07|nvarchar__'[Just.For.Fun.By.kook1991]'. 80 - 192.168.98.18 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+SV1;+.NET+CLR+2.0.50727)+Havij 500 0 0

다음 설정으로 테스트.
input {
 file {
  path => "d:/iis_sample.log"
  start_position => "beginning"
  sincedb_path => "nul"
 }
}

output {
 stdout {}
}

결과는 이렇다.
PS D:\Elastic\logstash-6.8.2\bin> .\logstash -w 1 -f D:\test.conf
Sending Logstash logs to D:/Elastic/logstash-6.8.2/logs which is now configured via log4j2.properties
[2019-09-28T14:45:41,921][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2019-09-28T14:45:41,955][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.8.2"}
[2019-09-28T14:45:50,843][INFO ][logstash.pipeline        ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>1, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2019-09-28T14:45:51,403][INFO ][logstash.pipeline        ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#"}
[2019-09-28T14:45:51,471][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2019-09-28T14:45:51,479][INFO ][filewatch.observingtail  ] START, creating Discoverer, Watch with file and sincedb collections
[2019-09-28T14:45:51,892][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9601}
D:/Elastic/logstash-6.8.2/vendor/bundle/jruby/2.5.0/gems/awesome_print-1.7.0/lib/awesome_print/formatters/base_formatter.rb:31: warning: constant ::Fixnum is deprecated
{
    "@timestamp" => 2019-09-28T05:45:52.073Z,
       "message" => "[1] 2011-01-10 00:00:56 W3SVC1 192.168.48.11 GET /view.asp cate_id=2&vod_id=614 80 - 192.168.90.226 - 200 0 0\r",
      "@version" => "1",
          "host" => "MHKANG",
          "path" => "d:/iis_sample.log"
}
{
    "@timestamp" => 2019-09-28T05:45:52.112Z,
       "message" => "[2] 2011-01-11 10:00:56 W3SVC1 192.168.48.11 GET / - 80 - 192.168.90.226 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+InfoPath.2;+MS-RTC+EA+2;+.NET+CLR+2.0.50727;+.NET+CLR+3.0.4506.2152;+.NET+CLR+3.5.30729) 200 0 0\r",
      "@version" => "1",
          "host" => "MHKANG",
          "path" => "d:/iis_sample.log"
}
{
    "@timestamp" => 2019-09-28T05:45:52.114Z,
       "message" => "[3] 2011-01-11 23:21:42 W3SVC1 192.168.48.11 GET /path/view.asp cate_id='2&vod_id='783|74|800a000d|.:_'sCATE_ID' 80 - 192.168.181.6 - 500 0 0\r",
      "@version" => "1",
          "host" => "MHKANG",
          "path" => "d:/iis_sample.log"
}
{
    "@timestamp" => 2019-09-28T05:45:52.114Z,
       "message" => "[4] 2011-01-12 23:59:10 W3SVC1 192.168.48.11 POST /view.asp |107|80040e07|nvarchar__'[Just.For.Fun.By.kook1991]'. 80 - 192.168.98.18 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+SV1;+.NET+CLR+2.0.50727)+Havij 500 0 0\r",
      "@version" => "1",
          "host" => "MHKANG",
          "path" => "d:/iis_sample.log"
}

'w' 옵션을 이용해서 코어를 하나만 쓰게 하면 데이터를 순서대로 처리한다. 기본 설정으로 실행하면?
PS D:\Elastic\logstash-6.8.2\bin> .\logstash -f D:\test.conf
Sending Logstash logs to D:/Elastic/logstash-6.8.2/logs which is now configured via log4j2.properties
[2019-09-28T14:22:39,993][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2019-09-28T14:22:40,031][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.8.2"}
[2019-09-28T14:22:49,270][INFO ][logstash.pipeline        ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2019-09-28T14:22:49,825][INFO ][logstash.pipeline        ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#"}
[2019-09-28T14:22:49,893][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2019-09-28T14:22:49,901][INFO ][filewatch.observingtail  ] START, creating Discoverer, Watch with file and sincedb collections
[2019-09-28T14:22:50,366][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9601}
D:/Elastic/logstash-6.8.2/vendor/bundle/jruby/2.5.0/gems/awesome_print-1.7.0/lib/awesome_print/formatters/base_formatter.rb:31: warning: constant ::Fixnum is deprecated
{
    "@timestamp" => 2019-09-28T05:22:50.571Z,
          "path" => "d:/iis_sample.log",
          "host" => "MHKANG",
      "@version" => "1",
       "message" => "[1] 2011-01-10 00:00:56 W3SVC1 192.168.48.11 GET /view.asp cate_id=2&vod_id=614 80 - 192.168.90.226 - 200 0 0\r"
}
{
    "@timestamp" => 2019-09-28T05:22:50.610Z,
          "path" => "d:/iis_sample.log",
          "host" => "MHKANG",
      "@version" => "1",
       "message" => "[4] 2011-01-12 23:59:10 W3SVC1 192.168.48.11 POST /view.asp |107|80040e07|nvarchar__'[Just.For.Fun.By.kook1991]'. 80 - 192.168.98.18 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+SV1;+.NET+CLR+2.0.50727)+Havij 500 0 0\r"
}
{
    "@timestamp" => 2019-09-28T05:22:50.608Z,
          "path" => "d:/iis_sample.log",
          "host" => "MHKANG",
      "@version" => "1",
       "message" => "[2] 2011-01-11 10:00:56 W3SVC1 192.168.48.11 GET / - 80 - 192.168.90.226 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+InfoPath.2;+MS-RTC+EA+2;+.NET+CLR+2.0.50727;+.NET+CLR+3.0.4506.2152;+.NET+CLR+3.5.30729) 200 0 0\r"
}
{
    "@timestamp" => 2019-09-28T05:22:50.609Z,
          "path" => "d:/iis_sample.log",
          "host" => "MHKANG",
      "@version" => "1",
       "message" => "[3] 2011-01-11 23:21:42 W3SVC1 192.168.48.11 GET /path/view.asp cate_id='2&vod_id='783|74|800a000d|.:_'sCATE_ID' 80 - 192.168.181.6 - 500 0 0\r"
}

여러 개의 코어가 데이터를 나눠서 처리하는 과정에서 순서가 섞임. date 플러그인을 적용해보면 어떨까?
filter {
 dissect {
  mapping => {
   "message" => "%{} %{timestamp} %{+timestamp} %{}"
  }
 }

 date {
  match => [ "timestamp", "yyyy-MM-dd HH:mm:ss" ]
 }
}

결과는 마찬가지.
[2019-09-28T14:58:04,222][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9601}
D:/Elastic/logstash-6.8.2/vendor/bundle/jruby/2.5.0/gems/awesome_print-1.7.0/lib/awesome_print/formatters/base_formatter.rb:31: warning: constant ::Fixnum is deprecated
{
       "message" => "[1] 2011-01-10 00:00:56 W3SVC1 192.168.48.11 GET /view.asp cate_id=2&vod_id=614 80 - 192.168.90.226 - 200 0 0\r",
      "@version" => "1",
          "path" => "d:/iis_sample.log",
    "@timestamp" => 2011-01-09T15:00:56.000Z,
          "host" => "MHKANG",
     "timestamp" => "2011-01-10 00:00:56"
}
{
       "message" => "[4] 2011-01-12 23:59:10 W3SVC1 192.168.48.11 POST /view.asp |107|80040e07|nvarchar__'[Just.For.Fun.By.kook1991]'. 80 - 192.168.98.18 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+SV1;+.NET+CLR+2.0.50727)+Havij 500 0 0\r",
      "@version" => "1",
          "path" => "d:/iis_sample.log",
    "@timestamp" => 2011-01-12T14:59:10.000Z,
          "host" => "MHKANG",
     "timestamp" => "2011-01-12 23:59:10"
}
{
       "message" => "[3] 2011-01-11 23:21:42 W3SVC1 192.168.48.11 GET /path/view.asp cate_id='2&vod_id='783|74|800a000d|.:_'sCATE_ID' 80 - 192.168.181.6 - 500 0 0\r",
      "@version" => "1",
          "path" => "d:/iis_sample.log",
    "@timestamp" => 2011-01-11T14:21:42.000Z,
          "host" => "MHKANG",
     "timestamp" => "2011-01-11 23:21:42"
}
{
       "message" => "[2] 2011-01-11 10:00:56 W3SVC1 192.168.48.11 GET / - 80 - 192.168.90.226 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+InfoPath.2;+MS-RTC+EA+2;+.NET+CLR+2.0.50727;+.NET+CLR+3.0.4506.2152;+.NET+CLR+3.5.30729) 200 0 0\r",
      "@version" => "1",
          "path" => "d:/iis_sample.log",
    "@timestamp" => 2011-01-11T01:00:56.000Z,
          "host" => "MHKANG",
     "timestamp" => "2011-01-11 10:00:56"
}

뭐 어차피 엘라스틱서치에 저장하면 date 플러그인 파싱값으로 정렬되기 때문에 상관 없음. 하지만 엘라스틱서치에 저장하지 않는다면 문제가 될 수 있다.

관련 글

댓글 없음:

댓글 쓰기

크리에이티브 커먼즈 라이선스