Sunday, August 18, 2019

Logstash drop filter

I was asked why the percentage option of the drop filter does not seem to work properly. The following is the configuration I used to test it.
input {
 file {
  path => "d:/test.log"
  start_position => "beginning"
  sincedb_path => "nul"
 }
}

filter {
 #if "test" in [message] { drop {} }
 #if [message] =~ "test" { drop {} }
 #if [message] !~ "string" { drop {} }

 #if "test" in [message] { drop { percentage => 50 } }
 #if [message] =~ "test" { drop { percentage => 50 } }
 #if [message] !~ "string" { drop { percentage => 50 } }
}

output {
 stdout {}
}

The following filter expressions drop every event that contains the string 'test'. Because the 'percentage' option defaults to 100, 'drop {}' and 'drop { percentage => 100 }' end up behaving identically.
filter {
 #if "test" in [message] { drop {} }
 #if [message] =~ "test" { drop {} }
 #if [message] !~ "string" { drop {} }
}

The following filter expressions are meant to drop only 50% of the events that contain the string 'test'.
filter {
 #if "test" in [message] { drop { percentage => 50 } }
 #if [message] =~ "test" { drop { percentage => 50 } }
 #if [message] !~ "string" { drop { percentage => 50 } }
}
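As a rough model of what 'percentage => 50' could mean, here is a minimal Python sketch that assumes each event is tested independently against a random number, rather than counted against a quota. That per-event behavior is an assumption, not something this test confirms, and the function name `should_drop` is hypothetical.

```python
import random

def should_drop(percentage: int, rng: random.Random) -> bool:
    """Hypothetical model: drop one event with probability percentage/100."""
    if percentage >= 100:
        # Matches the documented default: 'drop {}' drops everything.
        return True
    return rng.random() < percentage / 100.0

rng = random.Random(0)  # fixed seed so the run is repeatable
events = ["test"] * 10000
kept = [e for e in events if not should_drop(50, rng)]

# Under this model the kept count is close to half, but not exactly half.
print(len(kept))
```

Under this assumption the ratio only becomes visible over many events; for a handful of events the number that survives can differ from run to run.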

The following is the test.log file used for the test.
string
test
test
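
For these three lines, the three conditionals used in the filter block select the same events. The following Python sketch approximates each conditional (substring check, unanchored regex match, negated regex match); it is an illustration of the matching logic only, not Logstash code, and the helper names are hypothetical.

```python
import re

# Messages as they appear in the stdout output (the file uses CRLF line endings).
lines = ["string\r", "test\r", "test\r"]

def cond_in(msg: str) -> bool:
    # Approximates: if "test" in [message]
    return "test" in msg

def cond_match(msg: str) -> bool:
    # Approximates: if [message] =~ "test"
    return re.search("test", msg) is not None

def cond_not_match(msg: str) -> bool:
    # Approximates: if [message] !~ "string"
    return re.search("string", msg) is None

for msg in lines:
    print(repr(msg), cond_in(msg), cond_match(msg), cond_not_match(msg))
```

All three conditions are false for the 'string' line and true for both 'test' lines, so whichever one is uncommented, the drop filter is applied to the same two events.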

I tested with two versions, 6.8.2 and 7.3, and only the 'drop {}' form behaves as expected. When percentage is set to anything other than 100, it sometimes works and sometimes does not; the results are all over the place.

The following is the test result on version 6.8.2 with 'percentage => 50'.
[2019-08-18T22:02:15,322][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9601}
{
      "@version" => "1",
       "message" => "string\r",
    "@timestamp" => 2019-08-18T13:02:15.462Z,
          "path" => "d:/test.log",
          "host" => "MHKANG"
}
[2019-08-18T22:02:57,322][INFO ][logstash.pipelineaction.reload] Reloading pipeline {"pipeline.id"=>:main}
[2019-08-18T22:02:57,354][INFO ][filewatch.observingtail  ] QUIT - closing all files and shutting down.
[2019-08-18T22:02:58,332][INFO ][logstash.pipeline        ] Pipeline has terminated {:pipeline_id=>"main", :thread=>"#<Thread:0x484ef1b7 run>"}
[2019-08-18T22:02:58,595][INFO ][logstash.pipeline        ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2019-08-18T22:02:58,632][INFO ][filewatch.observingtail  ] START, creating Discoverer, Watch with file and sincedb collections
[2019-08-18T22:02:58,642][INFO ][logstash.pipeline        ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x715fcb50 sleep>"}
[2019-08-18T22:02:58,666][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
{
      "@version" => "1",
       "message" => "test\r",
    "@timestamp" => 2019-08-18T13:02:58.694Z,
          "path" => "d:/test.log",
          "host" => "MHKANG"
}
{
      "@version" => "1",
       "message" => "string\r",
    "@timestamp" => 2019-08-18T13:02:58.694Z,
          "path" => "d:/test.log",
          "host" => "MHKANG"
}
[2019-08-18T22:03:09,064][INFO ][logstash.pipelineaction.reload] Reloading pipeline {"pipeline.id"=>:main}
[2019-08-18T22:03:09,064][INFO ][filewatch.observingtail  ] QUIT - closing all files and shutting down.
[2019-08-18T22:03:09,924][INFO ][logstash.pipeline        ] Pipeline has terminated {:pipeline_id=>"main", :thread=>"#<Thread:0x715fcb50 run>"}
[2019-08-18T22:03:10,105][INFO ][logstash.pipeline        ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2019-08-18T22:03:10,122][INFO ][logstash.pipeline        ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x6318c661 sleep>"}
[2019-08-18T22:03:10,123][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2019-08-18T22:03:10,123][INFO ][filewatch.observingtail  ] START, creating Discoverer, Watch with file and sincedb collections
{
      "@version" => "1",
       "message" => "string\r",
    "@timestamp" => 2019-08-18T13:03:10.142Z,
          "path" => "d:/test.log",
          "host" => "MHKANG"
}

The following is the test result on version 7.3.
[2019-08-18T22:02:49,254][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9602}
{
          "host" => "MHKANG",
       "message" => "string\r",
          "path" => "d:/test.log",
      "@version" => "1",
    "@timestamp" => 2019-08-18T13:02:49.502Z
}
[2019-08-18T22:02:57,986][INFO ][logstash.pipelineaction.reload] Reloading pipeline {"pipeline.id"=>:main}
[2019-08-18T22:02:58,032][INFO ][filewatch.observingtail  ] QUIT - closing all files and shutting down.
[2019-08-18T22:02:58,972][INFO ][logstash.javapipeline    ] Pipeline terminated {"pipeline.id"=>"main"}
[2019-08-18T22:02:59,155][INFO ][logstash.javapipeline    ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>1000, :thread=>"#<Thread:0x6de2228c run>"}
[2019-08-18T22:02:59,185][INFO ][logstash.javapipeline    ] Pipeline started {"pipeline.id"=>"main"}
[2019-08-18T22:02:59,207][INFO ][filewatch.observingtail  ] START, creating Discoverer, Watch with file and sincedb collections
[2019-08-18T22:02:59,236][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
{
          "host" => "MHKANG",
       "message" => "test\r",
          "path" => "d:/test.log",
      "@version" => "1",
    "@timestamp" => 2019-08-18T13:02:59.312Z
}
{
          "host" => "MHKANG",
       "message" => "test\r",
          "path" => "d:/test.log",
      "@version" => "1",
    "@timestamp" => 2019-08-18T13:02:59.312Z
}
{
          "host" => "MHKANG",
       "message" => "string\r",
          "path" => "d:/test.log",
      "@version" => "1",
    "@timestamp" => 2019-08-18T13:02:59.306Z
}
[2019-08-18T22:03:06,654][INFO ][logstash.pipelineaction.reload] Reloading pipeline {"pipeline.id"=>:main}
[2019-08-18T22:03:06,663][INFO ][filewatch.observingtail  ] QUIT - closing all files and shutting down.
[2019-08-18T22:03:07,511][INFO ][logstash.javapipeline    ] Pipeline terminated {"pipeline.id"=>"main"}
[2019-08-18T22:03:07,626][INFO ][logstash.javapipeline    ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>1000, :thread=>"#<Thread:0x438c9958 run>"}
[2019-08-18T22:03:07,635][INFO ][logstash.javapipeline    ] Pipeline started {"pipeline.id"=>"main"}
[2019-08-18T22:03:07,642][INFO ][filewatch.observingtail  ] START, creating Discoverer, Watch with file and sincedb collections
[2019-08-18T22:03:07,656][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
{
          "host" => "MHKANG",
       "message" => "string\r",
          "path" => "d:/test.log",
      "@version" => "1",
    "@timestamp" => 2019-08-18T13:03:07.665Z
}

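Only two events should be printed, but the result differs on every test run(..) To do this properly, the filter would have to count only the events that match the condition among the data collected over some interval and then drop exactly the configured proportion of them, and that apparently is not easy to do. Is it a bug? (If anyone is using this option successfully, please let me know.)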
