ES December 21, 2023

Script queries (scripting)

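scripting

Scripting is Elasticsearch's facility for custom programming in scenarios that the built-in query DSL cannot express. ES supports several scripting languages. One of them, Painless, has a Java-like syntax with comments, keywords, types, variables, and functions. It is reported to run several times faster than the other scripting languages, is designed to be safe, and can be used for both inline and stored scripts.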

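Languages

groovy: the default scripting language from ES 1.4.x up to 5.0.

painless: the default scripting language since ES 5.0.

expression: low per-document overhead and very fast execution, but it can only access numeric, boolean, date, and geo_point fields.

mustache: provides templates for parameterized queries.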

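Characteristics

  • Highly flexible and programmable
  • Slower than the equivalent query DSL
  • Not suited to carrying complex business logic

Use cases

  • Custom analysis (tokenization)
  • Custom relevance
  • Custom scoring
  • Custom filters
  • Custom aggregations
  • Custom reindexing
  • And so on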

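Enabling regular expressions

In some earlier versions, regular expressions were disabled by default, because they bypass Painless's protection against long-running and memory-hungry scripts and can also recurse deeply on the stack.

To enable them, add this setting to elasticsearch.yml: script.painless.regex.enabled: true

Format

Script format: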

"script": {
    "lang":   "...",  
    "source" | "id": "...",  
    "params": { ... }  
  }

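lang: the scripting language. Defaults to painless.

source | id: either the source of an inline script (source) or the id of a stored script (id).

params: the named input parameters passed to the script.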
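To make the source-versus-id choice concrete, here is a minimal Python sketch that assembles the value of a "script" object as a plain dict. The helper name build_script is hypothetical and not part of any Elasticsearch client; it only assumes the three keys described above, and it assumes that lang is left out for stored scripts because the language is saved together with the script.

```python
from typing import Any, Dict, Optional


def build_script(source: Optional[str] = None,
                 script_id: Optional[str] = None,
                 params: Optional[Dict[str, Any]] = None,
                 lang: str = "painless") -> Dict[str, Any]:
    """Build the value of a "script" object (hypothetical helper)."""
    # Exactly one of source / script_id must be provided.
    if (source is None) == (script_id is None):
        raise ValueError("provide exactly one of source or script_id")

    script: Dict[str, Any] = {}
    if source is not None:
        # Inline script: the language accompanies the source text.
        script["lang"] = lang
        script["source"] = source
    else:
        # Stored script: assumed to carry its own language, so lang is omitted.
        script["id"] = script_id
    if params:
        script["params"] = params
    return script


inline = build_script(source="ctx._source.cost -= params.discount",
                      params={"discount": 2})
stored = build_script(script_id="discount_script", params={"discount": 0.8})
print(inline)
print(stored)
```

The resulting dicts can be serialized to JSON and placed wherever a request expects a "script" object.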


In ES 8, GET _script_language lists the supported languages and the contexts in which each can be used.

GET _script_language

{
  "types_allowed" : [
    "inline",
    "stored"
  ],
  "language_contexts" : [
    {
      "language" : "expression",
      "contexts" : [
        "aggregation_selector",
        "aggs",
        "bucket_aggregation",
        "field",
        "filter",
        "number_sort",
        "score",
        "terms_set"
      ]
    },
    {
      "language" : "mustache",
      "contexts" : [
        "template"
      ]
    },
    {
      "language" : "painless",
      "contexts" : [
        "aggregation_selector",
        "aggs",
        "aggs_combine",
        "aggs_init",
        "aggs_map",
        "aggs_reduce",
        "analysis",
        "bucket_aggregation",
        "field",
        "filter",
        "ingest",
        "interval",
        "moving-function",
        "number_sort",
        "painless_test",
        "processor_conditional",
        "score",
        "script_heuristic",
        "similarity",
        "similarity_weight",
        "string_sort",
        "template",
        "terms_set",
        "update",
        "watcher_condition",
        "watcher_transform",
        "xpack_template"
      ]
    }
  ]
}

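Syntax and use cases

See the official Elasticsearch scripting syntax documentation.

Index used by the official script examples: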


PUT /seats
{
  "mappings": {
    "properties": {
      "theatre":  { "type": "keyword" },
      "play":     { "type": "keyword" },
      "actors":   { "type": "keyword" },
      "date":     { "type": "keyword" },
      "time":     { "type": "keyword" },
      "cost":     { "type": "double"  },
      "row":      { "type": "integer" },
      "number":   { "type": "integer" },
      "sold":     { "type": "boolean" },
      "datetime": { "type": "date"    }
    }
  }
}


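-- This bulk request runs the seats ingest pipeline (PUT /_ingest/pipeline/seats, defined later in this post), so that pipeline must be created first.
POST seats/_bulk?pipeline=seats&refresh=true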
{"create":{"_index":"seats","_id":"1"}}
{"theatre":"Skyline","play":"Rent","actors":["James Holland","Krissy Smith","Joe Muir","Ryan Earns"],"date":"2021-4-1","time":"3:00PM","cost":37,"row":1,"number":7,"sold":false}
{"create":{"_index":"seats","_id":"2"}}
{"theatre":"Graye","play":"Rent","actors":"Dave Christmas","date":"2021-4-1","time":"3:00PM","cost":30,"row":3,"number":5,"sold":false}
{"create":{"_index":"seats","_id":"3"}}
{"theatre":"Graye","play":"Rented","actors":"Dave Christmas","date":"2021-4-1","time":"3:00PM","cost":33,"row":2,"number":6,"sold":false}
{"create":{"_index":"seats","_id":"4"}}
{"theatre":"Skyline","play":"Rented","actors":["James Holland","Krissy Smith","Joe Muir","Ryan Earns"],"date":"2021-4-1","time":"3:00PM","cost":20,"row":5,"number":2,"sold":false}
{"create":{"_index":"seats","_id":"5"}}
{"theatre":"Down Port","play":"Pick It Up","actors":["Joel Madigan","Jessica Brown","Baz Knight","Jo Hangum","Rachel Grass","Phoebe Miller"],"date":"2018-4-2","time":"8:00PM","cost":27.5,"row":3,"number":2,"sold":false}
{"create":{"_index":"seats","_id":"6"}}
{"theatre":"Down Port","play":"Harriot","actors":["Phoebe Miller","Sarah Notch","Brayden Green","Joshua Iller","Jon Hittle","Rob Kettleman","Laura Conrad","Simon Hower","Nora Blue","Mike Candlestick","Jacey Bell"],"date":"2018-8-7","time":"8:00PM","cost":30,"row":1,"number":10,"sold":false}
{"create":{"_index":"seats","_id":"7"}}
{"theatre":"Skyline","play":"Auntie Jo","actors":["Jo Hangum","Jon Hittle","Rob Kettleman","Laura Conrad","Simon Hower","Nora Blue"],"date":"2018-10-2","time":"5:40PM","cost":22.5,"row":7,"number":10,"sold":false}
{"create":{"_index":"seats","_id":"8"}}
{"theatre":"Skyline","play":"Test Run","actors":["Joe Muir","Ryan Earns","Joel Madigan","Jessica Brown"],"date":"2018-8-5","time":"7:30PM","cost":17.5,"row":11,"number":12,"sold":true}
{"create":{"_index":"seats","_id":"9"}}
{"theatre":"Skyline","play":"Sunnyside Down","actors":["Krissy Smith","Joe Muir","Ryan Earns","Nora Blue","Mike Candlestick","Jacey Bell"],"date":"2018-6-12","time":"4:00PM","cost":21.25,"row":8,"number":15,"sold":true}
{"create":{"_index":"seats","_id":"10"}}
{"theatre":"Graye","play":"Line and Single","actors":["Nora Blue","Mike Candlestick"],"date":"2018-6-5","time":"2:00PM","cost":30,"row":1,"number":2,"sold":false}
{"create":{"_index":"seats","_id":"11"}}
{"theatre":"Graye","play":"Hamilton","actors":["Lin-Manuel Miranda","Leslie Odom Jr."],"date":"2018-6-5","time":"2:00PM","cost":5000,"row":1,"number":20,"sold":true}

Keywords

if else while do for
in continue break return new
try catch throw this instanceof

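Operators

  • Arithmetic operators: + - * / %

  • Bitwise operators: | & ^ ~ << >> >>>

  • Boolean operators (including the ternary operator): && || ! ?:

  • Comparison operators: < <= == >= >

  • Common math functions: abs ceil exp floor ln log10 logn max min sqrt pow

  • Trigonometric functions: acosh acos asinh asin atanh atan atan2 cosh cos sinh sin tanh tan

  • Distance functions: haversin

  • Other functions: min, max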

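Ingest pipeline scripts

Variables

params: user-defined parameters.

ctx: the fields of the document, i.e. the JSON source parsed into Map and List structures.

ctx['_index']: assign to this to change the target index of the current document.

Example

Extract the date and time string fields, convert them to a timestamp, and assign the result to the datetime field.

-- Create the pipeline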
PUT /_ingest/pipeline/seats
{
  "description": "update datetime for seats",
  "processors": [
    {
      "script": {
        "source": """
        String[] dateSplit = ctx.date.splitOnToken("-");                     
        String year = dateSplit[0].trim();
        String month = dateSplit[1].trim();

        if (month.length() == 1) {                                           
            month = "0" + month;
        }

        String day = dateSplit[2].trim();

        if (day.length() == 1) {                                             
            day = "0" + day;
        }

        boolean pm = ctx.time.substring(ctx.time.length() - 2).equals("PM"); 
        String[] timeSplit = ctx.time.substring(0,
                ctx.time.length() - 2).splitOnToken(":");                    
        int hours = Integer.parseInt(timeSplit[0].trim());
        int minutes = Integer.parseInt(timeSplit[1].trim());

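        // Convert the 12-hour clock to 24 hours: 12:xxAM is hour 0, 12:xxPM stays 12.
        if (pm && hours != 12) {
            hours += 12;
        } else if (!pm && hours == 12) {
            hours = 0;
        }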

        String dts = year + "-" + month + "-" + day + "T" +
                (hours < 10 ? "0" + hours : "" + hours) + ":" +
                (minutes < 10 ? "0" + minutes : "" + minutes) +
                ":00+08:00";                                                 

        ZonedDateTime dt = ZonedDateTime.parse(
                 dts, DateTimeFormatter.ISO_OFFSET_DATE_TIME);               
        ctx.datetime = dt.getLong(ChronoField.INSTANT_SECONDS)*1000L;       
        """
      }
    }
  ]
}
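As a way to check the pipeline's arithmetic outside Elasticsearch, the conversion can be mirrored in a few lines of Python. This is only a sketch: the function name seat_datetime_millis is made up for illustration, the +08:00 offset is taken from the script above, and the handling of 12 o'clock times is an explicit choice of this sketch.

```python
from datetime import datetime, timedelta, timezone


def seat_datetime_millis(date: str, time: str, utc_offset_hours: int = 8) -> int:
    """Convert e.g. ("2021-4-1", "3:00PM") to epoch milliseconds."""
    # "2021-4-1" -> 2021, 4, 1 (no zero padding needed when parsing to int).
    year, month, day = (int(part.strip()) for part in date.split("-"))

    # The last two characters carry the AM/PM marker.
    pm = time[-2:] == "PM"
    hours_text, minutes_text = time[:-2].split(":")
    hours, minutes = int(hours_text.strip()), int(minutes_text.strip())

    # 12-hour to 24-hour clock: 12:xxAM is hour 0, 12:xxPM stays 12.
    hours = hours % 12 + (12 if pm else 0)

    tz = timezone(timedelta(hours=utc_offset_hours))
    local = datetime(year, month, day, hours, minutes, tzinfo=tz)
    return int(local.timestamp()) * 1000


print(seat_datetime_millis("2021-4-1", "3:00PM"))
```

For the first seat in the sample data this yields the same datetime value that the search response shows.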

-- Verify the pipeline: the datetime field in the response now holds the timestamp.

GET seats/_search
{
  "size": 1
}
-- Response --------
....
"hits" : [
      {
        "_index" : "seats",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "play" : "Rent",
          "date" : "2021-4-1",
          "sold" : false,
          "cost" : 37,
          "theatre" : "Skyline",
          "actors" : [
            "James Holland",
            "Krissy Smith",
            "Joe Muir",
            "Ryan Earns"
          ],
          "number" : 7,
          "datetime" : 1617260400000,
          "time" : "3:00PM",
          "row" : 1
        }
      }
    ]
....

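Runtime field scripts

Variables

params: user-defined parameters.

doc: the fields of the document. Each field is exposed as a list of values.

params['_source']: the fields of the document, i.e. the JSON source parsed into Map and List structures.

Example

Emit the day of the week at query time.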

PUT seats/_mapping
{
  "runtime": {
    "day_of_week": {
      "type": "keyword",
      "script": {
        "source": "emit(doc['datetime'].value.getDayOfWeekEnum().toString())"
      }
    }
  }
}

-- Verify the script.
-- A runtime field is not part of _source, so it never appears there. Request it explicitly through fields; it is returned under fields in the response.
GET seats/_search
{
  "size": 1,
  "_source": false, 
  "fields": [
    "*","day_of_week"
  ]
}
-- Response -------

......
    "hits" : [
    {
        "_index": "seats",
        "_id": "1",
        "_score": 1.0,
        "fields": {
            "play": [
                "Rent"
            ],
            "date": [
                "2021-4-1"
            ],
            "theatre": [
                "Skyline"
            ],
            "sold": [
                false
            ],
            "number": [
                7
            ],
            "actors": [
                "James Holland",
                "Krissy Smith",
                "Joe Muir",
                "Ryan Earns"
            ],
            "datetime": [
                "2021-04-01T07:00:00.000Z"
            ],
            "cost": [
                37.0
            ],
            "row": [
                1
            ],
            "time": [
                "3:00PM"
            ],
            "day_of_week": [
                "THURSDAY"
            ]
        }
    }
]
          ......
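The runtime field's result can be cross-checked without a cluster. The sketch below, with a hypothetical helper day_of_week, takes the epoch-millisecond datetime value, interprets it in UTC (an assumption that matches the UTC timestamp shown in the response), and returns the upper-case day name.

```python
from datetime import datetime, timezone

# Upper-case names in the order used by datetime.weekday() (Monday is 0).
DAYS = ["MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY",
        "FRIDAY", "SATURDAY", "SUNDAY"]


def day_of_week(epoch_millis: int) -> str:
    """Return the upper-case UTC day name for an epoch-millisecond value."""
    moment = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    return DAYS[moment.weekday()]


print(day_of_week(1617260400000))
```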

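Update scripts

Variables

params (read-only): user-defined parameters.

ctx['op']: the operation to apply to the document. Defaults to index, which updates the document; set it to none to do nothing, or to delete to remove the current document from the index.

ctx['_routing'] (read-only): the routing value of the document.

ctx['_index'] (read-only): the name of the index.

ctx['_id'] (read-only): the unique id of the document.

ctx['_version'] (read-only): the current version of the document.

ctx['_now'] (read-only): the current timestamp. Available only in _update, not in _update_by_query.

ctx['_source']: the fields of the document, i.e. the JSON source parsed into Map and List structures. Modifiable.

Example

Mark the seat with id 3 as sold, at a sale price of 26.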

POST /seats/_update/3
{
  "script": {
    "source": "ctx['_source'].sold = true; ctx._source.cost = params.sold_cost",
    "lang": "painless",
    "params": {
      "sold_cost": 26
    }
  }
}

-- Fetch the document with id 3
GET /seats/_doc/3
-- Response -----
{
  "_index" : "seats",
  "_id" : "3",
  "_version" : 3,
  "_seq_no" : 12,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "play" : "Rented",
    "date" : "2021-4-1",
    "sold" : true,
    "cost" : 26,
    "theatre" : "Graye",
    "actors" : "Dave Christmas",
    "number" : 6,
    "datetime" : 1617260400000,
    "time" : "3:00PM",
    "row" : 2
  }
}

Bulk update: reduce the cost by 2 for every unsold seat in the first three rows.

POST /seats/_update_by_query
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "row": {
              "lte": 3
            }
          }
        },
        {
          "match": {
            "sold": false
          }
        }
      ]
    }
  },
  "script": {
    "source": "ctx._source.cost -= params.discount",
    "lang": "painless",
    "params": {
      "discount": 2
    }
  }
}

-- Before and after the update, check the cost of unsold seats in the first three rows
GET seats/_search
{
  "size": 1,
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "sold": {
              "value": false
            }
          }
        },
        {
          "range": {
            "row": {
              "lte": 3
            }
          }
        }
      ]
    }
  }
}

-- Before
{
    "_index" : "seats",
    "_id" : "1",
    "_score" : 1.3448405,
    "_source" : {
      "play" : "Rent",
      "date" : "2021-4-1",
      "sold" : false,
      "cost" : 37,
      "theatre" : "Skyline",
      "actors" : [
        "James Holland",
        "Krissy Smith",
        "Joe Muir",
        "Ryan Earns"
      ],
      "number" : 7,
      "datetime" : 1617260400000,
      "time" : "3:00PM",
      "row" : 1
    }
}
-- After
{
    "_index" : "seats",
    "_id" : "1",
    "_score" : 1.3829923,
    "_source" : {
      "play" : "Rent",
      "date" : "2021-4-1",
      "theatre" : "Skyline",
      "sold" : false,
      "actors" : [
        "James Holland",
        "Krissy Smith",
        "Joe Muir",
        "Ryan Earns"
      ],
      "number" : 7,
      "datetime" : 1617260400000,
      "cost" : 35,
      "time" : "3:00PM",
      "row" : 1
    }
  }
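The effect of the bulk update can be pictured as a filter followed by an in-place change. The Python sketch below replays that logic on three seats taken from the sample data; apply_discount is a hypothetical name, and the in-memory list stands in for the index.

```python
from typing import Any, Dict, List


def apply_discount(docs: List[Dict[str, Any]],
                   max_row: int = 3,
                   discount: float = 2) -> int:
    """Lower the cost of unsold seats in rows up to max_row; return the count."""
    updated = 0
    for doc in docs:
        # Same two conditions as the query: row <= 3 and sold == false.
        if doc["row"] <= max_row and doc["sold"] is False:
            # Same effect as the script: ctx._source.cost -= params.discount
            doc["cost"] -= discount
            updated += 1
    return updated


seats = [
    {"_id": "1", "row": 1, "sold": False, "cost": 37},
    {"_id": "4", "row": 5, "sold": False, "cost": 20},
    {"_id": "11", "row": 1, "sold": True, "cost": 5000},
]
updated = apply_discount(seats)
print(updated, [doc["cost"] for doc in seats])
```

Only seat 1 qualifies here, and its cost drops from 37 to 35, which matches the before and after documents above.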

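Reindex scripts

Variables

params (read-only): user-defined parameters.

ctx['op']: the operation to apply to the document. Defaults to index; set it to none to do nothing, or to delete to remove the current document from the index.

ctx['_routing']: assign to this to change the routing of the current document.

ctx['_index']: assign to this to change the destination index of the current document.

ctx['_id']: assign to this to change the unique id of the document.

ctx['_version']: assign to this to change the version of the current document.

ctx['_source']: the fields of the document, i.e. the JSON source parsed into Map and List structures. Modifiable.

Return value

None. Typically used to modify field values in bulk while reindexing.

Example

Reindex, taking 30% off unsold seats in the first three rows and raising the cost of unsold seats beyond row six by 20%.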


POST _reindex
{
  "source": {
    "index": "seats"
  },
  "dest": {
    "index": "seats2"
  }, 
  "script": {
    "source": """
    // First three rows, unsold: 30% off.
    if (ctx._source.row <= 3 && !ctx._source.sold) {
      ctx._source.cost = ctx._source.cost * params.discount;
    }
    // Beyond row six, unsold: 20% more.
    if (ctx._source.row > 6 && !ctx._source.sold) {
      ctx._source.cost = ctx._source.cost * params.discount2;
    }
    """,
    "params": { "discount": 0.7, "discount2": 1.2 }
  }
}
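The pricing rule stated for this reindex (30% off unsold seats in the first three rows, 20% more for unsold seats beyond row six) can be written as a small pure function. This is an illustrative Python sketch; reindex_cost is a hypothetical name, and its two factors correspond to the discount and discount2 parameters.

```python
def reindex_cost(row: int, sold: bool, cost: float,
                 discount: float = 0.7, discount2: float = 1.2) -> float:
    """Return the cost a seat would carry in the destination index."""
    if row <= 3 and not sold:
        # First three rows, unsold: apply the 0.7 factor.
        cost = cost * discount
    if row > 6 and not sold:
        # Beyond row six, unsold: apply the 1.2 factor.
        cost = cost * discount2
    return cost


print(reindex_cost(row=1, sold=False, cost=30))    # front row, unsold
print(reindex_cost(row=7, sold=False, cost=22.5))  # back row, unsold
print(reindex_cost(row=5, sold=False, cost=20))    # middle row, unchanged
```

Sold seats and rows four to six pass through unchanged, since neither condition applies to them.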

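Sort/score scripts

Variables

params: user-defined parameters.

doc: the fields of the document.

_score: the similarity score of the current document.

Return value

The sort value. Its type depends on the type parameter of the script sort ("number" or "string").

Example

Sort in ascending order by the length of the theatre name multiplied by a factor.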

GET /_search
{
  "query": {
    "term": {
      "sold": "true"
    }
  },
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "source": "doc['theatre'].value.length() * params.factor",
        "params": {
          "factor": 1.1
        }
      },
      "order": "asc"
    }
  }
}




-- Response -------
    "hits" : [
      {
        "_index" : "seats",
        "_id" : "11",
        "_score" : null,
        "_source" : {
          "play" : "Hamilton",
          "date" : "2018-6-5",
          "sold" : true,
          "cost" : 5000,
          "theatre" : "Graye",
          "actors" : [
            "Lin-Manuel Miranda",
            "Leslie Odom Jr."
          ],
          "number" : 20,
          "datetime" : 1528178400000,
          "time" : "2:00PM",
          "row" : 1
        },
        "sort" : [
          5.5
        ]
      },
      {
        "_index" : "seats",
        "_id" : "8",
        "_score" : null,
        "_source" : {
          "play" : "Test Run",
          "date" : "2018-8-5",
          "sold" : true,
          "cost" : 17.5,
          "theatre" : "Skyline",
          "actors" : [
            "Joe Muir",
            "Ryan Earns",
            "Joel Madigan",
            "Jessica Brown"
          ],
          "number" : 12,
          "datetime" : 1533468600000,
          "time" : "7:30PM",
          "row" : 11
        },
        "sort" : [
          7.700000000000001
        ]
      },
      {
        "_index" : "seats",
        "_id" : "9",
        "_score" : null,
        "_source" : {
          "play" : "Sunnyside Down",
          "date" : "2018-6-12",
          "sold" : true,
          "cost" : 21.25,
          "theatre" : "Skyline",
          "actors" : [
            "Krissy Smith",
            "Joe Muir",
            "Ryan Earns",
            "Nora Blue",
            "Mike Candlestick",
            "Jacey Bell"
          ],
          "number" : 15,
          "datetime" : 1528790400000,
          "time" : "4:00PM",
          "row" : 8
        },
        "sort" : [
          7.700000000000001
        ]
      }
    ]
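The sort values in the response follow directly from the theatre names: "Graye" has 5 characters and "Skyline" has 7, each multiplied by 1.1. A rough Python equivalent, where sort_key and the hits list are illustrative stand-ins for the script and the three sold seats:

```python
def sort_key(theatre: str, factor: float = 1.1) -> float:
    """Same computation as doc['theatre'].value.length() * params.factor."""
    return len(theatre) * factor


# (document id, theatre) pairs for the three sold seats in the sample data.
hits = [("8", "Skyline"), ("11", "Graye"), ("9", "Skyline")]

# Ascending order, as requested by "order": "asc".
ordered = sorted(hits, key=lambda hit: sort_key(hit[1]))
for doc_id, theatre in ordered:
    print(doc_id, theatre, sort_key(theatre))
```

The value for "Skyline" is not exactly 7.7 because 1.1 has no exact binary floating-point representation, which is why the response shows extra trailing digits.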
  

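Field scripts

Variables

params: user-defined parameters.

doc: the fields of the document.

params['_source']: the fields of the document, i.e. the JSON source parsed into Map and List structures.

Return value

A custom value computed for each document, returned under fields. When a request uses script_fields and does not explicitly ask for _source, the hits do not include _source.

Example

Compute the day of the week and the number of actors for each play.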

GET seats/_search
{
  "size": 2,
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "day-of-week": {
      "script": {
        "source": "doc['datetime'].value.getDayOfWeekEnum().getDisplayName(TextStyle.FULL, Locale.ROOT)"
      }
    },
    "number-of-actors": {
      "script": {
        "source": "doc['actors'].size()"
      }
    }
  }
}

-- Response -----
......
"hits" : [
      {
        "_index" : "seats",
        "_id" : "1",
        "_score" : 1.0,
        "fields" : {
          "day-of-week" : [
            "Thursday"
          ],
          "number-of-actors" : [
            4
          ]
        }
      },
      {
        "_index" : "seats",
        "_id" : "2",
        "_score" : 1.0,
        "fields" : {
          "day-of-week" : [
            "Thursday"
          ],
          "number-of-actors" : [
            1
          ]
        }
      }
    ]
......

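Filter scripts

Variables

params: user-defined parameters.

doc: the fields of the document.

Return value

A boolean: true keeps the document, false filters it out. The value only decides whether the document matches and does not appear in the results.

Example

Find unsold seats that cost less than 25.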

GET seats/_search
{
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": {
            "source": "doc['sold'].value == false && doc['cost'].value < params.cost",
            "params": {
              "cost": 25
            }
          }
        }
      }
    }
  }
}
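A filter script is simply a predicate evaluated once per document. The Python sketch below applies the same predicate to four seats from the sample data; keep is a hypothetical name, and the in-memory list stands in for the index.

```python
from typing import Any, Dict


def keep(doc: Dict[str, Any], max_cost: float) -> bool:
    """Same test as: doc['sold'].value == false && doc['cost'].value < params.cost"""
    return doc["sold"] is False and doc["cost"] < max_cost


seats = [
    {"_id": "1", "sold": False, "cost": 37},
    {"_id": "4", "sold": False, "cost": 20},
    {"_id": "7", "sold": False, "cost": 22.5},
    {"_id": "8", "sold": True, "cost": 17.5},
]

# Only documents for which the predicate returns True are kept.
matching_ids = [doc["_id"] for doc in seats if keep(doc, max_cost=25)]
print(matching_ids)
```

Seat 8 is cheap enough but already sold, so it is filtered out along with seat 1.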

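Minimum-should-match scripts

Variables

params: user-defined parameters.

params['num_terms']: the number of terms specified in the query.

doc: the fields of the document.

Return value

The minimum number of query terms that a document must match.

Example

Find seats for plays featuring smith, earns, or black, where at least two of them appear. Because actors is mapped as keyword, each term must equal a stored value exactly, so in practice full names such as "Krissy Smith" are needed.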

GET seats/_search
{
  "query": {
    "terms_set": {
      "actors": {
        "terms": [
          "smith",
          "earns",
          "black"
        ],
        "minimum_should_match_script": {
          "source": "Math.min(params['num_terms'], params['min_actors_to_see'])",
          "params": {
            "min_actors_to_see": 2
          }
        }
      }
    }
  }
}
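To see how the script's return value is used, the matching rule can be approximated in Python. The sketch assumes exact, case-sensitive comparison against the stored keyword values; both function names are hypothetical.

```python
from typing import List


def minimum_should_match(num_terms: int, min_actors_to_see: int) -> int:
    """Same computation as the script: never require more terms than were given."""
    return min(num_terms, min_actors_to_see)


def matches(doc_values: List[str], query_terms: List[str],
            min_actors_to_see: int) -> bool:
    """Approximate terms_set: count exact term matches against the threshold."""
    required = minimum_should_match(len(query_terms), min_actors_to_see)
    matched = sum(1 for term in query_terms if term in doc_values)
    return matched >= required


actors = ["James Holland", "Krissy Smith", "Joe Muir", "Ryan Earns"]

# Lower-case surnames do not equal any stored keyword value.
print(matches(actors, ["smith", "earns", "black"], 2))
# Full names match two stored values, which meets the threshold of 2.
print(matches(actors, ["Krissy Smith", "Ryan Earns", "black"], 2))
```

Taking the minimum with num_terms keeps the query satisfiable when fewer terms are supplied than min_actors_to_see.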

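_source vs. doc

doc: doc['field_name']

  • Doc values are a columnar store of field values, enabled by default on every field except analyzed text fields.

  • They return only simple field values such as numbers, dates, geo points, and terms, or lists of such values. They cannot return JSON objects.

  • In Painless, you can guard access to the doc map by first checking doc.containsKey('field'). In an expression script there is no way to check whether a field exists in the mappings.

  • A text field with fielddata enabled can also be read with doc['field'], but doing so loads all of its terms into the JVM heap, which is very expensive in memory and CPU.

_source: _source['field_name'] or _source.field_name

  • _source is loaded as a map that gives access to the original source document. In update contexts it can also be modified.

  • Accessing _source is slower than accessing doc values.

Usage:

  • In ingest contexts, use ctx.xxx.
  • In contexts that modify documents (update, update_by_query, reindex), use ctx._source.
  • In search and aggregation contexts, prefer doc.
  • _source is optimized for returning several fields per result, while doc values are optimized for reading a specific field across many documents.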

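Stored scripts

Create a script

Create a discount script. Note that multiplying by 100 and then dividing by 100, as written here, does not round anything; to round to two decimal places, the product would have to go through Math.round before the division by 100.0.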

POST _scripts/discount_script
{
  "script": {
        "lang": "painless",
        "source": "(doc['cost'].value * params['discount'] * 100)/100"
      }
}

View the script

GET _scripts/discount_script
-- Response ----
{
  "_id" : "discount_script",
  "found" : true,
  "script" : {
    "lang" : "painless",
    "source" : "(doc['cost'].value * params['discount'] * 100)/100"
  }
}

Use the script

GET seats/_search
{
  "script_fields": {
    "discount_cost": {
      "script":{
        "id": "discount_script",
        "params": {"discount":0.8}
      }
    }
  }
}

-- Response -------
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "seats",
        "_id" : "1",
        "_score" : 1.0,
        "fields" : {
          "discount_cost" : [
            28.0
          ]
        }
      },
      {
        "_index" : "seats",
        "_id" : "2",
        "_score" : 1.0,
        "fields" : {
          "discount_cost" : [
            22.4
          ]
        }
      },
      {
        "_index" : "seats",
        "_id" : "5",
        "_score" : 1.0,
        "fields" : {
          "discount_cost" : [
            20.400000000000002
          ]
        }
      },
      {
        "_index" : "seats",
        "_id" : "6",
        "_score" : 1.0,
        "fields" : {
          "discount_cost" : [
            22.4
          ]
        }
      },
      {
        "_index" : "seats",
        "_id" : "10",
        "_score" : 1.0,
        "fields" : {
          "discount_cost" : [
            22.4
          ]
        }
      },
      {
        "_index" : "seats",
        "_id" : "4",
        "_score" : 1.0,
        "fields" : {
          "discount_cost" : [
            16.0
          ]
        }
      },
      {
        "_index" : "seats",
        "_id" : "7",
        "_score" : 1.0,
        "fields" : {
          "discount_cost" : [
            18.0
          ]
        }
      },
      {
        "_index" : "seats",
        "_id" : "8",
        "_score" : 1.0,
        "fields" : {
          "discount_cost" : [
            14.0
          ]
        }
      },
      {
        "_index" : "seats",
        "_id" : "9",
        "_score" : 1.0,
        "fields" : {
          "discount_cost" : [
            17.0
          ]
        }
      },
      {
        "_index" : "seats",
        "_id" : "11",
        "_score" : 1.0,
        "fields" : {
          "discount_cost" : [
            4000.0
          ]
        }
      }
    ]
  }
}
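The value 20.400000000000002 for seat 5 shows that multiplying by 100 and dividing by 100 leaves the floating-point result unrounded. The Python sketch below contrasts the stored expression with a rounded variant. Both function names are hypothetical, and floor(x + 0.5) is used here to imitate the half-up behaviour of Java's Math.round, which is an assumption of this sketch rather than something the stored script does.

```python
import math


def discounted(cost: float, discount: float) -> float:
    """The stored script's expression: no rounding takes place."""
    return (cost * discount * 100) / 100


def discounted_rounded(cost: float, discount: float) -> float:
    """Round to two decimals, in the spirit of Math.round(x * 100) / 100.0."""
    return math.floor(cost * discount * 100 + 0.5) / 100.0


# Seat 5 costs 25.5 after the earlier bulk update; the discount is 0.8.
print(discounted(25.5, 0.8))
print(discounted_rounded(25.5, 0.8))
```

The rounded variant returns 20.4 for this seat, while values that are already exact, such as 28.0 for seat 1, are unaffected.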