scripting
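Scripting is Elasticsearch's facility for custom programming in scenarios that are too complex for the standard query DSL. ES supports several script languages. Painless, for example, has Java-like syntax with comments, keywords, types, variables, and functions. It is designed to run several times faster than the alternative script languages, is safe and reliable, and can be used for both inline and stored scripts.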
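Languages
groovy: the default script language from ES 1.4.x up to 5.0.
painless: the default script language since ES 5.0.
expression: low per-document overhead and very fast execution, but limited in scope: it can only access numeric, boolean, date, and geo_point fields.
mustache: provides parameterized query templates.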
Characteristics
- Highly flexible and programmable
- Slower than the equivalent DSL
- Not a good fit for simple business scenarios that the DSL already covers
Use cases
- Custom analysis (tokenization)
- Custom relevance
- Custom scoring
- Custom filters
- Custom aggregations
- Custom reindex
- And so on
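Enabling regular expressions
In some earlier versions regular expressions were disabled by default, because they bypass Painless's protection against long-running and memory-hungry scripts and can also recurse deeply on the stack.
To enable them, add script.painless.regex.enabled: true to elasticsearch.yml.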
Format
A script has the following format:
"script": {
"lang": "...",
"source" | "id": "...",
"params": { ... }
}
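lang: the script language. Defaults to painless.
source | id: source holds the text of an inline script; id holds the ID of a stored script. Specify one or the other.
params: the input parameters the script needs.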
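The format above can be sketched as a small Python helper that builds the "script" object for a request body. The function name make_script and its arguments are hypothetical, and omitting lang for stored scripts is an assumption based on the language being saved together with the stored script.

```python
def make_script(source=None, script_id=None, params=None, lang="painless"):
    """Build a "script" object with either an inline source or a stored id."""
    # Exactly one of source / script_id must be given.
    if (source is None) == (script_id is None):
        raise ValueError("specify exactly one of source or script_id")
    script = {}
    if source is not None:
        script["lang"] = lang
        script["source"] = source
    else:
        # Assumption: a stored script already carries its language.
        script["id"] = script_id
    if params:
        script["params"] = params
    return script


inline = make_script(source="ctx._source.cost -= params.discount",
                     params={"discount": 2})
stored = make_script(script_id="discount_script", params={"discount": 0.8})
```

The resulting dictionaries can be serialized with json.dumps and placed under the "script" key of a request.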
In ES 8, GET _script_language lists the supported languages and the contexts in which each can be used.
GET _script_language
{
"types_allowed" : [
"inline",
"stored"
],
"language_contexts" : [
{
"language" : "expression",
"contexts" : [
"aggregation_selector",
"aggs",
"bucket_aggregation",
"field",
"filter",
"number_sort",
"score",
"terms_set"
]
},
{
"language" : "mustache",
"contexts" : [
"template"
]
},
{
"language" : "painless",
"contexts" : [
"aggregation_selector",
"aggs",
"aggs_combine",
"aggs_init",
"aggs_map",
"aggs_reduce",
"analysis",
"bucket_aggregation",
"field",
"filter",
"ingest",
"interval",
"moving-function",
"number_sort",
"painless_test",
"processor_conditional",
"score",
"script_heuristic",
"similarity",
"similarity_weight",
"string_sort",
"template",
"terms_set",
"update",
"watcher_condition",
"watcher_transform",
"xpack_template"
]
}
]
}
Syntax & use cases
Index used by the official script examples
PUT /seats
{
"mappings": {
"properties": {
"theatre": { "type": "keyword" },
"play": { "type": "keyword" },
"actors": { "type": "keyword" },
"date": { "type": "keyword" },
"time": { "type": "keyword" },
"cost": { "type": "double" },
"row": { "type": "integer" },
"number": { "type": "integer" },
"sold": { "type": "boolean" },
"datetime": { "type": "date" }
}
}
}
POST seats/_bulk?pipeline=seats&refresh=true
{"create":{"_index":"seats","_id":"1"}}
{"theatre":"Skyline","play":"Rent","actors":["James Holland","Krissy Smith","Joe Muir","Ryan Earns"],"date":"2021-4-1","time":"3:00PM","cost":37,"row":1,"number":7,"sold":false}
{"create":{"_index":"seats","_id":"2"}}
{"theatre":"Graye","play":"Rent","actors":"Dave Christmas","date":"2021-4-1","time":"3:00PM","cost":30,"row":3,"number":5,"sold":false}
{"create":{"_index":"seats","_id":"3"}}
{"theatre":"Graye","play":"Rented","actors":"Dave Christmas","date":"2021-4-1","time":"3:00PM","cost":33,"row":2,"number":6,"sold":false}
{"create":{"_index":"seats","_id":"4"}}
{"theatre":"Skyline","play":"Rented","actors":["James Holland","Krissy Smith","Joe Muir","Ryan Earns"],"date":"2021-4-1","time":"3:00PM","cost":20,"row":5,"number":2,"sold":false}
{"create":{"_index":"seats","_id":"5"}}
{"theatre":"Down Port","play":"Pick It Up","actors":["Joel Madigan","Jessica Brown","Baz Knight","Jo Hangum","Rachel Grass","Phoebe Miller"],"date":"2018-4-2","time":"8:00PM","cost":27.5,"row":3,"number":2,"sold":false}
{"create":{"_index":"seats","_id":"6"}}
{"theatre":"Down Port","play":"Harriot","actors":["Phoebe Miller","Sarah Notch","Brayden Green","Joshua Iller","Jon Hittle","Rob Kettleman","Laura Conrad","Simon Hower","Nora Blue","Mike Candlestick","Jacey Bell"],"date":"2018-8-7","time":"8:00PM","cost":30,"row":1,"number":10,"sold":false}
{"create":{"_index":"seats","_id":"7"}}
{"theatre":"Skyline","play":"Auntie Jo","actors":["Jo Hangum","Jon Hittle","Rob Kettleman","Laura Conrad","Simon Hower","Nora Blue"],"date":"2018-10-2","time":"5:40PM","cost":22.5,"row":7,"number":10,"sold":false}
{"create":{"_index":"seats","_id":"8"}}
{"theatre":"Skyline","play":"Test Run","actors":["Joe Muir","Ryan Earns","Joel Madigan","Jessica Brown"],"date":"2018-8-5","time":"7:30PM","cost":17.5,"row":11,"number":12,"sold":true}
{"create":{"_index":"seats","_id":"9"}}
{"theatre":"Skyline","play":"Sunnyside Down","actors":["Krissy Smith","Joe Muir","Ryan Earns","Nora Blue","Mike Candlestick","Jacey Bell"],"date":"2018-6-12","time":"4:00PM","cost":21.25,"row":8,"number":15,"sold":true}
{"create":{"_index":"seats","_id":"10"}}
{"theatre":"Graye","play":"Line and Single","actors":["Nora Blue","Mike Candlestick"],"date":"2018-6-5","time":"2:00PM","cost":30,"row":1,"number":2,"sold":false}
{"create":{"_index":"seats","_id":"11"}}
{"theatre":"Graye","play":"Hamilton","actors":["Lin-Manuel Miranda","Leslie Odom Jr."],"date":"2018-6-5","time":"2:00PM","cost":5000,"row":1,"number":20,"sold":true}
Keywords
| if | else | while | do | for |
| in | continue | break | return | new |
| try | catch | throw | this | instanceof |
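Operators
Arithmetic operators:
+ - * / %
Bitwise operators:
| & ^ ~ << >> >>>
Boolean operators (including the ternary operator):
&& || ! ?:
Comparison operators:
< <= == >= >
Common math functions:
abs ceil exp floor ln log10 logn max min sqrt pow
Trigonometric library functions:
acosh acos asinh asin atanh atan atan2 cosh cos sinh sin tanh tan
Distance functions:
haversin
Other functions:
min, max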
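Ingest pipeline scripts
Variables
params: user-defined parameters
ctx: the document's fields, as JSON extracted into map and list structures.
ctx['_index']: modify this to change the destination index of the current document.
Example
Extract the date and time fields, convert them to a timestamp, and assign it to the datetime field.
-- Create the pipeline (do this before the _bulk request above, which references it with pipeline=seats)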
PUT /_ingest/pipeline/seats
{
"description": "update datetime for seats",
"processors": [
{
"script": {
"source": """
String[] dateSplit = ctx.date.splitOnToken("-");
String year = dateSplit[0].trim();
String month = dateSplit[1].trim();
if (month.length() == 1) {
month = "0" + month;
}
String day = dateSplit[2].trim();
if (day.length() == 1) {
day = "0" + day;
}
boolean pm = ctx.time.substring(ctx.time.length() - 2).equals("PM");
String[] timeSplit = ctx.time.substring(0,
ctx.time.length() - 2).splitOnToken(":");
int hours = Integer.parseInt(timeSplit[0].trim());
int minutes = Integer.parseInt(timeSplit[1].trim());
if (pm) {
hours += 12;
}
String dts = year + "-" + month + "-" + day + "T" +
(hours < 10 ? "0" + hours : "" + hours) + ":" +
(minutes < 10 ? "0" + minutes : "" + minutes) +
":00+08:00";
ZonedDateTime dt = ZonedDateTime.parse(
dts, DateTimeFormatter.ISO_OFFSET_DATE_TIME);
ctx.datetime = dt.getLong(ChronoField.INSTANT_SECONDS)*1000L;
"""
}
}
]
}
-- Verify the pipeline: the datetime field in the result now holds the timestamp.
GET seats/_search
{
"size": 1
}
-- Response --------
....
"hits" : [
{
"_index" : "seats",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"play" : "Rent",
"date" : "2021-4-1",
"sold" : false,
"cost" : 37,
"theatre" : "Skyline",
"actors" : [
"James Holland",
"Krissy Smith",
"Joe Muir",
"Ryan Earns"
],
"number" : 7,
"datetime" : 1617260400000,
"time" : "3:00PM",
"row" : 1
}
}
]
....
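The timestamp in the response can be cross-checked outside Elasticsearch. The following is a minimal Python sketch that mirrors the logic of the pipeline script (split the date, add 12 hours for PM, apply the fixed +08:00 offset). The function name seat_datetime_millis is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def seat_datetime_millis(date, time):
    """Convert e.g. ("2021-4-1", "3:00PM") to epoch milliseconds at +08:00."""
    year, month, day = (int(part.strip()) for part in date.split("-"))
    pm = time[-2:] == "PM"
    hours_text, minutes_text = time[:-2].split(":")
    hours = int(hours_text.strip())
    minutes = int(minutes_text.strip())
    if pm:
        # Mirrors the pipeline script; a "12:xxPM" input would need extra handling.
        hours += 12
    tz = timezone(timedelta(hours=8))
    dt = datetime(year, month, day, hours, minutes, tzinfo=tz)
    return int(dt.timestamp()) * 1000


millis = seat_datetime_millis("2021-4-1", "3:00PM")
```

For the first seat this yields 1617260400000, the same value stored in datetime above.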
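Runtime field scripts
Variables
params: user-defined parameters
doc: the document's fields; each field is exposed as a list of values.
params['_source']: the document's fields, as JSON extracted into map and list structures.
Example
Emit the day of the week at query time.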
PUT seats/_mapping
{
"runtime": {
"day_of_week": {
"type": "keyword",
"script": {
"source": "emit(doc['datetime'].value.getDayOfWeekEnum().toString())"
}
}
}
}
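-- Verify the script.
-- A runtime field is not part of _source, so it does not appear there. Request it explicitly through fields; it is returned under fields in the response.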
GET seats/_search
{
"size": 1,
"_source": false,
"fields": [
"*","day_of_week"
]
}
-- Response -------
......
"hits" : [
{
"_index": "seats",
"_id": "1",
"_score": 1.0,
"fields": {
"play": [
"Rent"
],
"date": [
"2021-4-1"
],
"theatre": [
"Skyline"
],
"sold": [
false
],
"number": [
7
],
"actors": [
"James Holland",
"Krissy Smith",
"Joe Muir",
"Ryan Earns"
],
"datetime": [
"2021-04-01T07:00:00.000Z"
],
"cost": [
37.0
],
"row": [
1
],
"time": [
"3:00PM"
],
"day_of_week": [
"THURSDAY"
]
}
}
]
......
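As a sanity check on the runtime field, the same day-of-week derivation can be reproduced in Python from the stored epoch milliseconds. This is only an illustration of what the emit(...) call computes; the function name day_of_week is hypothetical, and UTC is assumed, matching the datetime value shown in the response.

```python
from datetime import datetime, timezone

DAY_NAMES = ["MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY",
             "FRIDAY", "SATURDAY", "SUNDAY"]


def day_of_week(epoch_millis):
    """Return the upper-case day name for an epoch-millisecond value (UTC)."""
    dt = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    # datetime.weekday() returns 0 for Monday through 6 for Sunday.
    return DAY_NAMES[dt.weekday()]


name = day_of_week(1617260400000)
```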
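Update scripts
Variables
params (read-only): user-defined parameters
ctx['op']: the operation to apply. The default, index, updates the document. Set it to none to do nothing, or to delete to remove the current document from the index.
ctx['_routing'] (read-only): the routing value.
ctx['_index'] (read-only): the index name.
ctx['_id'] (read-only): the document's unique ID.
ctx['_version'] (read-only): the current version of the document.
ctx['_now'] (read-only): the current timestamp. Available only in _update, not in _update_by_query.
ctx['_source']: the document's fields, as JSON extracted into map and list structures. Modifiable.
Example
Mark the seat with ID 3 as sold, at a sale price of 26.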
POST /seats/_update/3
{
"script": {
"source": "ctx['_source'].sold = true; ctx._source.cost = params.sold_cost",
"lang": "painless",
"params": {
"sold_cost": 26
}
}
}
-- Check the document with ID 3
GET /seats/_doc/3
-- Response -----
{
"_index" : "seats",
"_id" : "3",
"_version" : 3,
"_seq_no" : 12,
"_primary_term" : 1,
"found" : true,
"_source" : {
"play" : "Rented",
"date" : "2021-4-1",
"sold" : true,
"cost" : 26,
"theatre" : "Graye",
"actors" : "Dave Christmas",
"number" : 6,
"datetime" : 1617260400000,
"time" : "3:00PM",
"row" : 2
}
}
Bulk update: reduce the price by 2 for unsold seats in the first three rows.
POST /seats/_update_by_query
{
"query": {
"bool": {
"filter": [
{
"range": {
"row": {
"lte": 3
}
}
},
{
"match": {
"sold": false
}
}
]
}
},
"script": {
"source": "ctx._source.cost -= params.discount",
"lang": "painless",
"params": {
"discount": 2
}
}
}
-- Check the price of an unsold seat in the first three rows, before and after the update
GET seats/_search
{
"size": 1,
"query": {
"bool": {
"must": [
{
"term": {
"sold": {
"value": false
}
}
},
{
"range": {
"row": {
"lte": 3
}
}
}
]
}
}
}
-- Before
{
"_index" : "seats",
"_id" : "1",
"_score" : 1.3448405,
"_source" : {
"play" : "Rent",
"date" : "2021-4-1",
"sold" : false,
"cost" : 37,
"theatre" : "Skyline",
"actors" : [
"James Holland",
"Krissy Smith",
"Joe Muir",
"Ryan Earns"
],
"number" : 7,
"datetime" : 1617260400000,
"time" : "3:00PM",
"row" : 1
}
}
-- After
{
"_index" : "seats",
"_id" : "1",
"_score" : 1.3829923,
"_source" : {
"play" : "Rent",
"date" : "2021-4-1",
"theatre" : "Skyline",
"sold" : false,
"actors" : [
"James Holland",
"Krissy Smith",
"Joe Muir",
"Ryan Earns"
],
"number" : 7,
"datetime" : 1617260400000,
"cost" : 35,
"time" : "3:00PM",
"row" : 1
}
}
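The effect of the _update_by_query request can be illustrated with a small in-memory simulation in Python. This is only a sketch of the selection and update logic (row <= 3, sold == false, cost -= discount), not of how Elasticsearch executes it; the function name apply_discount and the sample list are hypothetical.

```python
def apply_discount(docs, discount, max_row=3):
    """Subtract discount from unsold seats in rows 1..max_row; return count updated."""
    updated = 0
    for doc in docs:
        # Same conditions as the query: range row <= 3 and sold == false.
        if doc["row"] <= max_row and not doc["sold"]:
            doc["cost"] -= discount  # same effect as ctx._source.cost -= params.discount
            updated += 1
    return updated


seats = [
    {"_id": "1", "row": 1, "sold": False, "cost": 37},
    {"_id": "4", "row": 5, "sold": False, "cost": 20},
    {"_id": "8", "row": 11, "sold": True, "cost": 17.5},
]
count = apply_discount(seats, discount=2)
```

Only seat 1 qualifies in this sample, and its cost drops from 37 to 35, as in the before and after responses above.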
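Reindex scripts
Variables
params (read-only): user-defined parameters
ctx['op']: the operation to apply. The default, index, writes the document. Set it to none to do nothing, or to delete to remove the current document from the index.
ctx['_routing']: change the routing of the current document.
ctx['_index']: change the destination index of the current document.
ctx['_id']: change the document's unique ID.
ctx['_version']: change the version of the current document.
ctx['_source']: the document's fields, as JSON extracted into map and list structures. Modifiable.
Return value
None. Typically used to modify field values in bulk while rebuilding an index.
Example
Reindex: unsold seats in the first three rows get 30% off, and unsold seats beyond row six go up by 20%.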
POST _reindex
{
"source": {
"index": "seats"
},
"dest": {
"index": "seats2"
},
"script": {
"source": """
if (ctx._source.row <= 3 && !ctx._source.sold) {
ctx._source.cost = ctx._source.cost * params.discount;
}
if (ctx._source.row>6 && !ctx._source.sold){
ctx._source.cost= ctx._source.cost * params.discount2;
}
""", "params": {"discount":0.7,"discount2":1.2}
}
}
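The intended pricing rule of the reindex script (30% off for unsold seats in the first three rows, 20% more for unsold seats beyond row six) can be sketched in Python as follows. The function name reprice is hypothetical; it returns a copy because reindexing leaves the source index unchanged.

```python
import math


def reprice(source, params):
    """Return a copy of a seat document with the reindex pricing rule applied."""
    doc = dict(source)
    if doc["row"] <= 3 and not doc["sold"]:
        doc["cost"] = doc["cost"] * params["discount"]
    if doc["row"] > 6 and not doc["sold"]:
        doc["cost"] = doc["cost"] * params["discount2"]
    return doc


PARAMS = {"discount": 0.7, "discount2": 1.2}
front = reprice({"row": 1, "sold": False, "cost": 37}, PARAMS)
back = reprice({"row": 7, "sold": False, "cost": 22.5}, PARAMS)
middle = reprice({"row": 5, "sold": False, "cost": 20}, PARAMS)
```

Results are floating-point values, so comparisons are best done with a tolerance such as math.isclose.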
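Sort/score scripts
Variables
params: user-defined parameters
doc: the document's fields.
_score: the relevance score of the current document.
Return value
The sort value. Its type depends on the type parameter in the script sort configuration ("number" or "string").
Example
Sort in ascending order by the length of the theatre name multiplied by a factor.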
GET /_search
{
"query": {
"term": {
"sold": "true"
}
},
"sort": {
"_script": {
"type": "number",
"script": {
"lang": "painless",
"source": "doc['theatre'].value.length() * params.factor",
"params": {
"factor": 1.1
}
},
"order": "asc"
}
}
}
-- Response -------
"hits" : [
{
"_index" : "seats",
"_id" : "11",
"_score" : null,
"_source" : {
"play" : "Hamilton",
"date" : "2018-6-5",
"sold" : true,
"cost" : 5000,
"theatre" : "Graye",
"actors" : [
"Lin-Manuel Miranda",
"Leslie Odom Jr."
],
"number" : 20,
"datetime" : 1528178400000,
"time" : "2:00PM",
"row" : 1
},
"sort" : [
5.5
]
},
{
"_index" : "seats",
"_id" : "8",
"_score" : null,
"_source" : {
"play" : "Test Run",
"date" : "2018-8-5",
"sold" : true,
"cost" : 17.5,
"theatre" : "Skyline",
"actors" : [
"Joe Muir",
"Ryan Earns",
"Joel Madigan",
"Jessica Brown"
],
"number" : 12,
"datetime" : 1533468600000,
"time" : "7:30PM",
"row" : 11
},
"sort" : [
7.700000000000001
]
},
{
"_index" : "seats",
"_id" : "9",
"_score" : null,
"_source" : {
"play" : "Sunnyside Down",
"date" : "2018-6-12",
"sold" : true,
"cost" : 21.25,
"theatre" : "Skyline",
"actors" : [
"Krissy Smith",
"Joe Muir",
"Ryan Earns",
"Nora Blue",
"Mike Candlestick",
"Jacey Bell"
],
"number" : 15,
"datetime" : 1528790400000,
"time" : "4:00PM",
"row" : 8
},
"sort" : [
7.700000000000001
]
}
]
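The sort values in the response follow directly from the script. A minimal Python sketch of the same computation, with a hypothetical sort_key function and a reduced sample of the sold seats:

```python
import math


def sort_key(doc, factor):
    """Length of the theatre name times a factor, as in the sort script."""
    return len(doc["theatre"]) * factor


sold_seats = [
    {"_id": "8", "theatre": "Skyline"},
    {"_id": "11", "theatre": "Graye"},
    {"_id": "9", "theatre": "Skyline"},
]
# sorted() is stable, so seats with equal keys keep their input order.
ordered = sorted(sold_seats, key=lambda d: sort_key(d, 1.1))
ordered_ids = [d["_id"] for d in ordered]
```

"Graye" has five characters and "Skyline" seven, which gives roughly 5.5 and 7.7, so seat 11 sorts first.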
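Script field scripts
Variables
params: user-defined parameters
doc: the document's fields.
params['_source']: the document's fields, as JSON extracted into map and list structures.
Return value
A custom value per document, returned under fields. When script_fields is used and the request does not explicitly ask for _source, the response does not include _source.
Example
Get the computed day of the week and the number of actors for each play.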
GET seats/_search
{
"size": 2,
"query": {
"match_all": {}
},
"script_fields": {
"day-of-week": {
"script": {
"source": "doc['datetime'].value.getDayOfWeekEnum().getDisplayName(TextStyle.FULL, Locale.ROOT)"
}
},
"number-of-actors": {
"script": {
"source": "doc['actors'].size()"
}
}
}
}
-- Response -----
......
"hits" : [
{
"_index" : "seats",
"_id" : "1",
"_score" : 1.0,
"fields" : {
"day-of-week" : [
"Thursday"
],
"number-of-actors" : [
4
]
}
},
{
"_index" : "seats",
"_id" : "2",
"_score" : 1.0,
"fields" : {
"day-of-week" : [
"Thursday"
],
"number-of-actors" : [
1
]
}
}
]
......
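Filter scripts
Variables
params: user-defined parameters
doc: the document's fields.
Return value
A boolean: true keeps the document, false filters it out. The value serves only as the filter criterion and does not appear in the results.
Example
Find unsold seats that cost less than 25.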
GET seats/_search
{
"query": {
"bool": {
"filter": {
"script": {
"script": {
"source": "doc['sold'].value == false && doc['cost'].value < params.cost",
"params": {
"cost": 25
}
}
}
}
}
}
}
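The filter script is a plain boolean predicate evaluated per document. An equivalent sketch in Python, using a hypothetical keep function and a few sample seats:

```python
def keep(doc, params):
    """True when the seat is unsold and cheaper than params["cost"]."""
    return doc["sold"] is False and doc["cost"] < params["cost"]


sample = [
    {"_id": "4", "sold": False, "cost": 20},
    {"_id": "7", "sold": False, "cost": 22.5},
    {"_id": "8", "sold": True, "cost": 17.5},
    {"_id": "10", "sold": False, "cost": 30},
]
kept_ids = [d["_id"] for d in sample if keep(d, {"cost": 25})]
```

Seat 8 is cheap enough but already sold, and seat 10 is unsold but too expensive, so only seats 4 and 7 pass.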
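Minimum-should-match scripts
Variables
params: user-defined parameters
params['num_terms']: the number of terms specified in the query.
doc: the document's fields.
Return value
The minimum number of terms that must match.
Example
Match seats for plays in which at least two of smith, earns, and black appear.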
GET seats/_search
{
"query": {
"terms_set": {
"actors": {
"terms": [
"smith",
"earns",
"black"
],
"minimum_should_match_script": {
"source": "Math.min(params['num_terms'], params['min_actors_to_see'])",
"params": {
"min_actors_to_see": 2
}
}
}
}
}
}
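The terms_set logic can be sketched in Python: the script computes how many terms are required, and a document matches when at least that many of the query terms appear in the field. The function name matches_terms_set is hypothetical, and exact term equality is assumed, as for a keyword field.

```python
def matches_terms_set(doc_terms, query_terms, min_actors_to_see):
    """Simulate terms_set with a minimum_should_match_script."""
    # Same as Math.min(params['num_terms'], params['min_actors_to_see']).
    required = min(len(query_terms), min_actors_to_see)
    matched = sum(1 for term in query_terms if term in doc_terms)
    return matched >= required


query = ["smith", "earns", "black"]
two_hits = matches_terms_set(["smith", "earns", "muir"], query, 2)
one_hit = matches_terms_set(["smith", "muir"], query, 2)
```

Taking the minimum with num_terms keeps the requirement from ever exceeding the number of terms in the query.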
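source vs. doc
doc: doc['field_name']
Doc values are a columnar store of field values, enabled by default on all fields except analyzed text fields.
They return only simple field values such as numbers, dates, geo points, and terms, or lists of these values. They cannot return JSON objects.
In Painless, you can check doc.containsKey('field') before accessing the doc map, but in the expression language there is no way to check whether a field exists in the mapping. For text fields with fielddata enabled, the doc['field'] syntax also works, but fielddata has to load all of the field's terms into the JVM heap, which is very expensive in memory and CPU.
source: _source['field_name'] or _source.field_name
_source is loaded as a map associated with the source document, and in update contexts the document can be modified through it. Accessing _source is slower than using doc values.
Usage:
- In ingest, use ctx.xxx;
- In update/update_by_query/reindex, which modify documents, use ctx._source;
- In search and aggregations, prefer doc.
- _source is optimized for returning multiple fields per result, whereas doc values are optimized for accessing a specific field across many documents.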
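Stored scripts
Create a script
Create a discount script. Note that, as written, multiplying by 100 and then dividing by 100 does not round the result; rounding to two decimal places would need something like Math.round(value * 100) / 100.0.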
POST _scripts/discount_script
{
"script": {
"lang": "painless",
"source": "(doc['cost'].value * params['discount'] * 100)/100"
}
}
View the script
GET _scripts/discount_script
-- Response ----
{
"_id" : "discount_script",
"found" : true,
"script" : {
"lang" : "painless",
"source" : "(doc['cost'].value * params['discount'] * 100)/100"
}
}
Use the script
GET seats/_search
{
"script_fields": {
"discount_cost": {
"script":{
"id": "discount_script",
"params": {"discount":0.8}
}
}
}
}
-- Response -------
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "seats",
"_id" : "1",
"_score" : 1.0,
"fields" : {
"discount_cost" : [
28.0
]
}
},
{
"_index" : "seats",
"_id" : "2",
"_score" : 1.0,
"fields" : {
"discount_cost" : [
22.4
]
}
},
{
"_index" : "seats",
"_id" : "5",
"_score" : 1.0,
"fields" : {
"discount_cost" : [
20.400000000000002
]
}
},
{
"_index" : "seats",
"_id" : "6",
"_score" : 1.0,
"fields" : {
"discount_cost" : [
22.4
]
}
},
{
"_index" : "seats",
"_id" : "10",
"_score" : 1.0,
"fields" : {
"discount_cost" : [
22.4
]
}
},
{
"_index" : "seats",
"_id" : "4",
"_score" : 1.0,
"fields" : {
"discount_cost" : [
16.0
]
}
},
{
"_index" : "seats",
"_id" : "7",
"_score" : 1.0,
"fields" : {
"discount_cost" : [
18.0
]
}
},
{
"_index" : "seats",
"_id" : "8",
"_score" : 1.0,
"fields" : {
"discount_cost" : [
14.0
]
}
},
{
"_index" : "seats",
"_id" : "9",
"_score" : 1.0,
"fields" : {
"discount_cost" : [
17.0
]
}
},
{
"_index" : "seats",
"_id" : "11",
"_score" : 1.0,
"fields" : {
"discount_cost" : [
4000.0
]
}
}
]
}
}
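The value 20.400000000000002 in the response shows that the stored script's "* 100 / 100" does not round. The Python sketch below contrasts the expression as stored with a rounded variant. Both function names are hypothetical, and Python's round uses round-half-to-even, whereas Java's Math.round rounds half up, so results can differ for exact .5 cases.

```python
def discount_unrounded(cost, discount):
    """Same arithmetic as the stored script: no rounding takes place."""
    return (cost * discount * 100) / 100


def discount_rounded(cost, discount):
    """Round to two decimals by rounding the value scaled to cents."""
    return round(cost * discount * 100) / 100.0


raw = discount_unrounded(25.5, 0.8)   # floating-point noise is preserved
clean = discount_rounded(25.5, 0.8)
```

For the seat priced at 25.5 the rounded variant returns 20.4, while the unrounded one keeps the floating-point noise.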