Building and testing an Elasticsearch search engine setup with Docker

1. Full-text search (Chinese word segmentation, pinyin, case-insensitive English, title/content weighting)

Pull the image

docker pull elasticsearch:6.6.0

Run a container from the image

docker run -d --name elasticsearch -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elasticsearch:6.6.0
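Before going further it can help to confirm that the node answers on port 9200. The following is a minimal sketch, assuming Elasticsearch is reachable on localhost; the helper names (health_url, is_usable, fetch_health) are hypothetical and only the standard library is used. It reads the _cluster/health endpoint and treats "green" or "yellow" as usable, since "yellow" is normal on a single node once an index with replicas exists.

```python
import json
import urllib.request


def health_url(host="localhost", port=9200):
    """Build the URL of the cluster health endpoint."""
    return f"http://{host}:{port}/_cluster/health"


def is_usable(health):
    """A single-node cluster is usable when its status is green or yellow."""
    return health.get("status") in ("green", "yellow")


def fetch_health(host="localhost", port=9200, timeout=5):
    """Fetch and decode the health document (performs a real HTTP request)."""
    with urllib.request.urlopen(health_url(host, port), timeout=timeout) as resp:
        return json.load(resp)
```

Calling is_usable(fetch_health()) right after docker run may fail with a connection error for a few seconds while the node starts, so retrying is reasonable.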

Open a shell inside the container

docker exec -it elasticsearch /bin/bash

Install the IK plugin (Chinese word segmentation)

./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.6.0/elasticsearch-analysis-ik-6.6.0.zip

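Install the pinyin plugin

./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v6.6.0/elasticsearch-analysis-pinyin-6.6.0.zip

Leave the container and restart it so that Elasticsearch loads the newly installed plugins

exit
docker restart elasticsearch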
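Once the container has been restarted, the installed plugins can be listed with GET _cat/plugins?format=json. The sketch below assumes that response shape (a list of rows with a "component" field) and checks that both plugins are present; the function name missing_plugins is hypothetical.

```python
REQUIRED_PLUGINS = {"analysis-ik", "analysis-pinyin"}


def missing_plugins(cat_plugins_rows, required=REQUIRED_PLUGINS):
    """Return the required plugin names that do not appear in the
    decoded JSON response of GET _cat/plugins?format=json."""
    installed = {row["component"] for row in cat_plugins_rows}
    return sorted(set(required) - installed)
```

An empty list means both analyzers are available to the node; anything else names the plugin that still has to be installed.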

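Install Kibana

docker pull kibana:6.6.0
docker run -d --name kibana -p 5601:5601 kibana:6.6.0
docker exec -it kibana /bin/bash
vi config/kibana.yml

In kibana.yml, replace the default Elasticsearch host name with the address of the machine running Elasticsearch, then leave the container and restart it:

elasticsearch->10.100.123.13
exit
docker restart kibana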

Set up a domain name for Kibana

http://xx.com

which points to 192.168.0.1:5601

Add a line to the local hosts file:

192.168.0.1 xx.com

Open Dev Tools

http://xx.com/app/kibana#/dev_tools/console?_g=()

Create the index

PUT test_index
{
  "settings":{
    "number_of_shards":"1",
    "index.refresh_interval":"15s",
    "index":{
      "analysis":{
        "analyzer":{
          "ik_pinyin_analyzer":{
            "type":"custom",
            "tokenizer":"ik_smart",
            "filter":"pinyin_filter"
          }
        },
        "filter":{
          "pinyin_filter":{
            "type":"pinyin",
            "keep_first_letter": false
          }
        }
      }
    }
  }
}

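Parameter: index.refresh_interval is how often the index is refreshed, that is, how long a newly indexed document can take to become searchable. The default is 1s.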
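With the interval set to 15s, a document added in the console may not show up in a search issued immediately afterwards. The sketch below is an illustration with hypothetical helper names: one function converts an interval string into the worst-case delay in seconds, the other builds the body for PUT test_index/_settings, which can change this setting on an existing index.

```python
import re

# Milliseconds per unit, for the common Elasticsearch time units.
_UNIT_MS = {"ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def max_visibility_delay(interval):
    """Worst-case seconds before a new document is searchable.
    Returns None for "-1", which disables automatic refresh."""
    text = interval.strip()
    if text == "-1":
        return None
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d)", text)
    if match is None:
        raise ValueError(f"unsupported interval: {interval!r}")
    return int(match.group(1)) * _UNIT_MS[match.group(2)] / 1000


def refresh_settings_body(interval):
    """Request body for PUT <index>/_settings to change the interval."""
    return {"index": {"refresh_interval": interval}}
```

When testing interactively, POST test_index/_refresh forces a refresh so that freshly added documents can be searched without waiting.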

Test the analyzer

POST test_index/_analyze
{
 "analyzer": "ik_pinyin_analyzer",
 "text":"讯飞文档"
}

Result:

{
  "tokens" : [
    {
      "token" : "xun",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "fei",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "wen",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "dang",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 3
    }
  ]
}
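The offsets in this output show how the two stages interact: ik_smart first splits the text into words, and the pinyin filter then emits one token per syllable that inherits the offsets of the word it came from. The sketch below groups the tokens by their source span to make that visible; the function name is hypothetical and the sample is the response above, reduced to the fields that matter.

```python
SAMPLE_TEXT = "讯飞文档"
SAMPLE_RESPONSE = {
    "tokens": [
        {"token": "xun", "start_offset": 0, "end_offset": 1},
        {"token": "fei", "start_offset": 1, "end_offset": 2},
        {"token": "wen", "start_offset": 2, "end_offset": 4},
        {"token": "dang", "start_offset": 2, "end_offset": 4},
    ]
}


def pinyin_by_source(text, analyze_response):
    """Group pinyin tokens by the slice of the input they were derived from."""
    grouped = {}
    for tok in analyze_response["tokens"]:
        span = (tok["start_offset"], tok["end_offset"])
        grouped.setdefault(span, []).append(tok["token"])
    return [(text[start:end], toks) for (start, end), toks in sorted(grouped.items())]


for source, syllables in pinyin_by_source(SAMPLE_TEXT, SAMPLE_RESPONSE):
    print(source, syllables)
```

The two single characters each yield one syllable, while the word 文档 yields both "wen" and "dang", which is why a search for either syllable can match a document containing that word.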

Map the content field to the analyzer created above

PUT test_index/_mapping/test_type
{
  "properties": {
    "content":{
      "type": "text",
      "analyzer": "ik_smart",
      "search_analyzer": "ik_smart",
      "fields": {
        "my_pinyin":{
          "type":"text",
          "analyzer": "ik_pinyin_analyzer",
          "search_analyzer": "ik_pinyin_analyzer"
        }
      }
    }
  }
}

Add test data

POST test_index/test_type
{
 "content":"讯飞文档"
}
POST test_index/test_type
{
 "content":"石墨文档"
}

Query

GET test_index/test_type/_search
{
  "query":{
    "match": {
      "content.my_pinyin": "wen"
    }
  }
}
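The query above targets only the pinyin sub-field. If a single search box should accept both Chinese and pinyin input, one option is a multi_match over content and content.my_pinyin, so that each field analyzes the input with its own search analyzer. This is a sketch of that idea, not the query used in the original test; the function name build_search is hypothetical.

```python
def build_search(text, fields=("content", "content.my_pinyin")):
    """Search body that tries the Chinese field and its pinyin sub-field."""
    return {"query": {"multi_match": {"query": text, "fields": list(fields)}}}
```

The returned dictionary, serialized as JSON, can be sent as the body of GET test_index/test_type/_search.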

Remove "a" from the IK stop-word list, otherwise it can never be matched. The list is in:
/data/elasticsearch/plugins/ik/config/stopword.dic
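The stop-word file holds one word per line, so the edit can be scripted instead of done by hand. A minimal sketch, assuming a UTF-8 file with one entry per line; the function name drop_stopwords is hypothetical.

```python
def drop_stopwords(dic_text, words):
    """Return the dictionary text without the given stop words."""
    unwanted = set(words)
    kept = [line for line in dic_text.splitlines() if line.strip() not in unwanted]
    return "\n".join(kept) + "\n"


def rewrite_stopword_file(path, words):
    """Apply drop_stopwords to a file in place."""
    with open(path, encoding="utf-8") as fh:
        original = fh.read()
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(drop_stopwords(original, words))
```

For example, rewrite_stopword_file("/data/elasticsearch/plugins/ik/config/stopword.dic", ["a"]). Elasticsearch most likely has to be restarted afterwards, since local dictionary files are normally read when the plugin starts.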

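Search results:

Chinese:

[Screenshot: search results for Chinese input]

Pinyin:

[Screenshot: search results for pinyin input]

Upper- and lower-case letters:

[Screenshot: search results for upper- and lower-case input]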

Create an index whose documents have several fields:

PUT doc_index
{
  "settings":{
    "number_of_shards":"1",
    "index.refresh_interval":"15s",
    "index":{
      "analysis":{
        "analyzer":{
           "ik_pinyin_analyzer":{
            "type":"custom",
            "tokenizer":"ik_smart",
            "filter":"pinyin_filter"
          }
        },
        "filter":{
          "pinyin_filter":{
            "type":"pinyin",
            "keep_first_letter": false
          }
        }
      }
    }
  }
}

PUT doc_index/_mapping/test_type
{
  "properties": {
    "content":{
      "type": "text",
      "analyzer": "ik_smart",
      "search_analyzer": "ik_smart",
      "fields": {
        "my_pinyin":{
          "type":"text",
          "analyzer": "ik_pinyin_analyzer",
          "search_analyzer": "ik_pinyin_analyzer"
        }
      }
    },
    "title":{
      "type": "text",
      "analyzer": "ik_smart",
      "search_analyzer": "ik_smart",
      "fields": {
        "my_pinyin":{
          "type":"text",
          "analyzer": "ik_pinyin_analyzer",
          "search_analyzer": "ik_pinyin_analyzer"
        }
      }
    }
  }
}
POST doc_index/test_type
{
  "title":"我的文档",
  "content":"测试内容"
}
POST doc_index/test_type
{
  "title":"哈哈",
  "content":"文档内容"
}

Weighting test (a match in the title is boosted by 2, a match in the content keeps the default weight):

GET doc_index/test_type/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title.my_pinyin": {
              "query": "wendang",
              "boost": 2
            }
          }
        },
        {
          "match": {
            "content.my_pinyin": "wendang"
          }
        }
      ]
    }
  }
}
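To a first approximation, a bool query adds up the scores of its matching should clauses, and boost multiplies the score of the clause it is set on. The sketch below works through that arithmetic for the two test documents. The raw per-clause score of 0.5 is a made-up placeholder, not a value returned by Elasticsearch, and the function names are hypothetical.

```python
# Hypothetical raw clause scores: doc1 matches "wendang" in its title,
# doc2 matches it in its content.
DOCS = {
    "doc1": {"title.my_pinyin": 0.5},
    "doc2": {"content.my_pinyin": 0.5},
}
BOOSTS = {"title.my_pinyin": 2.0}


def combined_score(clause_scores, boosts):
    """Sum of clause scores, each multiplied by its boost (default 1)."""
    return sum(boosts.get(field, 1.0) * score for field, score in clause_scores.items())


def rank(docs, boosts):
    """Document names ordered from highest to lowest combined score."""
    return sorted(docs, key=lambda name: combined_score(docs[name], boosts), reverse=True)


print(rank(DOCS, BOOSTS))
```

With equal raw scores, the title match ends up with twice the score of the content match, so the document titled 我的文档 is expected to rank first.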

Weighting test result:

[Screenshot: weighting test result]

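2. Syncing MongoDB data to Elasticsearch

mongo-connector synchronizes data from MongoDB to a target system. It tails the MongoDB oplog and replays each operation on the target, which keeps the target in near real-time sync with MongoDB. The oplog only exists when MongoDB runs as a replica set.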

mongo-connector -m 10.8.5.99:27017 -t 10.8.5.101:9200 -d elastic2_doc_manager
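Here -m is the MongoDB address, -t is the target address, and -d selects the doc manager. elastic2_doc_manager writes the data it receives to Elasticsearch.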
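The same options can be kept in a JSON config file and passed with mongo-connector -c config.json. The sketch below generates such a file. The key names (mainAddress, oplogFile, docManagers, docManager, targetURL) follow the config format as I understand it and should be checked against the installed mongo-connector version; the function names and the output path are hypothetical.

```python
import json


def connector_config(mongo_addr, es_addr, oplog_file="oplog.timestamp"):
    """Build a minimal mongo-connector config mirroring the -m, -t and -d flags."""
    return {
        "mainAddress": mongo_addr,   # same value as -m
        "oplogFile": oplog_file,     # where the last processed oplog position is kept
        "docManagers": [
            {
                "docManager": "elastic2_doc_manager",  # same value as -d
                "targetURL": es_addr,                  # same value as -t
            }
        ],
    }


def write_config(path, config):
    """Write the config as indented JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=2)
```

For the addresses used above this would be write_config("config.json", connector_config("10.8.5.99:27017", "10.8.5.101:9200")).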

3. Cluster setup

Replica sets and sharding are both supported.