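Docker Elasticsearch search engine: implementation and testing
1. Full-text search (Chinese word segmentation, pinyin, English upper/lower case, title vs. content weighting)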
Pull the image
docker pull elasticsearch:6.6.0
Run a container from the image
docker run -d --name elasticsearch -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elasticsearch:6.6.0
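Elasticsearch needs some time to start, so commands run right after docker run can fail. A minimal sketch of a retry helper for waiting until the HTTP port answers; `retry` is a hypothetical name and the retry budget in the usage line is arbitrary:

```shell
#!/bin/sh
# Hypothetical helper: run CMD up to TRIES times, sleeping DELAY seconds
# between failed attempts. Returns 0 on the first success, 1 if all fail.
retry() {
  tries="$1"; delay="$2"; shift 2
  i=1
  while ! "$@"; do
    if [ "$i" -ge "$tries" ]; then
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
}
```

For example, `retry 30 2 curl -sf http://localhost:9200 >/dev/null && echo "Elasticsearch is up"` polls up to 30 times, 2 seconds apart (`-f` makes curl fail on HTTP errors, `-s` silences progress output).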
Open a shell in the container
docker exec -it elasticsearch /bin/bash
Install the IK analyzer plugin (Chinese word segmentation)
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.6.0/elasticsearch-analysis-ik-6.6.0.zip
Install the pinyin plugin
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v6.6.0/elasticsearch-analysis-pinyin-6.6.0.zip
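Plugins are loaded only when Elasticsearch starts, so the container has to be restarted after installing them (leave the container shell with exit, then run docker restart elasticsearch). A small check before restarting, assuming `elasticsearch-plugin list` prints one plugin name per line and that the two plugins register as `analysis-ik` and `analysis-pinyin` (compare with what the command actually prints); `check_plugins` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: reads `elasticsearch-plugin list` output on stdin and
# succeeds only if both analysis plugins are listed.
check_plugins() {
  list=$(cat)
  status=0
  for plugin in analysis-ik analysis-pinyin; do
    # -x: whole-line match, so a similarly named plugin does not count
    if printf '%s\n' "$list" | grep -qx "$plugin"; then
      echo "found: $plugin"
    else
      echo "missing: $plugin"
      status=1
    fi
  done
  return $status
}
```

From the host: `docker exec elasticsearch ./bin/elasticsearch-plugin list | check_plugins && docker restart elasticsearch` (the official image's working directory is /usr/share/elasticsearch, so the relative ./bin path resolves).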
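Install Kibana
docker pull kibana:6.6.0
docker run -d --name kibana -p 5601:5601 kibana:6.6.0
docker exec -it kibana /bin/bash
vi config/kibana.yml
Point Kibana at Elasticsearch on 10.100.123.13, e.g. elasticsearch.hosts: [ "http://10.100.123.13:9200" ] (Kibana releases before 6.6 use elasticsearch.url instead)
exit
docker restart kibana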
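Instead of editing the file in vi, the setting can be rewritten by a script. A sketch, assuming Kibana 6.6 reads the address from `elasticsearch.hosts` and that Elasticsearch listens on the default HTTP port 9200; `set_es_host` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: point a kibana.yml file at one Elasticsearch URL.
# Existing elasticsearch.url / elasticsearch.hosts lines are dropped,
# then a single elasticsearch.hosts entry is appended.
set_es_host() {
  file="$1"
  url="$2"
  tmp="$file.tmp"
  grep -v -e '^elasticsearch\.url:' -e '^elasticsearch\.hosts:' "$file" > "$tmp"
  printf 'elasticsearch.hosts: [ "%s" ]\n' "$url" >> "$tmp"
  mv "$tmp" "$file"
}
```

One way to use it from the host, assuming the official image keeps the file at /usr/share/kibana/config/kibana.yml: copy it out with `docker cp`, run `set_es_host kibana.yml http://10.100.123.13:9200`, copy it back with `docker cp`, and restart the Kibana container.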
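Domain setup
The domain should point to the Kibana host 192.168.0.1, where Kibana listens on port 5601. On the local machine, add this line to the hosts file: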
192.168.0.1 xx.com
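On Linux or macOS the entry lives in /etc/hosts (on Windows, C:\Windows\System32\drivers\etc\hosts). A sketch that appends it only once, so it can be rerun safely; `add_host_entry` is a hypothetical helper and writing /etc/hosts needs root:

```shell
#!/bin/sh
# Hypothetical helper: append "IP NAME" to a hosts file unless NAME is
# already mapped on a non-comment line.
add_host_entry() {
  file="$1"; ip="$2"; name="$3"
  # A hosts line is "IP name [aliases...]": look for NAME in fields 2..NF.
  if awk -v n="$name" '/^[ \t]*#/ { next }
       { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
       END { exit !found }' "$file"; then
    echo "$name already mapped in $file"
  else
    printf '%s %s\n' "$ip" "$name" >> "$file"
  fi
}
```

For example, as root: `add_host_entry /etc/hosts 192.168.0.1 xx.com`.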
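Open Dev Tools. The hosts entry maps the name to an IP address only, not to a port; without a reverse proxy on port 80 forwarding to 5601, add :5601 after xx.com in this URL: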
http://xx.com/app/kibana#/dev_tools/console?_g=()
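Create the index. The custom analyzer ik_pinyin_analyzer segments text with the IK ik_smart tokenizer and then converts each token to pinyin with pinyin_filter; keep_first_letter: false turns off the first-letter abbreviation tokens (such as wd for 文档).
PUT test_index
{
  "settings": {
    "number_of_shards": "1",
    "index.refresh_interval": "15s",
    "index": {
      "analysis": {
        "analyzer": {
          "ik_pinyin_analyzer": {
            "type": "custom",
            "tokenizer": "ik_smart",
            "filter": ["pinyin_filter"]
          }
        },
        "filter": {
          "pinyin_filter": {
            "type": "pinyin",
            "keep_first_letter": false
          }
        }
      }
    }
  }
}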
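The Kibana console is only a convenience; the same request can be sent with curl. A minimal sketch, assuming Elasticsearch answers on localhost:9200 (override with the hypothetical `ES` variable); the body is written to a file first so it can be reviewed and reused, and `es_request` is a hypothetical wrapper:

```shell
#!/bin/sh
ES="${ES:-http://localhost:9200}"   # assumption: Elasticsearch HTTP endpoint
BODY="${TMPDIR:-/tmp}/test_index_settings.json"

# Same settings as the console request above
cat > "$BODY" <<'EOF'
{
  "settings": {
    "number_of_shards": "1",
    "index.refresh_interval": "15s",
    "index": {
      "analysis": {
        "analyzer": {
          "ik_pinyin_analyzer": {
            "type": "custom",
            "tokenizer": "ik_smart",
            "filter": ["pinyin_filter"]
          }
        },
        "filter": {
          "pinyin_filter": { "type": "pinyin", "keep_first_letter": false }
        }
      }
    }
  }
}
EOF

# Hypothetical wrapper: METHOD PATH [BODY_FILE].
# Elasticsearch 6.x rejects request bodies sent without a Content-Type header.
es_request() {
  method="$1"; path="$2"; body="$3"
  if [ -n "$body" ]; then
    curl -s -X "$method" "$ES/$path" -H 'Content-Type: application/json' --data-binary "@$body"
  else
    curl -s -X "$method" "$ES/$path"
  fi
}
```

Then `es_request PUT test_index "$BODY"` creates the index, and `es_request GET 'test_index/_settings?pretty'` shows what was stored.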
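Parameter: index.refresh_interval is how often the index is refreshed, i.e. the maximum delay before newly written documents become searchable (default 1s).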
Test
POST test_index/_analyze
{
  "analyzer": "ik_pinyin_analyzer",
  "text": "讯飞文档"
}
Result:
{
  "tokens" : [
    {
      "token" : "xun",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "fei",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "wen",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "dang",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 3
    }
  ]
}
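When trying different analyzers it helps to see just the token strings. A rough sketch that relies on the pretty-printed layout shown above (`"token" : "..."`, as produced by Kibana or by adding ?pretty to the URL); it is a text filter, not a JSON parser, and `list_tokens` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: print the "token" values of a pretty-printed
# _analyze response on one line, separated by spaces.
list_tokens() {
  grep -o '"token" *: *"[^"]*"' |
    sed 's/.*: *"\([^"]*\)"/\1/' |
    tr '\n' ' ' |
    sed 's/ $//'
}
```

For example: `curl -s -H 'Content-Type: application/json' 'localhost:9200/test_index/_analyze?pretty' -d '{"analyzer":"ik_pinyin_analyzer","text":"讯飞文档"}' | list_tokens`, which for the result above would print xun fei wen dang.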
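Map the content field to the analyzers: the field itself uses ik_smart, and its my_pinyin sub-field uses the custom ik_pinyin_analyzer. Set the mapping before adding documents, because the analyzer of an existing field cannot be changed afterwards.
PUT test_index/_mapping/test_type
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "ik_smart",
      "search_analyzer": "ik_smart",
      "fields": {
        "my_pinyin": {
          "type": "text",
          "analyzer": "ik_pinyin_analyzer",
          "search_analyzer": "ik_pinyin_analyzer"
        }
      }
    }
  }
}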
Add test data
POST test_index/test_type
{
  "content": "讯飞文档"
}
POST test_index/test_type
{
  "content": "石墨文档"
}
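Because index.refresh_interval is set to 15s, documents added this way may not appear in search results for up to 15 seconds. For a quick test, the documents can be sent in one _bulk request with refresh=true, which makes them searchable as soon as the request returns. A sketch, assuming Elasticsearch on localhost:9200; `bulk_body` is a hypothetical helper that builds the newline-delimited body _bulk expects:

```shell
#!/bin/sh
# Hypothetical helper: build an NDJSON _bulk body with one action line and
# one source line per argument. Assumes the strings contain no double
# quotes or backslashes (no JSON escaping is done).
bulk_body() {
  for content in "$@"; do
    printf '{"index":{}}\n'
    printf '{"content":"%s"}\n' "$content"
  done
}
```

For example: `bulk_body 讯飞文档 石墨文档 | curl -s -X POST 'localhost:9200/test_index/test_type/_bulk?refresh=true' -H 'Content-Type: application/x-ndjson' --data-binary @-`. Use --data-binary rather than -d here: -d strips the newlines that separate the NDJSON lines.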
Search the pinyin sub-field
GET test_index/test_type/_search
{
  "query": {
    "match": {
      "content.my_pinyin": "wen"
    }
  }
}
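Remove a from the IK stopword list. Otherwise the search analyzer discards a query such as a (the pinyin of characters like 阿) as a stopword, and nothing can match:
/data/elasticsearch/plugins/ik/config/stopword.dic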
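The entry can be removed with a script instead of an editor. IK reads this dictionary when Elasticsearch starts, so restart the container afterwards. A sketch; `remove_stopword` is a hypothetical helper, and it avoids `sed -i` because that option's syntax differs between GNU and BSD sed:

```shell
#!/bin/sh
# Hypothetical helper: delete every line that is exactly WORD from a
# dictionary file. The file is rewritten in place (cat >) so its owner and
# permissions are kept.
remove_stopword() {
  file="$1"; word="$2"
  tmp="$file.tmp"
  # -v: keep non-matching lines; -x: whole-line match; -F: WORD is literal
  grep -vxF "$word" "$file" > "$tmp"
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example: `remove_stopword /data/elasticsearch/plugins/ik/config/stopword.dic a && docker restart elasticsearch`.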
Search results for Chinese, pinyin, and upper/lower-case letter queries (screenshots not included).
Create an index with two text fields, title and content (same analysis settings as test_index):
PUT doc_index
{
  "settings": {
    "number_of_shards": "1",
    "index.refresh_interval": "15s",
    "index": {
      "analysis": {
        "analyzer": {
          "ik_pinyin_analyzer": {
            "type": "custom",
            "tokenizer": "ik_smart",
            "filter": ["pinyin_filter"]
          }
        },
        "filter": {
          "pinyin_filter": {
            "type": "pinyin",
            "keep_first_letter": false
          }
        }
      }
    }
  }
}
PUT doc_index/_mapping/test_type
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "ik_smart",
      "search_analyzer": "ik_smart",
      "fields": {
        "my_pinyin": {
          "type": "text",
          "analyzer": "ik_pinyin_analyzer",
          "search_analyzer": "ik_pinyin_analyzer"
        }
      }
    },
    "title": {
      "type": "text",
      "analyzer": "ik_smart",
      "search_analyzer": "ik_smart",
      "fields": {
        "my_pinyin": {
          "type": "text",
          "analyzer": "ik_pinyin_analyzer",
          "search_analyzer": "ik_pinyin_analyzer"
        }
      }
    }
  }
}
POST doc_index/test_type
{
  "title": "我的文档",
  "content": "测试内容"
}
POST doc_index/test_type
{
  "title": "哈哈",
  "content": "文档内容"
}
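Weighting test (a match in title.my_pinyin is boosted by a factor of 2 relative to a match in content.my_pinyin):
GET doc_index/test_type/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title.my_pinyin": {
              "query": "wendang",
              "boost": 2
            }
          }
        },
        {
          "match": {
            "content.my_pinyin": "wendang"
          }
        }
      ]
    }
  }
}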
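The same weighting can also be expressed as a multi_match query with a ^2 boost on the field name. With "type": "most_fields" the scores of all matching fields are added, as in the bool/should query above; the default best_fields type would instead keep only the best field's score. A sketch that prints such a body; `weighted_query` and its arguments are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print a multi_match body that searches the pinyin
# sub-fields of title and content, boosting title by BOOST (default 2).
# TERM is inserted verbatim, so it must not contain double quotes.
weighted_query() {
  term="$1"; boost="${2:-2}"
  cat <<EOF
{
  "query": {
    "multi_match": {
      "query": "$term",
      "type": "most_fields",
      "fields": ["title.my_pinyin^$boost", "content.my_pinyin"]
    }
  }
}
EOF
}
```

For example: `weighted_query wendang | curl -s -H 'Content-Type: application/json' 'localhost:9200/doc_index/test_type/_search?pretty' --data-binary @-`.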
Weighting test results (screenshot not included).
2. Syncing MongoDB data to Elasticsearch
mongo-connector copies data from MongoDB to a target system and tails MongoDB's oplog, so that later operations on MongoDB are replayed on the target in near real time.
mongo-connector -m 10.8.5.99:27017 -t 10.8.5.101:9200 -d elastic2_doc_manager
elastic2_doc_manager is the doc manager that writes the received data to Elasticsearch.
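MongoDB keeps an oplog only when it runs as a replica set (a single-member replica set is enough), so mongo-connector has nothing to follow on a standalone mongod. A sketch of a pre-flight check before running the command above, assuming the legacy mongo shell is installed and that `rs.status().ok` prints 1 on a replica-set member and 0 on a standalone server; `is_replica_set` and `start_sync` are hypothetical helpers:

```shell
#!/bin/sh
MONGO="${MONGO:-10.8.5.99:27017}"   # source MongoDB (from the command above)
ES="${ES:-10.8.5.101:9200}"         # target Elasticsearch

# Hypothetical helper: succeed if the first line on stdin is exactly 1.
is_replica_set() {
  read -r ok
  [ "$ok" = "1" ]
}

# Hypothetical helper: start mongo-connector only if MONGO is a replica-set member.
start_sync() {
  if mongo --host "$MONGO" --quiet --eval 'rs.status().ok' | is_replica_set; then
    mongo-connector -m "$MONGO" -t "$ES" -d elastic2_doc_manager
  else
    echo "MongoDB at $MONGO is not a replica-set member; mongo-connector needs the oplog" >&2
    return 1
  fi
}
```

Running `start_sync` either launches the same mongo-connector command as above or explains why it did not.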
3. Cluster setup
Replica sets and sharding are supported.