
The problem of MongoDB's excessive memory usage

MongoDB's memory usage can be alarming. The server has 8 GB of RAM, and the database does not hold many records (only a few thousand after a large delete), yet mongod was using 35% of memory (run top, then press Shift+M to sort processes by memory usage). Swap was not in use.
MongoDB hands memory management over to the operating system, so there is no way to release this memory from inside the program. db.repairDatabase() returned OK, but memory usage stayed high. In the end the only option was a restart: use admin; db.shutdownServer(); (or kill the process), then start it again. After the restart, memory and CPU usage each dropped to about 1%, and mongostat output looked normal.
This problem is not really solved, though; I do not yet know how to reclaim the memory without a restart. On Linux, ulimit can also be combined with mongod to cap its memory usage.

Common MongoDB operations

The following operations were performed against MongoDB 3.2.
Backing up data:
mongodump --host IP --port PORT -u USERNAME -p PASSWORD -d DATABASE -o OUTPUT_PATH

[dev@aws17-pg ~]$ mongodump --host 10.0.1.12 --port 27020 -d applock -o /home/dev/backup/
2017-02-15T03:39:38.115-0500    writing applock.lockdetail to 
2017-02-15T03:39:41.116-0500    [######..................]  applock.lockdetail  932703/3502968  (26.6%)
2017-02-15T03:39:44.158-0500    [############............]  applock.lockdetail  1854275/3502968  (52.9%)
2017-02-15T03:39:47.116-0500    [###################.....]  applock.lockdetail  2832777/3502968  (80.9%)
2017-02-15T03:39:49.174-0500    [########################]  applock.lockdetail  3502968/3502968  (100.0%)
2017-02-15T03:39:49.174-0500    done dumping applock.lockdetail (3502968 documents)

Restoring data:
mongorestore -h IP --port PORT -u USERNAME -p PASSWORD -d DATABASE --drop DUMP_PATH
--drop means: drop all existing records first, then restore.

[root@pg-test ~]# mongorestore -d applock /root/applock
connected to: 127.0.0.1
2017-02-15T17:20:50.057+0800 /root/applock/lockdetail.bson
2017-02-15T17:20:50.057+0800    going into namespace [applock.lockdetail]
2017-02-15T17:20:50.110+0800    Created collection applock.lockdetail with options: { "create" : "lockdetail" }
2017-02-15T17:20:53.034+0800        Progress: 52164170/622513894    8%  (bytes)
2017-02-15T17:20:56.000+0800        Progress: 104398989/622513894   16% (bytes)
2017-02-15T17:20:59.016+0800        Progress: 156726761/622513894   25% (bytes)
2017-02-15T17:21:02.015+0800        Progress: 201102195/622513894   32% (bytes)
2017-02-15T17:21:05.030+0800        Progress: 252299457/622513894   40% (bytes)
2017-02-15T17:21:08.033+0800        Progress: 306753671/622513894   49% (bytes)
2017-02-15T17:21:11.020+0800        Progress: 361233088/622513894   58% (bytes)
2017-02-15T17:21:14.027+0800        Progress: 408729106/622513894   65% (bytes)
2017-02-15T17:21:17.011+0800        Progress: 462108350/622513894   74% (bytes)
2017-02-15T17:21:20.000+0800        Progress: 512628006/622513894   82% (bytes)
2017-02-15T17:21:23.013+0800        Progress: 570320886/622513894   91% (bytes)
3502968 objects found
2017-02-15T17:21:25.962+0800    Creating index: { key: { _id: 1 }, name: "_id_", ns: "applock.lockdetail" }

Checking the version:
db.version()

Shutting down:
use admin
db.shutdownServer()

Additionally:
If mongod will not start and reports "child process failed, exited with error number 1" (or some other number), check that every directory it needs exists and is writable, and try running a repair.

Running show dbs on a slave:
rs.slaveOk()
Otherwise it fails with: Error: listDatabases failed:{ "note" : "from execCommand", "ok" : 0, "errmsg" : "not master" }

Converting timestamps to a readable date format in MongoDB queries

In MongoDB these dates are stored as longs, which are unreadable when printed, so they need to be converted at query time. The code is as follows:

Date.prototype.Format = function (fmt) { // author: meizz
    var o = {
        "M+": this.getMonth() + 1,                   // month
        "d+": this.getDate(),                        // day
        "h+": this.getHours(),                       // hour
        "m+": this.getMinutes(),                     // minute
        "s+": this.getSeconds(),                     // second
        "q+": Math.floor((this.getMonth() + 3) / 3), // quarter
        "S": this.getMilliseconds()                  // millisecond
    };
    if (/(y+)/.test(fmt)) fmt = fmt.replace(RegExp.$1, (this.getFullYear() + "").substr(4 - RegExp.$1.length));
    for (var k in o)
        if (new RegExp("(" + k + ")").test(fmt)) fmt = fmt.replace(RegExp.$1, (RegExp.$1.length == 1) ? (o[k]) : (("00" + o[k]).substr(("" + o[k]).length)));
    return fmt;
}
db.getCollection('state').find({"isall":"1"}).sort({"st":-1}).forEach(function (a) {
    a["st"] = new Date(a["st"]).Format("yyyy-MM-dd");
    a["ut"] = new Date(a["ut"]).Format("yyyy-MM-dd");
    printjson(a);
})

The key point is using forEach to iterate over the results and transform each document.
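As an aside, if only a date string is needed, the conversion can also be done without extending Date.prototype. A minimal sketch (formatDate is my own helper name, not part of the code above; it uses UTC accessors so the output does not depend on the server's timezone):

```javascript
// Format an epoch-millisecond timestamp as "yyyy-MM-dd".
// formatDate is an illustrative helper, not part of the original snippet.
function formatDate(millis) {
    var d = new Date(millis);
    var pad = function (n) { return (n < 10 ? "0" : "") + n; }; // zero-pad to 2 digits
    // UTC accessors keep the output independent of the local timezone
    return d.getUTCFullYear() + "-" + pad(d.getUTCMonth() + 1) + "-" + pad(d.getUTCDate());
}
```

Inside the forEach above it would be used the same way: a["st"] = formatDate(a["st"]).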

A detailed look at MongoDB's mapReduce

mapReduce is mostly useful for grouped statistics; it is MongoDB's equivalent of MySQL's GROUP BY, but more powerful.

db.collection.mapReduce( // the target collection
	<map>, // the map function (emits key/value pairs that feed the reduce function)
	<reduce>, // the aggregation function
	{
		out: <collection>, // collection for the results (if omitted, a temporary collection is used and dropped when the client disconnects); results must stay under the 16 MB BSON document size limit
		query: <document>, // filter on the input documents
		sort: <document>, // sort order for the input documents
		limit: <number>, // cap on the number of input documents
		finalize: <function>, // final-pass function (post-processes each reduce result before it is written to the output collection)
		scope: <document>, // external variables exposed to map, reduce, and finalize
		jsMode: <boolean>, // normally the data is converted BSON->JSON before map, back to BSON after it, BSON->JSON again before reduce, and back to BSON after; jsMode keeps it as JS objects throughout, but is limited by the JS heap size and a maximum of 500,000 distinct keys, so it does not suit large jobs, which fall back to the normal mode
		verbose: <boolean>, // include detailed timing information
		bypassDocumentValidation: <boolean>
	}
)

Preparing the data

db.emp.insert({level:"t1", age:24, name:'Tom'});
db.emp.insert({level:"t2", age:22, name:'Jacky'});
db.emp.insert({level:"t3", age:26, name:'Lily'});
db.emp.insert({level:"t2", age:29, name:'Tony'});
db.emp.insert({level:"t3", age:21, name:'Harry'});
db.emp.insert({level:"t4", age:35, name:'Vincent'});
db.emp.insert({level:"t2", age:25, name:'Bill'});
db.emp.insert({level:"t3", age:25, name:'Bruce'});
db.emp.insert({level:"t1", age:25, name:'tencent'});
db.emp.insert({level:"t4", age:25, name:'baidu'});
map
The map function must call emit(key, value) to produce key/value pairs. It comes with the following constraints and requirements:
1: In the map function, reference the current document as this within the function.
this is the document currently being processed.
2: The map function should not access the database for any reason.
It must not touch any collection other than this; in practice it cannot, and referencing the db variable raises an error outright.
3: The map function should be pure, or have no impact outside of the function (i.e. side effects.)
It must not affect anything outside itself.
4: A single emit can only hold half of MongoDB's maximum BSON document size.
A single emit is capped at half the maximum document size, i.e. 8 MB; in theory no group should ever produce a value that large.
5: The map function may optionally call emit(key, value) any number of times to create an output document associating key with value.

reduce
After the map phase, each distinct key becomes a group, and the array of values emitted for that key is passed to the reduce function.
Important caveat: if a key's value array has fewer than two elements, that group is never passed to reduce at all. Anything that counts or tracks keys inside reduce must account for this, or keys may be missed. With the sample data above, delete the last two inserts and observe the effect: t1 and t4 never show up in reduce.
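This single-value skip can be illustrated with a toy simulation in plain JavaScript (groupAndReduce is my own name; this is a model of the behavior, not server code):

```javascript
// Toy model: mapReduce only calls reduce for keys with two or more values.
function groupAndReduce(pairs, reduce) {
    var groups = {};
    pairs.forEach(function (p) {
        (groups[p.key] = groups[p.key] || []).push(p.value);
    });
    var out = {}, reducedKeys = [];
    Object.keys(groups).forEach(function (k) {
        if (groups[k].length === 1) {
            out[k] = groups[k][0];         // single value: reduce is skipped entirely
        } else {
            out[k] = reduce(k, groups[k]); // reduce only ever sees multi-value keys
            reducedKeys.push(k);
        }
    });
    return { result: out, reducedKeys: reducedKeys };
}
```

With pairs for t1 once and t2 twice, reduce runs only for t2, which is why key-counting logic placed inside reduce undercounts.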

finalize
With finalize() we can post-process the result of reduce(); the value it returns, whose shape can be customized, is what gets stored in the output collection.

Example
The company's employees hold one of the position levels t1, t2, t3, t4, and each has a name and an age.
We want to count the number of employees at each level.

m = function () {
	total = total +1
	// if(this.age==25){
		emit(this.level, 1);
	// }
	print("total in map:"+total)
}
r = function (key, values) {
	total = total +1
	var x = 0;
	print(separator)
	print(key+separator+values)
	// print(values)
	values.forEach(function (v) {
		x += v;
		// print(v);
	});
	print("total in reduce:"+total)
	return x;
}
res = db.runCommand({
	mapreduce:"emp",
	map:m,
	reduce:r,
	// query:{"age":25},
	// sort:{"age":-1},
	// limit:2,
	// verbose:true,
	finalize:function(key, reducedValue) {
		total = total +1
		print(key+separator+reducedValue)
		print("total in finalize:"+total)
		return reducedValue;
		// return {level:key,count:reducedValue};
	},
	scope:{separator:"\t",total:0},
	out:"emp_result"
});

The log output printed during the run was:
2016-12-12T20:38:05.536+0800 I - [conn160] total in map:1
2016-12-12T20:38:05.536+0800 I - [conn160] total in map:2
2016-12-12T20:38:05.536+0800 I - [conn160] total in map:3
2016-12-12T20:38:05.537+0800 I - [conn160] total in map:4
2016-12-12T20:38:05.537+0800 I - [conn160] total in map:5
2016-12-12T20:38:05.537+0800 I - [conn160] total in map:6
2016-12-12T20:38:05.537+0800 I - [conn160] total in map:7
2016-12-12T20:38:05.537+0800 I - [conn160] total in map:8
2016-12-12T20:38:05.537+0800 I - [conn160] total in map:9
2016-12-12T20:38:05.537+0800 I - [conn160] total in map:10
2016-12-12T20:38:05.537+0800 I - [conn160]
2016-12-12T20:38:05.537+0800 I - [conn160] t1 1,1
2016-12-12T20:38:05.537+0800 I - [conn160] total in reduce:11
2016-12-12T20:38:05.537+0800 I - [conn160]
2016-12-12T20:38:05.537+0800 I - [conn160] t2 1,1,1
2016-12-12T20:38:05.538+0800 I - [conn160] total in reduce:12
2016-12-12T20:38:05.538+0800 I - [conn160]
2016-12-12T20:38:05.538+0800 I - [conn160] t3 1,1,1
2016-12-12T20:38:05.538+0800 I - [conn160] total in reduce:13
2016-12-12T20:38:05.538+0800 I - [conn160]
2016-12-12T20:38:05.538+0800 I - [conn160] t4 1,1
2016-12-12T20:38:05.538+0800 I - [conn160] total in reduce:14
2016-12-12T20:38:05.539+0800 I - [conn160] t1 2
2016-12-12T20:38:05.539+0800 I - [conn160] total in finalize:15
2016-12-12T20:38:05.539+0800 I - [conn160] t2 3
2016-12-12T20:38:05.539+0800 I - [conn160] total in finalize:16
2016-12-12T20:38:05.539+0800 I - [conn160] t3 3
2016-12-12T20:38:05.539+0800 I - [conn160] total in finalize:17
2016-12-12T20:38:05.539+0800 I - [conn160] t4 2
2016-12-12T20:38:05.539+0800 I - [conn160] total in finalize:18
The aggregated results:

/* 1 */
{
"_id" : "t1",
"value" : 2
}

/* 2 */
{
"_id" : "t2",
"value" : 3
}

/* 3 */
{
"_id" : "t3",
"value" : 3
}

/* 4 */
{
"_id" : "t4",
"value" : 2
}

What I take away from this example (my own understanding, which may be wrong):

1. The rough execution order is: query filter -> sort -> limit -> map -> reduce -> finalize (query, sort, and limit apply to the input documents before map runs)
2. query is optional; filtering can also be done inside map
3. print output goes to the server log, where it can be inspected
4. The log line (drop applock.tmp.mr.emp_59) shows that the intermediate temporary collection is dropped afterwards
5. Variables in scope are global and shared between map, reduce, and finalize
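The whole flow can also be replayed on the sample data in plain JavaScript (a toy model, not server code):

```javascript
// Toy end-to-end replay of the per-level head count on the sample data.
var emp = [
    {level: "t1", age: 24, name: "Tom"},     {level: "t2", age: 22, name: "Jacky"},
    {level: "t3", age: 26, name: "Lily"},    {level: "t2", age: 29, name: "Tony"},
    {level: "t3", age: 21, name: "Harry"},   {level: "t4", age: 35, name: "Vincent"},
    {level: "t2", age: 25, name: "Bill"},    {level: "t3", age: 25, name: "Bruce"},
    {level: "t1", age: 25, name: "tencent"}, {level: "t4", age: 25, name: "baidu"}
];
var groups = {};
emp.forEach(function (doc) {               // "map": emit(doc.level, 1)
    (groups[doc.level] = groups[doc.level] || []).push(1);
});
var counts = {};
Object.keys(groups).forEach(function (k) { // "reduce": sum the 1s for each key
    counts[k] = groups[k].reduce(function (a, b) { return a + b; }, 0);
});
// counts ends up as {t1: 2, t2: 3, t3: 3, t4: 2}, matching emp_result above
```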

MongoDB collection name length limits and naming rules

A few issues can arise with MongoDB collection names.

Problem one: a collection name cannot begin with system., which is reserved by the system; the error is as follows:

Traceback (most recent call last):
  File "/opt/backend-job/scripts/applock/scan.py", line 248, in insert_row
    conn.applock[k].bulk_write(v[ti*1000:(ti+1)*1000])
  File "/opt/backend-job/scripts/applock/2.6/lib/python2.6/site-packages/pymongo/collection.py", line 432, in bulk_write
    bulk_api_result = blk.execute(self.write_concern.document)
  File "/opt/backend-job/scripts/applock/2.6/lib/python2.6/site-packages/pymongo/bulk.py", line 468, in execute
    return self.execute_command(sock_info, generator, write_concern)
  File "/opt/backend-job/scripts/applock/2.6/lib/python2.6/site-packages/pymongo/bulk.py", line 300, in execute_command
    run.ops, True, self.collection.codec_options, bwc)
  File "/opt/backend-job/scripts/applock/2.6/lib/python2.6/site-packages/pymongo/message.py", line 573, in write_command
    reply = self.sock_info.write_command(request_id, msg)
  File "/opt/backend-job/scripts/applock/2.6/lib/python2.6/site-packages/pymongo/pool.py", line 284, in write_command
    helpers._check_command_response(result)
  File "/opt/backend-job/scripts/applock/2.6/lib/python2.6/site-packages/pymongo/helpers.py", line 196, in _check_command_response
    raise OperationFailure(msg % errmsg, code, response)
OperationFailure: cannot write to 'system.blue.theme'

The insert into system.blue.theme fails outright with this error.

Collection naming rules found online:

1. A collection name cannot be the empty string ("")

2. It cannot contain \0 (the null character), which marks the end of a key

3. It cannot begin with "system.", a prefix reserved for the system itself

4. It cannot contain the $ character (note: the . dot character is allowed)

The official statement is:

Collection names should begin with an underscore or a letter character, and cannot:

  • contain the $.
  • be an empty string (e.g. "").
  • contain the null character.
  • begin with the system. prefix. (Reserved for internal use.)
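These rules can be collected into a small client-side check (an illustrative sketch; isValidCollectionName is my own helper and covers only the rules listed above):

```javascript
// Illustrative pre-check of the collection naming rules above; not an official API.
function isValidCollectionName(name) {
    if (name.length === 0) return false;             // empty string
    if (name.indexOf("\0") !== -1) return false;     // null character
    if (name.indexOf("$") !== -1) return false;      // $ is not allowed
    if (name.indexOf("system.") === 0) return false; // reserved system. prefix
    return /^[_A-Za-z]/.test(name);                  // officially a "should": begin with a letter or underscore
}
```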


Problem two: collection name length is limited

In my tests the limit worked out to 113 characters for the collection name. The official reference is https://docs.mongodb.com/manual/reference/limits/, which limits the length of the collection namespace, i.e. the database name plus the dot (.) separator plus the collection name. The error looks like this:

Traceback (most recent call last):
  File "/opt/backend-job/scripts/applock/scan.py", line 252, in insert_row
    conn.applock[k].create_index("locked")
  File "/opt/backend-job/scripts/applock/2.6/lib/python2.6/site-packages/pymongo/collection.py", line 1380, in create_index
    self.__create_index(keys, kwargs)
  File "/opt/backend-job/scripts/applock/2.6/lib/python2.6/site-packages/pymongo/collection.py", line 1290, in __create_index
    sock_info, cmd, read_preference=ReadPreference.PRIMARY)
  File "/opt/backend-job/scripts/applock/2.6/lib/python2.6/site-packages/pymongo/collection.py", line 205, in _command
    read_concern=read_concern)
  File "/opt/backend-job/scripts/applock/2.6/lib/python2.6/site-packages/pymongo/pool.py", line 213, in command
    read_concern)
  File "/opt/backend-job/scripts/applock/2.6/lib/python2.6/site-packages/pymongo/network.py", line 99, in command
    helpers._check_command_response(response_doc, None, allowable_errors)
  File "/opt/backend-job/scripts/applock/2.6/lib/python2.6/site-packages/pymongo/helpers.py", line 196, in _check_command_response
    raise OperationFailure(msg % errmsg, code, response)
OperationFailure: namespace name generated from index name "applock.com.goldwallpaper.goldpictures.money.goldlight.goldblack.pattern.luxury.metallic.background.images.art.free.hd.$locked_1" is too long (127 byte max)
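The namespace in this error is database.collection.$index_name, and per the message it must stay under 127 bytes on this version. A quick pre-check sketch (indexNamespaceLength is my own helper; the 127-byte cap is taken from this error message and varies across MongoDB versions):

```javascript
// Length of the full index namespace "db.collection.$index" in characters
// (bytes, for ASCII names). Compare against the cap your server reports.
function indexNamespaceLength(dbName, collName, indexName) {
    return (dbName + "." + collName + ".$" + indexName).length;
}
```

For the long goldwallpaper collection name above, indexNamespaceLength("applock", name, "locked_1") comes out over the 127-byte cap, which is exactly why create_index failed.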