Data Architecture & Design: MongoDB design tips (3)

1. UTF-8 support.

a. MongoDB regex queries support UTF-8 in the regex string.

b. Currently, sort() on a string uses strcmp, so the sort order is reasonable but not fully correct for international text.

Future versions of MongoDB may support full UTF-8 sort ordering.
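To make the difference concrete, here is a minimal client-side sketch that compares a byte-wise ordering (similar in spirit to strcmp on UTF-8 bytes) with a locale-aware ordering from java.text.Collator. The class SortOrderDemo and its methods are made up for illustration; nothing here changes how MongoDB itself sorts.

```java
import java.nio.charset.StandardCharsets;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class SortOrderDemo {
    // Compare two strings by their UTF-8 bytes, each byte treated as unsigned (strcmp-like).
    static int compareUtf8Bytes(String left, String right) {
        byte[] a = left.getBytes(StandardCharsets.UTF_8);
        byte[] b = right.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Returns a copy of the list sorted byte-wise.
    static List<String> byteOrder(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(SortOrderDemo::compareUtf8Bytes);
        return sorted;
    }

    // Returns a copy of the list sorted with the collation rules of the given locale.
    static List<String> localeOrder(List<String> names, Locale locale) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Collator.getInstance(locale));
        return sorted;
    }

    public static void main(String[] args) {
        // "\u00e9tude" is "etude" with an accented first letter.
        List<String> names = Arrays.asList("zebra", "Apple", "\u00e9tude", "apple");
        System.out.println("byte order:   " + byteOrder(names));
        System.out.println("locale order: " + localeOrder(names, Locale.ENGLISH));
    }
}
```

In byte order the accented word lands after "zebra", because its first UTF-8 byte is larger than any ASCII byte; the collator places it before "zebra". If the ordering matters to users, one option is to re-sort small result sets on the client in this way.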

2. MongoDB does not support sequence numbers natively, but you can emulate them with a counters collection (be careful when using this with sharding):

function counter(name) {

var ret = db.counters.findAndModify({query:{_id:name}, update:{$inc : {next:1}}, "new":true,

upsert:true});

// ret == { "_id" : "users", "next" : 1 }

return ret.next;

}

db.users.insert({_id:counter("users"), name:"Sarah C."}) // _id : 1

db.users.insert({_id:counter("users"), name:"Bob D."}) // _id : 2

//repeat

3. Limit on the number of collections.

By default MongoDB has a limit of approximately 24,000 namespaces per database. Each namespace entry is 628 bytes, and the .ns file is 16MB by default. If more collections are required, run mongod with the --nssize parameter. This makes the <database>.ns file larger so that it supports more collections. Note that --nssize sets the size used for newly created .ns files. If you have an existing database and wish to resize it, restart mongod with --nssize and then run db.repairDatabase() from the shell to adjust the size.

Maximum .ns file size is 2GB.
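As a rough sanity check on these numbers, dividing the .ns file size by the 628-byte entry size gives an upper bound on the number of entries. The sketch below only does that arithmetic; the class NamespaceFileMath is hypothetical, and the usable figure quoted above (about 24,000 for the 16MB default) is lower than the raw quotient.

```java
final class NamespaceFileMath {
    // Size of one namespace entry in bytes, as stated in the text above.
    static final long ENTRY_BYTES = 628;

    // Raw upper bound: how many entries fit into an .ns file of the given size in MB.
    static long maxEntries(long nsSizeMegabytes) {
        return nsSizeMegabytes * 1024L * 1024L / ENTRY_BYTES;
    }

    public static void main(String[] args) {
        // Default 16MB file: prints 26715
        System.out.println(maxEntries(16));
    }
}
```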

Be aware that there is a certain minimum overhead per collection (a few KB). Further, any index will require at least 8KB of data space, as the b-tree page size is 8KB. Certain operations can get slow if there are a lot of collections and the metadata gets paged out.

4. Keys Too Large To Index

Index entries have a limitation on their maximum size (the sum of the values), currently 1024 bytes (prior to v2.0 the maximum size was 819 bytes). Documents whose fields have values (key size in index terminology) greater than this size cannot be indexed.
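One way to avoid surprises is to check the encoded size of a value before storing it in an indexed field. The sketch below compares only the UTF-8 byte length of a single string against the 1024-byte limit mentioned above. It ignores the per-value BSON overhead and compound indexes, so treat it as an approximation; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class IndexKeySizeCheck {
    // Limit taken from the text above (v2.0 and later).
    static final int MAX_INDEX_KEY_BYTES = 1024;

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    // Conservative check: the value alone must stay below the limit.
    static boolean fitsInIndex(String value) {
        return utf8Length(value) < MAX_INDEX_KEY_BYTES;
    }

    public static void main(String[] args) {
        StringBuilder accented = new StringBuilder();
        for (int i = 0; i < 600; i++) {
            accented.append('\u00e9'); // two bytes each in UTF-8
        }
        // 600 characters, but 1200 bytes: prints false
        System.out.println(fitsInIndex(accented.toString()));
    }
}
```

Note that the limit is in bytes, not characters, so non-ASCII text reaches it sooner than its character count suggests.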

5. Index limitations:

Assume a collection has a compound index on (a, b, c).

a. The sort column must be the last column used in the index.

Good:

find(a=1).sort(a)

find(a=1).sort(b)

find(a=1, b=2).sort(c)

Bad:

find(a=1).sort(c)

Even though c is the last column in the index, a is the last column used by the query, so you can only sort on a or b.

b. The range query must also be on the last column used in the index. This is a corollary of rule a above.

Good:

find(a=1,b>2)

find(a>1 and a<10)

find(a>1 and a<10).sort(a)

Bad:

find(a>1, b=2)

c. Only use a range query or sort on one column.

Good:

find(a=1,b=2).sort(c)

find(a=1,b>2)

find(a=1,b>2 and b<4)

find(a=1,b>2).sort(b)

Bad:

find(a>1,b>2)

find(a=1,b>2).sort(c)
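Rules a to c can be encoded as a small checker. This is only a sketch of the rules of thumb as stated in this post, not of MongoDB's actual query planner; the class IndexRuleCheck and its methods are invented for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class IndexRuleCheck {
    // Convenience helper to build a set of column names.
    static Set<String> cols(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    // index: index columns in order; equality/range: columns queried that way;
    // sort: the sort column, or null when the query does not sort.
    static boolean followsRules(List<String> index, Set<String> equality,
                                Set<String> range, String sort) {
        // Equality columns must form a prefix of the index.
        int prefix = 0;
        while (prefix < index.size() && equality.contains(index.get(prefix))) {
            prefix++;
        }
        if (prefix != equality.size()) {
            return false;
        }
        // Rule c: at most one range column. Rule b: it must directly follow the prefix.
        if (range.size() > 1) {
            return false;
        }
        String next = prefix < index.size() ? index.get(prefix) : null;
        if (!range.isEmpty() && (next == null || !range.contains(next))) {
            return false;
        }
        if (sort == null) {
            return true;
        }
        // Sorting on an equality column is trivial: all matches share one value.
        if (equality.contains(sort)) {
            return true;
        }
        // Rules a and c: sort on the range column, or on the next index column if no range.
        return range.isEmpty() ? sort.equals(next) : range.contains(sort);
    }

    public static void main(String[] args) {
        List<String> index = Arrays.asList("a", "b", "c");
        // find(a=1, b=2).sort(c): prints true
        System.out.println(followsRules(index, cols("a", "b"), cols(), "c"));
        // find(a=1, b>2).sort(c): prints false
        System.out.println(followsRules(index, cols("a"), cols("b"), "c"));
    }
}
```

Running the Good and Bad examples above through followsRules gives true and false respectively, which is a quick way to check a planned query against a planned index.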

d. Conserve indexes by re-ordering columns used on equality (non-range) queries.

Imagine you have the following two queries:

find(a=1,b=1,d=1)

find(a=1,b=1,c=1,d=1)

A single index defined on a, b, c, and d can be used for both queries.

If, however, you need to sort on the final value, you might need two indexes.

e. MongoDB's $ne and $nin operators aren't efficient with indexes.

When excluding just a few documents, it's better to retrieve extra rows from MongoDB and do the exclusion on the client side.
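A minimal sketch of the client-side exclusion, assuming the documents have already been fetched into a list of maps (a stand-in for the driver's result objects) and that the set of excluded ids is small. The class and parameter names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ClientSideExclusion {
    // Keep every fetched document whose "_id" is not in excludedIds,
    // instead of pushing the exclusion into the query with $nin.
    static List<Map<String, Object>> exclude(List<Map<String, Object>> fetched,
                                             Set<Object> excludedIds) {
        List<Map<String, Object>> kept = new ArrayList<>();
        for (Map<String, Object> doc : fetched) {
            if (!excludedIds.contains(doc.get("_id"))) {
                kept.add(doc);
            }
        }
        return kept;
    }
}
```

The trade-off is a few extra documents over the wire in exchange for a query that can use its index efficiently, so this only pays off while the excluded set stays small.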

6. Pay attention to the read/write ratio of your application.

This is important because, whenever you add an index, you add overhead to all insert, update, and delete operations on the given collection. If your application is read-heavy, as are most web applications, the additional indexes are usually a good thing. But if your application is write-heavy, then be careful when creating new indexes, since each additional index will impose a small write-performance penalty.

In general, don't be cavalier about adding indexes. Indexes should be added to complement your queries. Always have a good reason for adding a new index, and make sure you've benchmarked alternative strategies.

7. Key names limitation

a. The '$' character must not be the first character in the key name.

b. The '.' character must not appear anywhere in the key name.
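These two rules are easy to check before building a document from untrusted input, such as user-supplied field names. The sketch below encodes only the two rules stated above; the class KeyNameCheck is hypothetical and not part of the Java driver.

```java
final class KeyNameCheck {
    // True if the key does not start with '$' and contains no '.' anywhere.
    static boolean isValidKeyName(String key) {
        return !key.startsWith("$") && key.indexOf('.') < 0;
    }

    public static void main(String[] args) {
        System.out.println(isValidKeyName("user_name")); // prints true
        System.out.println(isValidKeyName("$set"));      // prints false
        System.out.println(isValidKeyName("a.b"));       // prints false
    }
}
```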

8. Schema design Summary of Best Practices

a. "First class" objects, that is, those at the top level, typically have their own collection.

b. Line item detail objects typically are embedded.

c. Objects which follow an object modelling "contains" relationship should generally be embedded.

d. Many to many relationships are generally done by linking.

e. Collections with only a few objects may safely exist as separate collections, as the whole collection is quickly cached in application server memory.

f. Embedded objects are a bit harder to link to than "top level" objects in collections.

g. It is more difficult to get a system-level view for embedded objects. When needed, an operation of this sort is performed by using MongoDB's map/reduce facility.

h. If the amount of data to embed is huge (many megabytes), you may reach the limit on size of a single object.

i. If performance is an issue, embed.

When you write a Java program for MongoDB, note the following.

9. Use requestStart and requestDone.

The driver takes a connection from its pool for each operation, so a read may go out on a different connection than the write before it. The read may then not see the data just written, since replication is asynchronous. If you want to ensure complete consistency in a "session" (for example, one HTTP request), you want the driver to use the same socket, which you can achieve with a "consistent request": call requestStart() before your operations and requestDone() afterwards to release the connection back to the pool.

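Example code:

DB db...;

db.requestStart();

try {

db.requestEnsureConnection();

// ... your read and write operations ...

} finally {

// always release the connection back to the pool

db.requestDone();

}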

10. WriteConcern option for a single write operation

Since by default a connection is given back to the pool after each request, you may wonder how calling getLastError() works after a write. You should actually use a write concern such as WriteConcern.SAFE instead of calling getLastError() manually. The driver will then call getLastError() before putting the connection back in the pool.

WriteConcern.NONE : No exceptions thrown.

WriteConcern.NORMAL : Exceptions are only thrown when the primary node is unreachable for a read, or the full replica set is unreachable.

WriteConcern.SAFE : Same as the above, but exceptions are also thrown when there is a server error on writes or reads. Calls getLastError().

WriteConcern.REPLICAS_SAFE : Tries to write to two separate nodes. Same as the above, but throws an exception if two writes are not possible.

WriteConcern.FSYNC_SAFE : Same as WriteConcern.SAFE, but also waits for the write to be written to disk.

Example code:

DBCollection coll...;

coll.insert(..., WriteConcern.SAFE);

// is equivalent to

DB db...;

DBCollection coll...;

db.requestStart();

coll.insert(...);

DBObject err = db.getLastError();

db.requestDone();

11. Saving Objects Using DBObject

The Java driver provides a DBObject interface to save custom objects to the database.

Example code:

public class Tweet implements DBObject {

/* ... */

}

Then you can:

Tweet myTweet = new Tweet();

myTweet.put("user", userId);

myTweet.put("message", msg);

myTweet.put("date", new Date());

collection.insert(myTweet);

When a document is retrieved from the database, it is automatically converted to a DBObject. To convert it to an instance of your class, use DBCollection.setObjectClass():

collection.setObjectClass(Tweet.class);

Tweet myTweet = (Tweet)collection.findOne();

If you later want to change the message, you can update the field on that tweet and save it back:

Tweet myTweet = (Tweet)collection.findOne();

myTweet.put("message", newMsg);

collection.save(myTweet);

12. How to insert an embedded document

BasicDBObject doc = new BasicDBObject();

doc.put("name", "MongoDB");

doc.put("type", "database");

doc.put("count", 1);

BasicDBObject info = new BasicDBObject();

info.put("x", 203);

info.put("y", 102);

doc.put("info", info);

coll.insert(doc);

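13. How to insert an array

To build a document such as

{

"x" : [

1,

2,

{"foo" : "bar"},

4

]

}

add the values to a list and store the list in the document:

ArrayList<Object> x = new ArrayList<Object>();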

x.add(1);

x.add(2);

x.add(new BasicDBObject("foo", "bar"));

x.add(4);

BasicDBObject doc = new BasicDBObject("x", x);

14. Adding Multiple Documents

Suppose each document simply has the form

{

"i" : value

}

We can insert many of them fairly efficiently in a loop:

for (int i=0; i < 100; i++) {

coll.insert(new BasicDBObject().append("i", i));

}

15. Regular Expressions

Pattern john = Pattern.compile("joh?n", Pattern.CASE_INSENSITIVE);

BasicDBObject query = new BasicDBObject("name", john);

// finds all people with "name" matching /joh?n/i

DBCursor cursor = collection.find(query);
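Before sending a pattern to the server, it can help to check locally which names it would match. The sketch below uses only java.util.regex and the same pattern as above; the class RegexDemo is hypothetical. Since a MongoDB regex is unanchored unless you add ^ or $, find() rather than matches() is the closest local equivalent.

```java
import java.util.regex.Pattern;

final class RegexDemo {
    static final Pattern JOHN = Pattern.compile("joh?n", Pattern.CASE_INSENSITIVE);

    // True if the pattern occurs anywhere in the name (unanchored match).
    static boolean wouldMatch(String name) {
        return JOHN.matcher(name).find();
    }

    public static void main(String[] args) {
        System.out.println(wouldMatch("John"));     // prints true
        System.out.println(wouldMatch("JON"));      // prints true
        System.out.println(wouldMatch("Jonathan")); // prints true
        System.out.println(wouldMatch("Joan"));     // prints false
    }
}
```

The "Jonathan" case shows why an unanchored pattern can match more than intended; anchoring it as ^joh?n$ restricts the match to the whole name.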


Tuesday, February 19, 2013
