Troubleshooting MongoDB Sources
Connector Limitations
MongoDB Oplog and Change Streams
MongoDB's Change Streams are based on the replica set Oplog, which keeps only a limited history of changes. If syncs run less frequently than the Oplog's retention period, the oldest changes may be removed from the Oplog before Airbyte reads them, resulting in missing data.
We recommend adjusting the Oplog size for your MongoDB cluster to ensure it holds at least 24 hours of changes. For optimal results, we suggest expanding it to maintain a week's worth of data. To adjust your Oplog size, see the corresponding tutorials for MongoDB Atlas (fully-managed) and MongoDB shell (self-hosted).
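As a rough sizing aid, the time window an Oplog covers is approximately its size divided by the rate at which it grows. The sketch below assumes the Oplog is already full and that writes arrive at a steady rate; bursty workloads need extra headroom, so treat the result as a lower bound. The helper names and the cluster figures are hypothetical.

```python
# Minimal sketch: relate Oplog size, Oplog growth rate, and the time window
# the Oplog covers. Assumes a full Oplog and a steady write rate.
# Helper names are hypothetical, not part of Airbyte or MongoDB.


def oplog_growth_mb_per_hour(oplog_size_mb: float, window_hours: float) -> float:
    """Average growth rate implied by a full Oplog that spans window_hours."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    return oplog_size_mb / window_hours


def required_oplog_size_mb(growth_mb_per_hour: float, target_hours: float) -> float:
    """Oplog size needed to keep target_hours of changes at a steady rate."""
    return growth_mb_per_hour * target_hours


if __name__ == "__main__":
    # Hypothetical cluster: a 1024 MB Oplog that currently spans 12 hours.
    rate = oplog_growth_mb_per_hour(1024, 12)
    for label, hours in (("24 hours", 24), ("1 week", 7 * 24)):
        print(f"{label}: about {required_oplog_size_mb(rate, hours):,.0f} MB")
```

With these hypothetical numbers, holding 24 hours of changes needs about 2 GB of Oplog and holding a week needs about 14 GB.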
If you encounter an error such as "invalid resume token", you may need to:
- Increase the Oplog retention period.
- Increase the Oplog size.
- Increase the Airbyte sync frequency.
You can run the commands outlined in this tutorial to verify the current size of your Oplog. The expected output looks similar to:
configured oplog size: 10.10546875MB
log length start to end: 94400 (26.22hrs)
oplog first event time: Mon Mar 19 2012 13:50:38 GMT-0400 (EDT)
oplog last event time: Wed Oct 03 2012 14:59:10 GMT-0400 (EDT)
now: Wed Oct 03 2012 15:00:21 GMT-0400 (EDT)
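If you want to check this output programmatically, the "log length start to end" value (in seconds) gives the Oplog window, which can then be compared with the Airbyte sync interval. This is a minimal sketch that assumes the output format shown above; other shell versions may format it differently. The function names and the default safety factor of 2 are hypothetical choices, not Airbyte defaults.

```python
import re


def oplog_window_hours(replication_info: str) -> float:
    """Extract the Oplog window, in hours, from output like the sample above."""
    # The first number after the label is the window length in seconds.
    match = re.search(r"log length start to end:\s*(\d+)", replication_info)
    if match is None:
        raise ValueError("could not find 'log length start to end' in the output")
    return int(match.group(1)) / 3600


def sync_interval_is_safe(window_hours: float, sync_interval_hours: float,
                          safety_factor: float = 2.0) -> bool:
    """True if the Oplog window covers the sync interval with extra headroom."""
    return window_hours >= sync_interval_hours * safety_factor


SAMPLE = """configured oplog size: 10.10546875MB
log length start to end: 94400 (26.22hrs)
oplog first event time: Mon Mar 19 2012 13:50:38 GMT-0400 (EDT)
oplog last event time: Wed Oct 03 2012 14:59:10 GMT-0400 (EDT)
now: Wed Oct 03 2012 15:00:21 GMT-0400 (EDT)"""

if __name__ == "__main__":
    window = oplog_window_hours(SAMPLE)
    print(f"Oplog window: {window:.2f} hours")
    print("Hourly sync OK:", sync_interval_is_safe(window, 1))
    print("Daily sync OK:", sync_interval_is_safe(window, 24))
```

For the sample output, the window is about 26 hours: an hourly sync leaves plenty of margin, while a daily sync does not meet the 2x headroom assumed here.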
When importing a large MongoDB collection for the first time, the import duration might exceed the Oplog retention period. The Oplog is crucial for incremental updates, and an invalid resume token will require the MongoDB collection to be re-imported to ensure no source updates were missed.
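The risk described above comes down to one comparison: the position where change reading should resume must still be in the Oplog when change reading begins. The sketch below models this under simplifying assumptions: the resume position is recorded when the initial import starts, and change reading begins after the import plus an optional delay. This is an illustrative model, not a description of the connector's internals, and the helper name is hypothetical.

```python
from datetime import timedelta


def initial_import_fits_window(import_duration: timedelta,
                               oplog_window: timedelta,
                               delay_before_cdc: timedelta = timedelta(0)) -> bool:
    """Hypothetical check: is the resume position still in the Oplog?

    Assumes the position is recorded at the start of the import and read
    again after the import finishes plus delay_before_cdc.
    """
    elapsed = import_duration + delay_before_cdc
    return elapsed < oplog_window


if __name__ == "__main__":
    # Hypothetical: a 30-hour import against an Oplog that spans about 26 hours.
    print(initial_import_fits_window(timedelta(hours=30), timedelta(hours=26.22)))
```

When the check fails, either enlarge the Oplog before the first sync or expect the collection to be re-imported.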
MongoDB CDC Limitations
MongoDB has a 16MB maximum document size limit for BSON documents. During CDC (Change Data Capture) syncs, change stream events can exceed this limit when documents are large, causing a BSONObjectTooLarge error. This typically occurs during incremental syncs when change stream events include the full document content.
If you encounter this error, you have several options to resolve it:
- Switch the affected stream to Full Refresh sync mode instead of Incremental mode. Full Refresh does not use change streams and is not subject to this limitation.
- If you are using Post Image update capture mode, switch to Lookup mode. Lookup mode retrieves the current document state separately, which can reduce the size of change stream events.
- Restructure large documents in your MongoDB collection to stay under the 16MB limit.
- Deselect streams containing documents that exceed the size limit.
For more information about MongoDB's document size limits, see the MongoDB documentation on limits.
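To find documents that are close to the limit, it helps to know how BSON sizes are computed. The sketch below follows the BSON encoding rules for a subset of types (strings, integers, doubles, booleans, null, binary, datetimes, embedded documents, and arrays) and is meant only as an estimate; if PyMongo is installed, len(bson.encode(doc)) returns the exact size. Keep in mind that a change stream event adds its own fields around the document, so a document somewhat below 16MB can still produce an event that exceeds the limit. The function names are hypothetical.

```python
import datetime

MAX_BSON_DOCUMENT_SIZE = 16 * 1024 * 1024  # 16 MiB document size limit


def _cstring_size(s: str) -> int:
    return len(s.encode("utf-8")) + 1  # UTF-8 bytes plus NUL terminator


def _value_size(value) -> int:
    if isinstance(value, bool):  # check before int: bool subclasses int
        return 1
    if value is None:
        return 0
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return 4  # int32
        if -2**63 <= value < 2**63:
            return 8  # int64
        raise OverflowError("integer does not fit in int64")
    if isinstance(value, float):
        return 8  # double
    if isinstance(value, str):
        return 4 + len(value.encode("utf-8")) + 1  # length + bytes + NUL
    if isinstance(value, (bytes, bytearray)):
        return 4 + 1 + len(value)  # length + subtype byte + bytes
    if isinstance(value, datetime.datetime):
        return 8  # UTC milliseconds as int64
    if isinstance(value, dict):
        return bson_size(value)
    if isinstance(value, (list, tuple)):
        # Arrays are encoded as documents keyed "0", "1", ...
        return bson_size({str(i): v for i, v in enumerate(value)})
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")


def bson_size(doc: dict) -> int:
    """Estimated encoded size of doc in bytes."""
    size = 4 + 1  # int32 total length + trailing NUL byte
    for key, value in doc.items():
        size += 1 + _cstring_size(key) + _value_size(value)  # type + name + value
    return size


if __name__ == "__main__":
    large = {"_id": 1, "payload": "x" * (17 * 1024 * 1024)}
    print(bson_size(large) > MAX_BSON_DOCUMENT_SIZE)
```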
Supported MongoDB Clusters
- Only replica set clusters are supported.
- TLS/SSL is required by this connector. TLS/SSL is enabled by default for MongoDB Atlas clusters. To enable TLS/SSL connections for a self-hosted MongoDB instance, refer to the MongoDB documentation.
- Views, capped collections and clustered collections are not supported.
- Empty collections are excluded from schema discovery.
- Collections whose documents use different data types for the _id field are not supported. All _id values within a collection must have the same data type.
- Atlas DB clusters are only supported on the dedicated M10 tier and above. Lower tiers may fail during connection setup.
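To check the _id requirement before setting up a connection, you can tally the types of _id values in a set of documents. This minimal sketch works on documents you have already fetched (for example with PyMongo); plain dictionaries stand in for them here, and the helper names are hypothetical.

```python
from collections import Counter


def id_type_counts(documents) -> Counter:
    """Count the Python type names of the _id values across documents."""
    return Counter(type(doc.get("_id")).__name__ for doc in documents)


def has_uniform_id_type(documents) -> bool:
    """True if every document's _id has the same type."""
    return len(id_type_counts(documents)) <= 1


if __name__ == "__main__":
    docs = [{"_id": 1}, {"_id": 2}, {"_id": "three"}]
    print(id_type_counts(docs))
    print(has_uniform_id_type(docs))
```

On the server side, an aggregation that groups on {"$type": "$_id"} gives a similar breakdown without pulling documents to the client.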
Schema Discovery & Enforcement
- Schema discovery samples documents to collect all distinct top-level fields. The approach is modelled after MongoDB Compass sampling and is used for efficiency. By default, 10,000 documents are sampled; this value can be increased up to 100,000 documents to increase the likelihood that all fields are discovered. The sample size applies to every collection discovered in the target database. The trade-off is time: a larger sample takes longer to collect.
- When running with Schema Enforced set to false, there is no attempt to discover any schema. See more in Schema Enforcement.
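The sampling trade-off can be illustrated with an in-memory sketch: take a random sample of documents and union their top-level keys. This is not the connector's implementation; it uses random.Random.sample on a Python list in place of server-side sampling, and the function name is hypothetical. Only the 10,000 default and the 100,000 upper bound come from the description above.

```python
import random

DEFAULT_SAMPLE_SIZE = 10_000  # default described above
MAX_SAMPLE_SIZE = 100_000     # upper bound described above


def discover_top_level_fields(documents, sample_size=DEFAULT_SAMPLE_SIZE, seed=None):
    """Union of top-level keys across a random sample of documents."""
    if not 0 < sample_size <= MAX_SAMPLE_SIZE:
        raise ValueError(f"sample_size must be between 1 and {MAX_SAMPLE_SIZE}")
    rng = random.Random(seed)
    if len(documents) <= sample_size:
        sample = documents  # small collection: every document is inspected
    else:
        sample = rng.sample(documents, sample_size)
    fields = set()
    for doc in sample:
        fields.update(doc.keys())  # top-level keys only; nested fields are ignored
    return fields


if __name__ == "__main__":
    # Hypothetical collection: 1,000 ordinary documents and one with a rare field.
    docs = [{"_id": i, "name": f"user{i}"} for i in range(1000)]
    docs.append({"_id": 1000, "name": "user1000", "rare_field": True})
    print(sorted(discover_top_level_fields(docs)))
    print(sorted(discover_top_level_fields(docs, sample_size=50, seed=1)))
```

With the default sample size the whole hypothetical collection is inspected, so rare_field is always found; with a sample of 50 it appears only if its document happens to be drawn.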
Vendor-Specific Connector Limitations
Not all implementations or deployments of a database will be the same. This section lists specific limitations and known issues with the connector based on how or where it is deployed.
Self-Hosted MongoDB
Airbyte does not support self-signed SSL certificates for SSH tunnels.
AWS DocumentDB
The Airbyte connector does not support custom SSL certificates, which DocumentDB requires.