Docs and Doc Sets

Docugami currently supports PDF, DOCX, and DOC files. These are organized into Doc Sets, which are groups of similar documents (leases, contracts, etc.). A Doc Set needs at least six similar documents to be used within Docugami.


Docugami is flexible in the types of documents it can process and understand. Users can upload scanned PDFs, digital PDFs, DOC files, or DOCX files. These make up the vast majority of business documents in the world today.

Typically, documents are created, edited, and negotiated using a word processing program like Microsoft Word or Google Docs. Both of these popular applications support similar formats for editable docs, and can export docs to PDF for sharing or archiving. 

Account tiers are measured largely by page count (although they also include user count). You can upload documents up to your page limit. To get more pages, simply upgrade your account!

While we use state-of-the-art models to process the content of your docs, including lists, tables, and more, documents that upload successfully can still have errors that affect their use. Common issues and limits are:

  • Images, handwriting, and non-English material are not (currently) processed. (We're working on it!)
  • Encrypted documents cannot be processed.
  • The maximum number of pages in a document is 100. (If you have larger documents, you might wish to split them into several documents to stay within this limit.)
  • The maximum file size is 50 MB. (You might wish to split a larger doc into several docs to stay within this limit.)
  • Our Docugami team can look into any other errors you run into, so feel free to reach out if you're having issues.
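
As a rough illustration, the file type, size, and page limits listed above could be checked before you upload. The following is a minimal Python sketch and is not part of Docugami. The function name precheck_upload and the choice of 1 MB = 1024 * 1024 bytes are assumptions made for this example, and the page count is passed in by the caller because reading it requires a PDF or Word library. The sketch cannot detect encryption, images, handwriting, or non-English content.

```python
from pathlib import Path
from typing import List, Optional

# Limits taken from the list above. The byte conversion is an assumption:
# this sketch treats 1 MB as 1024 * 1024 bytes.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".doc"}
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024
MAX_PAGES = 100


def precheck_upload(
    filename: str, size_bytes: int, page_count: Optional[int] = None
) -> List[str]:
    """Return a list of problems found; an empty list means no problem was found.

    precheck_upload is a hypothetical helper, not a Docugami function.
    """
    problems = []

    # File type check, case-insensitive on the extension.
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")

    # File size check against the 50 MB limit.
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append(f"file is larger than 50 MB: {size_bytes} bytes")

    # Page count check, only when the caller knows the page count.
    if page_count is not None and page_count > MAX_PAGES:
        problems.append(f"document has more than {MAX_PAGES} pages: {page_count}")

    return problems
```

For example, precheck_upload("lease.pdf", 2_000_000, 12) returns an empty list, while a 150-page file reports a page-count problem and is a candidate for splitting into several documents.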


Key to the workflow of Docugami is organizing your documents into Doc Sets. Docugami is most effective when your documents are grouped by type, so you might have one doc set of Master Service Agreements, another of Scopes of Work, and another of Contractor Contracts. By keeping documents grouped in this way, Docugami learns about the different chunks of information found in similar contexts within these doc sets.

Docugami will automatically cluster your documents into doc sets. You can then review these automatically created doc sets and edit them as needed, removing docs from a doc set or reassigning them to another as necessary.

At any future point, you may wish to upload more documents into Docugami. Docugami will automatically process these new documents, adding them to existing doc sets where applicable and creating new doc sets where needed.

Alternatively, you can bypass Docugami's clustering and organize your documents yourself into manually created doc sets. Either way, be aware of the following conditions:

  • A doc set MUST consist of at least six successfully processed documents. Documents with errors do not count towards the six, even if they are in the given doc set.
  • You CAN manually create doc sets that mix different doc types, but please be aware that results may be compromised if the documents are not similar to each other.
  • The ideal way to add docs to Docugami is to organize your documents by type into folders and then upload those folders. Once added, you can allow Docugami to cluster them or create the doc sets yourself.
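
The six-document rule above can be sketched in a few lines of Python. This is only an illustration of the counting logic, not Docugami's own implementation. The DocStatus class, its processed_ok flag, and the function docs_still_needed are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import List

# Minimum from the rule above: six successfully processed documents.
MIN_PROCESSED_DOCS = 6


@dataclass
class DocStatus:
    """Hypothetical record for one document in a doc set."""
    name: str
    processed_ok: bool  # False when the document had processing errors


def docs_still_needed(doc_set: List[DocStatus]) -> int:
    """Return how many more successfully processed docs the doc set needs.

    Documents with errors are in the doc set but do not count towards the six.
    """
    processed = sum(1 for doc in doc_set if doc.processed_ok)
    return max(0, MIN_PROCESSED_DOCS - processed)
```

A doc set of seven documents in which two had errors has only five that count, so docs_still_needed returns 1 for it.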