OCS FullTextSearch Collection API

Added in version 27.

FullTextSearch includes an OCS API to index the content of your Nextcloud into an external search engine.

Concept overview

Because your organization might already host its own search engine, the FullTextSearch apps provide an OCS API that helps you index your users’ content and keep that index up to date. The OCS API allows your script to:

  • return a list of documents from Nextcloud that are not yet indexed or have been recently updated,

  • extract the content of a document,

  • update the internal index once a document has been indexed on your external search engine.

First steps

Installing the apps

In addition to the fulltextsearch app, at least one content provider must be installed on your Nextcloud instance, so at least two apps have to be enabled before using this feature:

$ ./occ app:enable fulltextsearch files_fulltextsearch

Initializing the collection

Using occ, create a new collection that will be used to sync the content indexed on the external search engine with the current content of your Nextcloud instance.

$ ./occ fulltextsearch:collection:init test

Note

test is the name of the collection used in all examples on this page.

Linking a collection to a user account

By default, this API can only be used with an admin account. For security reasons, you can instead link a non-admin account to the collection and use that account when calling the API.

$ ./occ fulltextsearch:collection:link test user1

Warning

Keep in mind that the linked account will have access to the content of all documents of all Nextcloud users through the API.

Using the collection OCS API

Once the collection has been initialized, typical use of this API means that your script:

  • makes an OCS request to retrieve a list of documents that have been created, modified, or shared,

  • makes OCS requests to get the content of the documents from the list,

  • indexes the content on your search engine and makes an OCS request to confirm it,

  • returns to the first step until the list of documents is empty.
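
The steps above can be sketched as a small loop. This is only an illustration under stated assumptions, not the reference client: `sync_collection`, `fetch` and `index_document` are hypothetical names, `fetch` stands for whatever function performs the authenticated OCS request and returns the decoded `ocs.data` payload, and appending `format=json` to the document and `done` URLs is assumed to work as it does for the index endpoint.

```python
from typing import Any, Callable

# Values taken from the examples on this page.
BASE_URL = "https://cloud.example.net"
COLLECTION = "test"


def sync_collection(
    fetch: Callable[..., Any],
    index_document: Callable[[dict], None],
    length: int = 50,
) -> int:
    """Repeat the steps until the list is empty; return the number of documents processed.

    fetch(url, method="GET") is a hypothetical callable that performs the
    authenticated OCS request and returns the decoded "ocs" -> "data" payload.
    index_document(document) is a hypothetical callable that pushes one
    document to the external search engine.
    """
    collection_url = f"{BASE_URL}/ocs/v2.php/apps/fulltextsearch/collection/{COLLECTION}"
    processed = 0
    while True:
        # Step 1: list the documents that still need to be (re-)indexed.
        pending = fetch(f"{collection_url}/index?format=json&length={length}")
        if not pending:
            return processed
        for entry in pending:
            # Step 2: get the content and metadata of the document.
            document = fetch(f"{entry['url']}?format=json")
            # Step 3: index it, then confirm so that it leaves the list.
            index_document(document)
            fetch(f"{entry['url']}/done?format=json", method="POST")
            processed += 1
```

Passing the HTTP call in as `fetch` keeps the loop independent of any particular HTTP library. The sketch does not treat removed documents differently from modified ones, and it assumes a confirmed document no longer appears in the list; a real script would probably want error handling and a safeguard against looping forever if a confirmation fails.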

Retrieving the list of documents to be (re-)indexed

The endpoint to get this list is:

/ocs/v2.php/apps/fulltextsearch/collection/<collection_name>/index

$ curl -X GET "https://cloud.example.net/ocs/v2.php/apps/fulltextsearch/collection/test/index?format=json&length=50" -H "OCS-APIRequest: true" -u "admin:password"
{
    "ocs": {
        "meta": {
            "status": "ok",
            "statuscode": 200,
            "message": "OK"
        },
        "data": [
            {
                "url": "https://cloud.example.net/ocs/v2.php/apps/fulltextsearch/collection/test/document/files/597996",
                "status": 28
            }
        ]
    }
}

Details about the response:

  • url is the link to the document,

  • status is a bit field that combines the following flags (for example, 28 = 4 + 8 + 16):
    • 1 => document has already been marked as indexed before,

    • 4 => meta has been modified,

    • 8 => content has been modified,

    • 16 => parts have been modified,

    • 32 => document has been removed.
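
As an illustration of how a script might read the status value, the sketch below maps the flags listed above onto a Python `IntFlag`. The names `IndexStatus` and `describe_status` and the member names are hypothetical; only the numeric values come from this page, and any value not listed here is deliberately left out.

```python
from enum import IntFlag


class IndexStatus(IntFlag):
    """Flags from the list above; the member names are illustrative only."""

    INDEXED = 1            # document has already been marked as indexed before
    META_MODIFIED = 4      # meta has been modified
    CONTENT_MODIFIED = 8   # content has been modified
    PARTS_MODIFIED = 16    # parts have been modified
    REMOVED = 32           # document has been removed


def describe_status(status: int) -> list[str]:
    """Return the names of all listed flags that are set in status."""
    return [flag.name for flag in IndexStatus if status & flag]


# The status from the example response: 28 = 4 + 8 + 16.
print(describe_status(28))
# → ['META_MODIFIED', 'CONTENT_MODIFIED', 'PARTS_MODIFIED']
```

A script could, for instance, check `status & IndexStatus.REMOVED` to decide whether to delete the document from the external search engine instead of re-indexing it.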

Get data and metadata from a document

The endpoint to get data about a document is:

/ocs/v2.php/apps/fulltextsearch/collection/<collection_name>/document/<provider_id>/<document_id>

$ curl -X GET "https://cloud.example.net/ocs/v2.php/apps/fulltextsearch/collection/test/document/files/597996?format=json" -H "OCS-APIRequest: true" -u "admin:password"
{
    "ocs": {
        "meta": {
            "status": "ok",
            "statuscode": 200,
            "message": "OK"
        },
        "data": {
            "id": "597996",
            "providerId": "files",
            "access": {
                "ownerId": "user1",
                "users": ["user2", "user3"],
                "groups": ["group1"],
                "circles": [],
                "links": []
            },
            "index": {
                "ownerId": "user1",
                "providerId": "files",
                "collection": "test",
                "source": "files_local",
                "documentId": "597996",
                "lastIndex": 0,
                "errors": [],
                "errorCount": 0,
                "status": 28,
                "options": []
            },
            "title": "640-240-max.png",
            "link": "https://cloud.example.net/index.php/f/597996",
            "parts": {
                "comments": "<user3> This is a comment !"
            },
            "content": "VGhlIHF1aWNrIGJyb3duIGZveApqdW1wcyBvdmVyCnRoZSBsYXp5IGRvZy4=",
            "isContentEncoded": 1
        }
    }
}

Note

If isContentEncoded is set to 1, content is base64-encoded and needs to be decoded:

$ php -r "echo base64_decode('VGhlIHF1aWNrIGJyb3duIGZveApqdW1wcyBvdmVyCnRoZSBsYXp5IGRvZy4=');"
The quick brown fox
jumps over
the lazy dog.
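
The same check can be done inside an indexing script. The following is a minimal sketch, assuming the decoded bytes are UTF-8 text (which may not hold for every provider); `document_text` is a hypothetical helper name, and `data` is the `data` object from the response above.

```python
import base64


def document_text(data: dict) -> str:
    """Return the document content as text, decoding it when flagged as encoded."""
    content = data.get("content", "")
    if data.get("isContentEncoded") == 1:
        # Assumption: the decoded bytes are UTF-8; invalid bytes are replaced.
        return base64.b64decode(content).decode("utf-8", errors="replace")
    return content


sample = {
    "content": "VGhlIHF1aWNrIGJyb3duIGZveApqdW1wcyBvdmVyCnRoZSBsYXp5IGRvZy4=",
    "isContentEncoded": 1,
}
print(document_text(sample))
```

With the sample above, this prints the same three lines as the PHP command.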

Set document as indexed

Once a document has been indexed in your external search engine, you have to notify FullTextSearch of this action by sending a POST request to the following path:

/ocs/v2.php/apps/fulltextsearch/collection/<collection_name>/document/<provider_id>/<document_id>/done

$ curl -X POST "https://cloud.example.net/ocs/v2.php/apps/fulltextsearch/collection/test/document/files/597996/done" -H "OCS-APIRequest: true" -u "admin:password"
{
    "ocs": {
        "meta": {
            "status": "ok",
            "statuscode": 200,
            "message": "OK"
        },
        "data": []
    }
}

Once set as indexed, a document only returns to the list of documents to be (re-)indexed if it is modified.
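
For scripts that do not shell out to curl, the confirmation request could be prepared with Python's standard library as sketched below. `build_done_request` is a hypothetical helper; it mirrors the curl call above (POST method, `OCS-APIRequest` header, HTTP basic authentication) and only builds the request without sending it.

```python
import base64
import urllib.request
from urllib.parse import quote


def build_done_request(
    base_url: str,
    collection: str,
    provider_id: str,
    document_id: str,
    user: str,
    password: str,
) -> urllib.request.Request:
    """Prepare the POST request that marks one document as indexed."""
    # Percent-encode each path segment so unusual identifiers stay valid.
    path = "/".join(quote(part, safe="") for part in (collection, "document", provider_id, document_id, "done"))
    url = f"{base_url}/ocs/v2.php/apps/fulltextsearch/collection/{path}"
    # HTTP basic authentication, equivalent to curl's -u "user:password".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        method="POST",
        headers={"OCS-APIRequest": "true", "Authorization": f"Basic {token}"},
    )


request = build_done_request(
    "https://cloud.example.net", "test", "files", "597996", "admin", "password"
)
print(request.get_method(), request.full_url)
```

Calling `urllib.request.urlopen(request)` would then send it. Keeping the construction separate from the network call makes the request easy to inspect or log before anything is marked as indexed.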

Reset collection

If needed, an endpoint is available to reset the whole index:

/ocs/v2.php/apps/fulltextsearch/collection/<collection_name>/index

$ curl -X DELETE -u "user1:password" "https://cloud.example.net/ocs/v2.php/apps/fulltextsearch/collection/test/index" -H "OCS-APIRequest: true" -k