connectors
connectors copied to clipboard
SharePoint Server NTLM and Host Named Site Collections support
Closes https://github.com/elastic/connectors-py/issues/2340
The existing connector only supports host header site collections and only the "sites" managed path, meaning site collections that derive their URL from the web application's URL. SharePoint supports host named site collections which can have any URL, so it doesn't make sense to force the user to only index a particular subset of sites. For NTLM, since aiohttp doesn't support NTLM authentication the only option with async support is "httpx", so this is what I used.
Checklists
Pre-Review Checklist
- [x] this PR does NOT contain credentials of any kind, such as API keys or username/passwords (double check
config.yml.example
) - [x] this PR has a meaningful title
- [x] this PR links to all relevant github issues that it fixes or partially addresses
- [x] if there is no GH issue, please create it. Each PR should have a link to an issue
- [x] this PR has a thorough description
- [x] Covered the changes with automated tests
- [x] Tested the changes locally
- [ ] Added a label for each target release version (example:
v7.13.2
,v7.14.0
,v8.0.0
) - [x] Considered corresponding documentation changes
- [ ] Contributed any configuration settings changes to the configuration reference
- [ ] if you added or changed Rich Configurable Fields for a Native Connector, you made a corresponding PR in Kibana
Changes Requiring Extra Attention
- [ ] Security-related changes (encryption, TLS, SSRF, etc)
- [x] New external service dependencies added.
Release Note
💚 CLA has been signed
buildkite test this
@anghelnicolae thanks for submitting a PR!
First, in order for any contributions from you to be accepted, you'll need to sign Elastic's Contributor Agreement. You can access that here: https://www.elastic.co/contributor-agreement
Next, we'll need to make sure all the tests pass. You can run tests locally with make test
, but Buildkite will also run them in the PR. Looking at the changes you made, I expect the tests will fail with your changes and will need some updates.
Before you invest in changing all the tests though, I'd suggest a different approach.
First, for NTLM support, is it possible to use one client (httpx) to handle the authentication, and the existing client to handle everything else? This would limit the blast radius of your change.
Second, handling non-path-based site collections is something we've had reported for Sharepoint Online as well. See: https://github.com/elastic/connectors/issues/2112
I'd like to see us address both connectors in a consistent manner. And I think the right way to do that is probably with a new configuration field to indicate what type of site collection identifier you're providing. The change as you've made it would break this connector for all its current users when they upgraded, and that's not going to be acceptable.
@seanstory I did sign the agreement, but only after I submitted the PR. You're right, I forgot to run the tests. I did now and they fail because of my updates. I'm not sure I know how to authenticate with one client and do the API requests with another. Is it even a good idea? For the second issue, you're again right, I'm going to think of a way to handle existing users.
What happens now? Is this PR closed and I submit a new one after I update my code?
I did sign the agreement, but only after I submitted the PR.
That's fine! The CLA check is passing now. ✅
I'm not sure I know how to authenticate with one client and do the API requests with another. Is it even a good idea?
Maybe it's not. I don't know a lot about NTLM, so maybe it doesn't make sense. But we've had instances before where you have two instance variables, like (psudocode):
def __init__(self):
self.oauth_client = OauthClientFactory.getClient()
token = self.oauth_client.get_token()
self.client = Client(headers={f"Authorization: {token}"})
...
so that you don't have to implement a complex auth flow, but can use an existing library to facilitate getting a lower-level authnetication token that results from that flow. This was the kind of pattern I was suggesting.
For the second issue, you're again right, I'm going to think of a way to handle existing users.
thanks!
What happens now? Is this PR closed and I submit a new one after I update my code?
I'll leave that up to you. If you want to iterate on this PR, it's fine to leave this open. You can move it to a "draft" state and then let us know when you feel it's ready for a review.
Alternatively, if you'd rather start over from scratch, we can close this. And it's not a 1-way door, we can always re-open a closed PR later.
Re: NTLM - is it a setting in Sharepoint Server, that you can use only one or the other? Or is it still possible to access Sharepoint Server with basic auth even if NTLM is enabled?
Re: NTLM - is it a setting in Sharepoint Server, that you can use only one or the other? Or is it still possible to access Sharepoint Server with basic auth even if NTLM is enabled?
In IIS, under the site hosting the SharePoint web application that you're targeting, there is an "Authentication" setting. This setting allows multiple authentication methods to be available for the site: Anonymous, ASP.NET Impersonation, Basic, Forms, Windows. Basic Authentication is by default disabled and rightfully so. To answer your question, yes, it's possible to access SharePoint with both Basic and NTLM(Windows) at the same time.
In IIS, under the site hosting the SharePoint web application that you're targeting, there is an "Authentication" setting. This setting allows multiple authentication methods to be available for the site: Anonymous, ASP.NET Impersonation, Basic, Forms, Windows. Basic Authentication is by default disabled and rightfully so. To answer your question, yes, it's possible to access SharePoint with both Basic and NTLM(Windows) at the same time.
I see. From what I understood NTLM is relatively popular, so we can have it here. If you don't severely need it, I'd skip it TBH. If you need it, we can make it configurable via configurable field - user chooses type of auth and in the connector we instantiate the client based on chosen auth method. I checked aiohttp
with NTLM and there indeed is no simple way to make it work.
We'll look into httpx
package, but from the first glance it seems okay - my only concern that their version is 0.x
- will need to spend a bit of time to see how mature the package is.
In IIS, under the site hosting the SharePoint web application that you're targeting, there is an "Authentication" setting. This setting allows multiple authentication methods to be available for the site: Anonymous, ASP.NET Impersonation, Basic, Forms, Windows. Basic Authentication is by default disabled and rightfully so. To answer your question, yes, it's possible to access SharePoint with both Basic and NTLM(Windows) at the same time.
I see. From what I understood NTLM is relatively popular, so we can have it here. If you don't severely need it, I'd skip it TBH. If you need it, we can make it configurable via configurable field - user chooses type of auth and in the connector we instantiate the client based on chosen auth method. I checked
aiohttp
with NTLM and there indeed is no simple way to make it work.We'll look into
httpx
package, but from the first glance it seems okay - my only concern that their version is0.x
- will need to spend a bit of time to see how mature the package is.
I'm going to be indexing farms that I have no control over, so they won't have Basic Authentication enabled. So I really need it.
I'm going to be indexing farms that I have no control over, so they won't have Basic Authentication enabled. So I really need it.
Makes sense. In this case we will need to:
I've addressed most of the recommendations, but I haven't modified the tests yet. Can you take a look and let me know if it's OK now?
Hi @anghelnicolae,
I've skimmed through the changes and they look good to me.
I will give some manual testing this week and get back to you as soon as possible!
buildkite test this
☝️ checking to make sure the unit tests still pass.
I'd also love to see some unit tests added for these changes.
@anghelnicolae I'm not sure if you can see our buildkite CI. In case you can't, the tests are not currently passing.
The current test output
FAIL Required test coverage of 92% not reached. Total coverage: 91.51%
=========================== short test summary info ============================
FAILED tests/sources/test_sharepoint_server.py::test_get_drive_items - TypeError: SharepointServerClient.get_drive_items() missing 2 required positional arguments: 'site_relative_url' and 'list_relative_url'
FAILED tests/sources/test_sharepoint_server.py::test_prepare_list_items_doc - TypeError: SharepointServerDataSource.format_list_item() missing 2 required positional arguments: 'site_url' and 'site_relative_url'
FAILED tests/sources/test_sharepoint_server.py::test_get_list_items - TypeError: SharepointServerClient.get_list_items() missing 2 required positional arguments: 'site_relative_url' and 'list_relative_url'
FAILED tests/sources/test_sharepoint_server.py::test_get_docs_drive_items_for_web_pages - assert 0 == 2
+ where 0 = len([])
FAILED tests/sources/test_sharepoint_server.py::test_get_content_with_content_extraction - AssertionError: assert None == {'_id': 1, '_timestamp': '2022-06-20T10:37:44Z', 'body': 'This is a dummy sharepoint body response'}
FAILED tests/sources/test_sharepoint_server.py::test_get_docs_drive_items - assert 0 == 2
+ where 0 = len([])
FAILED tests/sources/test_sharepoint_server.py::test_api_call_successfully - KeyError: 'site_collections'
FAILED tests/sources/test_sharepoint_server.py::test_get_list_items_with_extension_only - TypeError: SharepointServerClient.get_list_items() missing 2 required positional arguments: 'site_relative_url' and 'list_relative_url'
FAILED tests/sources/test_sharepoint_server.py::test_get_docs_list_items - assert 0 == 2
+ where 0 = len([])
FAILED tests/sources/test_sharepoint_server.py::test_get_docs_list_items_when_relativeurl_is_not_none - assert 0 == 2
+ where 0 = len([])
FAILED tests/sources/test_sharepoint_server.py::test_get_docs_when_no_site_available
FAILED tests/sources/test_sharepoint_server.py::test_close_with_client_session - AttributeError: 'ClientSession' object has no attribute 'aclose'. Did you mean: 'close'?
FAILED tests/sources/test_sharepoint_server.py::test_prepare_drive_items_doc - TypeError: SharepointServerDataSource.format_drive_item() missing 2 required positional arguments: 'site_url' and 'site_relative_url'
FAILED tests/sources/test_sharepoint_server.py::test_get_content - AssertionError: assert None == {'_attachment': 'VGhpcyBpcyBhIGR1bW15IHNoYXJlcG9pbnQgYm9keSByZXNwb25zZQ==', '_id': 1, '_timestamp': '2022-06-20T10:37:44Z'}
FAILED tests/sources/test_sharepoint_server.py::test_get_list_items_with_no_extension - TypeError: SharepointServerClient.get_list_items() missing 2 required positional arguments: 'site_relative_url' and 'list_relative_url'
================= 15 failed, 1935 passed in 194.42s (0:03:14) ==================
make: *** [Makefile:59: test] Error 1
🚨 Error: The command exited with status 2
You can run the tests locally with make bin/pytest test
You will also need to get the linter to pass. Usually this can be done automagically with make autoformat
, but you can check if there are linting errors with make lint
.
Eventually we'll also need to get the integration tests (ftests) to pass. You can run this with make ftest NAME=sharepoint_server
. These tests can be harder to get working, and may require changes to the test fixtures here: https://github.com/elastic/connectors/tree/main/tests/sources/fixtures/sharepoint_server.
@anghelnicolae I'm not sure if you can see our buildkite CI. In case you can't, the tests are not currently passing.
The current test output You can run the tests locally with
make bin/pytest test
You will also need to get the linter to pass. Usually this can be done automagically with
make autoformat
, but you can check if there are linting errors withmake lint
.Eventually we'll also need to get the integration tests (ftests) to pass. You can run this with
make ftest NAME=sharepoint_server
. These tests can be harder to get working, and may require changes to the test fixtures here: https://github.com/elastic/connectors/tree/main/tests/sources/fixtures/sharepoint_server.
Hi @seanstory, Yes, I know the tests aren't working yet, I wanted to make sure you guys are OK with my changes before making further updates. I've tested the changes on my SharePoint environment and everything works fine.
Looking at it and testing a bit it looks like everything is good - you're using a great approach, now fixing tests/linters is next step, after that we'll do some testing on a real instance and will be happy to merge this change!
On it. I'll let you know when I finish.
I've finished updating the tests. I also ran the autoformatter.
Buildkite test this
Hey @anghelnicolae!
I ran the build and see that linter is not happy about connectors/sources/sharepoint_server.py
file, can you check it?
Additionally, our functional tests broke with the change - likely due to the changes in configuration that you've added. You can run functional tests with make ftest NAME=sharepoint_server
and see the error. Likely you will need to update https://github.com/elastic/connectors/blob/main/tests/sources/fixtures/sharepoint_server/connector.json file to change the connector configuration to fit the new configuration schema.
Let me know if any of these steps are not clear, and I'll be happy to guide you :)
Buildkite test this
@anghelnicolae I see that CI is still red, have the functional test ran successfully for you?
@anghelnicolae I see that CI is still red, have the functional test ran successfully for you?
No, but they don't really try to run, so I'm probably missing something in my environment. I fixed the obvious errors from the previous run, but I have no clue what to do next. All this is pretty difficult for someone who's never worked with Python or Linux before. It would be a very good idea for someone to write some documentation about how to setup a development environment.
These are the errors I'm getting when trying to run the functional tests: `
None
- /home/nicolae/Git/connectors/tests/../bin/fake-kibana --index-name search-sharepoint_server --service-type sharepoint_server --config-file sharepoint_server/config.yml --connector-definition sharepoint_server/connector.json --debug [FMWK][14:03:10][INFO] Loading config from sharepoint_server/config.yml [KBN-FAKE][14:03:10][INFO] Loaded connector definition from sharepoint_server/connector.json Traceback (most recent call last): File "/home/nicolae/Git/connectors/tests/../bin/fake-kibana", line 33, in <module> sys.exit(load_entry_point('elasticsearch-connectors', 'console_scripts', 'fake-kibana')()) File "/home/nicolae/Git/connectors/connectors/kibana.py", line 250, in main asyncio.run( File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run return loop.run_until_complete(main) File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete return future.result() File "/home/nicolae/Git/connectors/connectors/kibana.py", line 53, in prepare await es.ensure_ingest_pipeline_exists( File "/home/nicolae/Git/connectors/connectors/es/management_client.py", line 159, in ensure_ingest_pipeline_exists await self._retrier.execute_with_retry( File "/home/nicolae/Git/connectors/connectors/es/client.py", line 226, in execute_with_retry result = await func() File "/home/nicolae/Git/connectors/lib/python3.10/site-packages/elasticsearch/_async/client/ingest.py", line 148, in get_pipeline return await self.perform_request( # type: ignore[return-value] File "/home/nicolae/Git/connectors/lib/python3.10/site-packages/elasticsearch/_async/client/_base.py", line 389, in perform_request return await self._client.perform_request( File "/home/nicolae/Git/connectors/lib/python3.10/site-packages/elasticsearch/_async/client/_base.py", line 285, in perform_request meta, resp_body = await self.transport.perform_request( File "/home/nicolae/Git/connectors/lib/python3.10/site-packages/elastic_transport/_async_transport.py", line 258, in perform_request resp = await node.perform_request( File "/home/nicolae/Git/connectors/lib/python3.10/site-packages/elastic_transport/_node/_http_aiohttp.py", line 218, in perform_request raise err from None elastic_transport.ConnectionError: Connection error caused by: ConnectionError(Connection error caused by: ClientConnectorError(Cannot connect to host localhost:9200 ssl:default [Connect call failed ('127.0.0.1', 9200)])) Unclosed client session client_session: <aiohttp.client.ClientSession object at 0x7ea5863ceef0> make: *** [Makefile:65: ftest] Error 1 `
Any chance you guys can pick up the process from here?
Sure I'll give it a look, should be a straightforward fix
@anghelnicolae I was able to fix the functional tests and pushed here: https://github.com/elastic/connectors/pull/2475
However, I see a merge conflict too - do you think you can fix the merge conflicts in this PR and I fix the functional tests further after?
@anghelnicolae I was able to fix the functional tests and pushed here: #2475
However, I see a merge conflict too - do you think you can fix the merge conflicts in this PR and I fix the functional tests further after?
I see 3 merge conflicts, I will try to do it tomorrow. Thanks for fixing the functional tests, can you tell me what I need to do to run the tests?
@anghelnicolae I will need your full run log to tell you what went wrong (ones you sent before did not include data from the start of the run).
My theory was that you did not have docker daemon launched so elasticsearch failed to start. Can you check that docker is running locally? If it's not the case, can you share the logs from the moment you started running the test?