laravel-elasticsearch icon indicating copy to clipboard operation
laravel-elasticsearch copied to clipboard

Refract/cleanup - Added Test suite / formating etc.

Open UseTheFork opened this issue 1 year ago • 5 comments

This commit introduces several new features and improvements to the package:

  • Nix Integration: Includes formatters, pre-commit hooks, and general environment setup.
  • Orchestra/Testbench: Integrates Testbench into the package.
  • Seeders and Factories Conversion: Converts all seeders and factories from laravel-elasticsearch-tests.
  • Base Testing Suite: Creates a base testing suite based on the site documentation.
  • Independent Tests: Modifies all tests to run 100% independently of each other (tests should never rely on each other).
  • GitHub Action for CI/CD: Adds a GitHub action to test for CI/CD (this will need further work but needs to be merged first).

There is definitely work to be done with coverage but in general this is a solid foundation: image

UseTheFork avatar Aug 03 '24 00:08 UseTheFork

@pdphilip , I just wanted to keep you in the loop on progress. I have the foundation all set, and all models and factories transferred and working.

I am working on what I see as the critical tests.

My only ask so far is if you can look at the PHP-CS-Fixer setup and let me know if you're good with it.

UseTheFork avatar Aug 03 '24 00:08 UseTheFork

@use-the-fork this is looking fantastic, very excited to have the tests built in. Amazing contribution

pdphilip avatar Aug 03 '24 10:08 pdphilip

Great work @use-the-fork - I'll review this over the next couple of days, looking great

pdphilip avatar Aug 05 '24 12:08 pdphilip

Need a bit more time @use-the-fork - excited about this one

pdphilip avatar Aug 12 '24 11:08 pdphilip

Need a bit more time @use-the-fork - excited about this one

No worries! I'm making some other changes in a separate branch and will raise PRs one by one.

UseTheFork avatar Aug 12 '24 11:08 UseTheFork

Need a bit more time @use-the-fork - excited about this one

Any update here? I have fixed a few more bugs and am keeping them in a separate branch until this PR is merged.

UseTheFork avatar Aug 26 '24 12:08 UseTheFork

Hey @use-the-fork - you can merge them in this same PR so long (and merge with latest main as well). Then I'll publish them all in one release rather.

There's a few things on the go my end as I'm building another package around this one. Hence shortage on time right now.

pdphilip avatar Aug 26 '24 14:08 pdphilip

@pdphilip All set!

UseTheFork avatar Aug 26 '24 22:08 UseTheFork

@gregory-lifhits-assaabloy, I see you snuck in a new feature. Please give me more information about what you're trying to do by:

  • What problem you're trying to solve
  • Linking documentation to ES docs
  • Example of how you envision this working with eloquent (code examples)

I saw your test but I'll need to document this ultimately. Thanks

pdphilip avatar Aug 28 '24 14:08 pdphilip

@gregory-lifhits-assaabloy, I see you snuck in a new feature. Please give me more information about what you're trying to do by:

  • What problem you're trying to solve
  • Linking documentation to ES docs
  • Example of how you envision this working with eloquent (code examples)

I saw your test but I'll need to document this ultimately. Thanks

Sorry, that's my work account 😄

Regardless you need this info:

What problem you're trying to solve Linking documentation to ES docs

The pagination works great but only up to 10,000 records. While yes it's an option to increase this count Via the elastic settings it's not recommended. Especially when an index may have millions of records.

The Old way to do this would be to use the deprecated _scroll API. The new way is to include sorts and use the search_after param See here: https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html#search-after

The code additions give users the ability to scroll over all records without having to worry about the 10k limit. I implemented a custom cursor method that will pass the sort fields and allow the user to paginate via API or use it straight out in there code as seen in the tests.

Example of how you envision this working with eloquent (code examples)

So as an example, if a user wanted to paginate Via API they would do:

  $items = Item::where('business_unit', $brand->business_unit)
      ->orderBy('item_number')
      ->orderBy('created_at');

//Pass to a resource response from here
$items = $items->searchAfterPaginate(20)->withQueryString();

that will cause there to be a next_cursor and prev_cursor key that can be used to move back and forth.

Let me know if you need more explanation here.

UseTheFork avatar Aug 28 '24 19:08 UseTheFork

Ok, I'll take it from here to

  1. Get prev_cursor and prev_page_url to work properly
  2. Get $cols to filter in the items
  3. Find a reliable way to check for the last page

pdphilip avatar Aug 28 '24 20:08 pdphilip

Ok, I'll take it from here to

  1. Get prev_cursor and prev_page_url to work properly
  2. Get $cols to filter in the items
  3. Find a reliable way to check for the last page

So I have to check but I think a lot of this is handled with the implementation unless you are talking about adding documentation.

The larger concern is sorting REALLY matters and it can't be a general field with a value that may be the same across entries. For example, ordering by status will give inconsistent results. That should be highlighted in the docs. To make things safe I started saving the ID in a keyword field that I can then use to sort to guarantee not getting the same record more than once.

UseTheFork avatar Aug 28 '24 21:08 UseTheFork

Hey @use-the-fork

This is a great feature to add to the package

Your approach is great, but the implementation was incomplete.

These are the issues I found in testing it:

I made 100 products each with a queue field with a unique number (1-100)

Then I ordered desc on queue

  • The first page is good, and so are all the next cursors.
  • What didn't work is going backwards (prev). It didn't go back to the page before and it didn't sort it properly.
  • There was no way to get back to the beginning at all. If you go next 3 times - then you should only be able to go prev 3 times
  • I was doing 10 at a time. When I got to the end, it didn't recognise that it was the last page (because there were 10 results)
  • $cols were not filtering

Here's your commit prior to me reworking it: https://github.com/pdphilip/laravel-elasticsearch/pull/32/commits/94763ad004bd38e78ae03b8cff45d767cd689ad7

Refactor

I've gone with your direction, and gone further to take full control of cursor management, embedding a cursor state into the models.

  • This fixed next and previous paginating.
  • It knows when you're back in the beginning and removes the prev cursor and links
  • $column filtering works as expected
  • It knows when you're on the last page and disables next cursor and links

Added features.

  • If no orders are found, then it will look up the mapping to see if it can do created_at and updated_at - If not, then throws the MissingOrderException (Keep in mind, when testing this creating 100 records all have just about the same created_at so it's buggy)
  • If only one order is given, then it will try to add created_at/updated_at as the tie-breaker
  • Paginator also shows
    • 'current_page'
    • 'total' (totally number of records)
    • 'from' (showing records from)
    • 'to' (showing records to)
    • 'last_page'

Embedded in the cursor is a TTL, I've made it 5 minutes. If 5mins lapses then it will recount and calculate total and last_page

Finally, in the spirit of the package inheriting base methods, I renamed it to cursorPaginate

Have a look and give it a test.

Cheers

pdphilip avatar Aug 29 '24 14:08 pdphilip

@pdphilip I am working through this now. Looking good. One item I wanted to talk about how is the inferSort you added. I tried this before but it doesn't always work. If a user sets the created_at time manually or the insert happens within the same second the sorting can get messed up. I understand this is an edge case but still.

UseTheFork avatar Aug 29 '24 23:08 UseTheFork

This is still wip almost done. I really needed to get bulk insert working to test pagination without having to wait a long time.

UseTheFork avatar Aug 30 '24 01:08 UseTheFork

@pdphilip Take a look everything you did looks great. I just cleaned it up a bit and modified the tests to work.

I also refactored the insert method to work with the Bulk API. As a result, inserting 10,000 records is much faster!

I also added Model tests that check most of the base model functionality so that we are sure nothing broke.

UseTheFork avatar Aug 30 '24 16:08 UseTheFork

@pdphilip What are you using for formatting and styling? I want to ensure the git hooks match it so you don't have to do cleanup work.

UseTheFork avatar Aug 30 '24 20:08 UseTheFork

Pint, did add it to composer dev.

Tho I saw some weird formatting, but I think that was when I was switching branches and an outdated gitignore caused chaos.

Anyway. Pint good sir

pdphilip avatar Aug 30 '24 20:08 pdphilip

Pint, did add it to composer dev.

Tho I saw some weird formatting, but I think that was when I was switching branches and an outdated gitignore caused chaos.

Anyway. Pint good sir

Got it. I'm using PHP CS Fixer. I will switch that off and add Pint.

This is all looking really good I think. I get how the package works now So its getting easier to add features. I am also using tests from the MongoDB package and converting them to work with laravel-elasticsearch.

UseTheFork avatar Aug 30 '24 20:08 UseTheFork

Super. Let's park any new features for now. I really want to ship this and use it as the base going forward. Great work

pdphilip avatar Aug 31 '24 09:08 pdphilip

Super. Let's park any new features for now. I really want to ship this and use it as the base going forward. Great work

Agreed! I will make a new branch on my repo for anything else I am doing and merge in as separate PR after this goes in.

UseTheFork avatar Aug 31 '24 15:08 UseTheFork

Update:

I have made a few changes to the bulk method:

  1. I set chunk size to 1000 at a time (as done in PHP ES docs example), this feels safe for memory considerations

  2. The default is that it returns a summary array of what happened, ex:

{
"success": true,
"timed_out": false,
"took": 5082,
"total": 100000,
"created": 100000,
"modified": 0,
"failed": 0
}

(That's a real result by the way, amazing that it can create 100k records in 5sec)

If there are any errors, it will return the details in an error field:

{
"success": false,
"timed_out": false,
"took": 5055,
"total": 100000,
"created": 99619,
"modified": 0,
"failed": 381,
"error": [....], //<-errors in here
"errorMessage": "Bulk insert failed for some values"
}

It also differentiates between created/modified, ex:

$prods = Product::limit(100)->get();
return Product::insert($prods->toArray());
{
"success": true,
"timed_out": false,
"took": 7,
"total": 100,
"created": 0,
"modified": 100,
"failed": 0
}

insert has a second parameter if you want the [successful] data returned

$prods = Product::limit(100)->get();
$updatedProds = Product::insert($prods->toArray(),true);
return $updatedProds; //shows the data, not the summary

The reason that this is not the default is that at mass this will exhaust most memory settings

Another major upgrade is that get(), search() and insert() returns an ElasticCollection which has the metadata embedded in it. It works like a normal collection but you get access to the query meta

$prods = Product::limit(100)->get();
$updatedProds = Product::insert($prods->toArray(),true);
return $updatedProds->getQueryMetaAsArray();
$prods = Product::where('orders','>=',100)->get();
$prods->getDsl();
$prods->getQueryMetaAsArray();

Let's peg it here now, then finish up the last of the tests and get this shipped

pdphilip avatar Sep 02 '24 12:09 pdphilip

@pdphilip Yea, this all makes sense to me, and I thought about doing something similar. I think your approach works well. What other tests do we need before this can get merged in?

UseTheFork avatar Sep 02 '24 16:09 UseTheFork

I guess just the toDo()s - I've tested all the changes with my existing Test Suite covering 207 tests, though those don't include the cursor and bulk tests. Anyway, I'll wrap up the current toDo() ones, can you do a test for bulk?

Then we go.

pdphilip avatar Sep 02 '24 17:09 pdphilip

@pdphilip Is there a reason you removed:

        if ($refresh) {
            $params['refresh'] = $refresh;
        }

With out it it never waits for a refresh so it ends up failing tests since it's still building the records. I am going to add it back in since that way we can add insertWithOutRefresh in the future.

UseTheFork avatar Sep 02 '24 17:09 UseTheFork

Go for it

pdphilip avatar Sep 02 '24 17:09 pdphilip

Go for it

with that back in it's working perfect! I am going to add a test to make sure it's returning a ElasticCollection and after that I think you can ship.

UseTheFork avatar Sep 02 '24 17:09 UseTheFork

@pdphilip This is ready for you!

UseTheFork avatar Sep 02 '24 18:09 UseTheFork

@use-the-fork your CI is excellent, tested it and works great. Well done.

I'm going to leave the todo tests for now as is and prepare this release (update docs etc) - that's always a bit of admin. Then we ship this beast.

This is an amazing contribution, thanks a bunch. And I hope you'll stick around for more.

pdphilip avatar Sep 03 '24 11:09 pdphilip

Also, if you'd like to be in touch, shoot me an email (in composer)

pdphilip avatar Sep 03 '24 11:09 pdphilip