Part-DB-server icon indicating copy to clipboard operation
Part-DB-server copied to clipboard

OEMSecrets provider interface v.1.0

Open pdo59 opened this issue 1 year ago • 3 comments

New class for interacting with the OEMSecrets (https://www.oemsecrets.com) API version 3.0.1.

Main Goal

The OEMSecrets APIs do not provide a unique identifier for each part, nor do they directly supply technical parameters associated with an electronic component, such as datasheets, images, or distributor information. This class aggregates and organizes the data received from the APIs by part_number and manufacturer_id to properly represent products and their related information.

Specific Features

  • Aggregation by part_number and manufacturer_id:
    Since the OEMSecrets APIs do not provide a unique ID for each part, the class creates a unique key by combining the part_number and manufacturer_id. This key (provider_id) is used to identify a product and aggregate all related information from various distributors.

  • Parameter Generation from Descriptions:
    The OEMSecrets APIs do not directly provide the technical parameters of components. To address this, the OEMSecretsProvider class extracts and generates technical parameters from the textual descriptions found in the API responses. This process allows for correctly populating data fields such as voltages, currents, operating temperatures, and more.

  • Datasheets Aggregation:
    Datasheets provided by individual distributors are collected and aggregated. Specifically, if multiple distributors provide the same datasheet, duplicates are removed. The parseDataSheets function ensures datasheets are handled correctly by ensuring the file name is unique, and it extracts the name from the URL if not explicitly provided.

  • Image Aggregation:
    Images provided by distributors are also collected and grouped according to the specific component. Duplicate images from distributors are filtered out.

  • Distributor Information Aggregation:
    For each component, information from different distributors, such as distributor name, available quantities, prices in various currencies, and purchase URLs, is aggregated. This information is kept separate for each distributor and updated as more data is processed.

  • Filtering Products Without Prices:
    The zero_price variable, configured in the .env.local file, controls whether products without valid prices should be discarded. If zero_price is set to 0, products with no valid price are excluded from the final result.

Technical Details

  • Session Storage:
    Arrays containing basic information, datasheets, images, parameters, and purchase information are stored in the session. Each array is indexed by a unique provider_id. When the getDetails function is called, the class reconstructs the data for the specific component by retrieving the arrays stored in the session.

  • Memory Optimization:
    Since the data from the APIs can be very large, the class includes memory optimization techniques, such as deallocating unnecessary data after each operation (gc_collect_cycles()).

  • Description Truncation and Notes Handling:
    If a description exceeds 100 characters, it is truncated and saved in the description field. The complete description is stored in the notes field, ensuring that all information is retained.

pdo59 avatar Aug 26 '24 18:08 pdo59

Codecov Report

Attention: Patch coverage is 4.08602% with 446 lines in your changes missing coverage. Please review.

Project coverage is 60.90%. Comparing base (756152d) to head (9ae7f32). Report is 9 commits behind head on master.

Files with missing lines Patch % Lines
...nfoProviderSystem/Providers/OEMSecretsProvider.php 4.08% 446 Missing :warning:
Additional details and impacted files
@@             Coverage Diff              @@
##             master     #679      +/-   ##
============================================
- Coverage     62.33%   60.90%   -1.43%     
- Complexity     5659     5820     +161     
============================================
  Files           518      519       +1     
  Lines         18708    19173     +465     
============================================
+ Hits          11662    11678      +16     
- Misses         7046     7495     +449     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Aug 28 '24 11:08 codecov[bot]

Thanks you, seems good for most parts.

However, what I do not like is the usage of session here. Functional components like the info providers, should not be coupled to the session, and should ideally be stateless. The session is not necessarily always available. The Part-DB API authentication will be stateless in future (currently a session is still created, but that will change, as it causes some problems), and then your provider would throw issues, when the Part-DB API exposes the info provider functionality. Also it requires that the methods of the info provider are called in a specific order, which is not necessarily guaranteed.

I think the better approach is to define the provider_id in a way that you can easily extract the manufacturer and part_id from it, and then perform a search for the part_id at the API and then select the part mathching the manufacturer from the result. This way you get independent from the session and the provider becomes stateless. To save some requests you can cache the results from the overview search (with each part retrievable via its part_id) and reuse it in the detail step if an cached entry is existing.

Also you should look at the output of the phpstan inspections, there currently are some issues, which could be hints for bugs.

jbtronics avatar Aug 28 '24 11:08 jbtronics

ok, I'll review the code.

pdo59 avatar Aug 28 '24 12:08 pdo59

Also I am not really sure if the gc_collect_cycles actually have an effect on memory usage. Have you meansured this?

Because as far as I see it, the objects which are created get all passed back via the array references, meaning they are referenced and will not be garbage collected. And the DTO objects should not have circular references, so the garbage collection should take place automatically, when the reference counter approaches 0, without the need to collect cycles.

jbtronics avatar Aug 30 '24 11:08 jbtronics

Initially I actually used gc_collect_cycles for garbage cleaning, but with later changes it was probably not needed anymore, I must have left the statement there by mistake. Now I am following your previous advice by removing the use of session and will also remove gc_collect_cycles. I hope I do not have memory issues given the size that the json provided by the Oemsecrets API sometimes has.

pdo59 avatar Aug 30 '24 12:08 pdo59

Can i use APCu for cache the results from the overview search?

pdo59 avatar Aug 30 '24 15:08 pdo59

Can i use APCu for cache the results from the overview search?

APCu is not necessarily available on all Part-DB installations. You should implement caching using Symfony cache component (https://symfony.com/doc/current/components/cache.html), which support multiple different caching backends, and abstract the info provider from a specific caching provider.

It is not difficult to use: See the Ocotpart info provider for some examples (https://github.com/Part-DB/Part-DB-server/blob/master/src/Services/InfoProviderSystem/Providers/OctopartProvider.php)

jbtronics avatar Aug 30 '24 15:08 jbtronics

okay, thank you very much!

pdo59 avatar Aug 30 '24 15:08 pdo59

I saw the Octopart provider, everything is fine. Just one question: are there any size limits for the cache? The searchByKeyword result on oemsecrets might contain even a few hundred partDetailDTOs.

pdo59 avatar Aug 30 '24 15:08 pdo59

That should not be a problem (but you might wanna set an expiration date, so that these don't get stored forever).

And depending on how you wanna use the cached data later, It might be useful to put each returned part into its own cache item with a unique key, generateable from the part IDs, so that you can easily retrieve individual items.

jbtronics avatar Aug 30 '24 16:08 jbtronics

Thank you for the confirmation and the advice! I had actually already planned to implement both an expiration for the cache and unique keys based on part IDs for individual caching. It's great to know I'm on the right track. I'll continue with this approach, but if you have any further suggestions or concerns, I'm happy to adjust.

pdo59 avatar Aug 30 '24 16:08 pdo59

Observations on This Revision

  • I left the gc_collect_cycles() call in the processBatch function because, without this instruction, memory issues often occur in "dev" mode when trying to create a new part from the results list. In my experience, this garbage collection step seems to help in ensuring that memory is properly freed up between batches, which I think is crucial for preventing potential memory leaks or fragmentation that could otherwise lead to crashes or degraded performance during development. However, I have not observed any memory leaks in the production environment.
  • Additionally, I've noticed that when creating a new part from an information provider, the content in the "Attachments" tab sometimes expands beyond the browser window. This expansion effect also occurs during the editing of the component. It seems to happen when there's a very long datasheet URL, potentially when it wraps onto two lines.
  • Regarding the "Shopping Information" tab, in what order are distributors presented after the part is created? It would be useful if they were displayed in the same order as during the creation of the part, specifically sorted by country and/or region.
  • Finally, I observed that OEMSecrets differentiates prices for certain distributors, such as ARROW ELECTRONICS, based on lead time. To avoid duplicates and potential errors in part creation, I modified the order_number field in PurchaseInfoDTO by adding a suffix composed of "-" followed by the lead time in weeks and, if necessary, a sequential number. However, it would be more appropriate to add a dedicated property to the PurchaseInfoDTO object to store this information, which should also become part of the key in the database.

pdo59 avatar Sep 02 '24 20:09 pdo59

Thank you for your pull request, I merged it with some changes.

I optimized some smaller things and added to retrieve part details for a certain part id, even no search were performed before. Basically, it extracts the part name from the passed ID, and performs a search with this name, which should contain the desired result.

PHPstan was actually correct, these fields were never used (only a local variable with the same name) and are not required, so I removed them. Also, I made the resultData field local, as it were used in only this method.

I moved the gc_collect_cycles call, so that it do not get called in the loop (as in my profiling experiments caused 3% of total request time), but only once after all data was processed. I had no problems with this configuration and I see no real change in memory usage.

jbtronics avatar Sep 09 '24 12:09 jbtronics

Thank you for merging the pull request and for the detailed feedback! I really appreciate the optimizations you made, especially around the retrieval of part details based on the ID and the cleanup of unused fields. It makes the code more efficient and maintainable. Your profiling insights regarding gc_collect_cycles are also very helpful, and I agree that moving it outside of the loop makes a lot of sense based on your testing. I’ll keep these suggestions in mind for future contributions. Thanks again for your time and improvements!

pdo59 avatar Sep 09 '24 12:09 pdo59