netaddr icon indicating copy to clipboard operation
netaddr copied to clipboard

Refactor OUI and IAB database handling.

Open mpontillo opened this issue 8 years ago • 10 comments

  • Collapse test case duplicates.
  • Allow OUI class to optionally take in the index and registry file.
  • Add test cases to ensure strangely-encoded entries work.

mpontillo avatar Jan 09 '17 22:01 mpontillo

Hm, this one needs some work. Tests are failing on Python 2. (It might be that I wrote a legitimately failing test case, but I think I also missed a difference between Python 2 and Python 3.)

For some context, this change was motivated by my desire to ensure that the OUI database can handle strange non-UTF-8 artifacts in the IEEE data. Our software is seeing some tracebacks from netaddr, but after writing these tests, I'm no longer convinced there is a bug in netaddr. Rather, I think it has to do with a long-running application and the OUI index cache. If netaddr is updated while our application is running, the OUI index on disk would change, but not in memory.

Also, Debian/Ubuntu may have a separate issue; they chose to decouple the OUI database from netaddr itself, which means that there is a greater chance that the index will be out-of-sync with the IEEE source data file.

mpontillo avatar Jan 10 '17 02:01 mpontillo

Merge conflicts with this PR. Please can you update your fork and resubmit.

drkjam avatar Jan 22 '17 23:01 drkjam

I resolved the conflicts using GitHub's editor, but I didn't get a chance to test the branch again.

mpontillo avatar Jan 23 '17 05:01 mpontillo

I've been reviewing this pull request.

Some recent changes I made to parse the IEEE files and generate the indexes under both Python 2.x and 3.x correctly mean that merging this PR is no longer clean and introduces breaking changes to the codebase. For this reason, I can't accept it as is.

However, I certainly do want to provide the flexibility for users to introduce their own versions of the underlying data files so they can stay up-to-date on their own release cycles without necessarily being chain linked to netaddr's own releases just to gain access to new data files.

I did have an idea related to this.

Would it make life easier if I exposed the paths to the various IEEE data and index files as environment variables so their location could be more easily overridden? That way you could use netaddr as is and (assuming you've generated the indexes correctly from the underlying files) not have to write a bunch of additional code to try and override the files programatically.

I'm happy to support both a code override method via the constructors (as you have demonstrated in the PR) and environment variable overrides.

Let me know what you think and I'll have a go implementing the necessary changes.

drkjam avatar Jan 26 '17 08:01 drkjam

Thanks for the reply.

My apologies; I think my intentions with this weren't entirely clear.

When I started working on this pull request, my goal was to write a unit test that demonstrated a bug in netaddr, and then go fix it. However, the bug turned out to be not at all what I was expecting. So in the end, all this PR accomplished was to refactor the index handling a little, so that I could enhance the unit testing. (I don't think functionality like this is strictly needed for production use, but I thought it was probably a good step in the right direction.)

With regard to environment variables, I'm neutral on the idea. I think it would be nice to have the flexibility, but it would neither help nor hurt our particular use case.

So I feel like I should take a step back and talk about the potential requirements I think could be addressed to make things better for netaddr users. I went ahead and wrote down in a spreadsheet what I believe to be the user stories/requirements we have for this functionality. I've given you edit access, so feel free to use this document as you see fit, and/or comment as appropriate.

To me, the quick and easy fix is to do rows 8 and 9 (in the current spreadsheet); that is, the ability to regenerate the index on the first lookup, and the ability to keep the data file descriptor open. It would be nice to have some sort of integrity check, too, but in Debian's use case, I think it would be more reliable to just assume the index is out of date, since the data file is supplied by a completely different .deb package.

mpontillo avatar Jan 26 '17 09:01 mpontillo

I'll think about this, I have some ideas how to tackle that.

jstasiak avatar Jun 21 '20 12:06 jstasiak

@mpontillo Thank you for the contribution! I decided I won't merge it as a whole but I'm happy to pick at least one or two pieces from this PR in separate commits, you'll be appropriately credited of course. Please let me know if that's ok with you.

jstasiak avatar Mar 24 '21 21:03 jstasiak

No worries @jstasiak; please feel free to use whatever you'd like from this PR. I'm happy to discuss further, if needed - though unfortunately I don't work with code that uses netaddr on a day-to-day basis any more, so it's not top-of-mind.

mpontillo avatar Mar 24 '21 21:03 mpontillo

Thank you! I don't want your work to go to waste, so I'll try to get at least part of that merged. I'll ask questions if needed.

jstasiak avatar Mar 24 '21 21:03 jstasiak

PS. That's a great writeup you provided (https://docs.google.com/spreadsheets/d/1_yfJSQ2H-CwmkExyYPzms-e9qCm16ymraWCjn8MeFGQ/edit), it'll be useful.

jstasiak avatar Mar 24 '21 21:03 jstasiak