
More robust filesystem functionality?

Open QuercusFelis opened this issue 4 years ago • 24 comments

I was wondering if there was interest in some more robust filesystem functionality, ie:

  • a single command to rename a file
  • automatic storage re-allocation when a file outgrows its allocated space
  • directories (is this already implemented?)
  • recursive delete on a directory
  • etc.

I would like to contribute some of these things but don't want to invest all of the time if they either already exist in a form I am unaware of - or simply aren't wanted in order to save space (in which case I might suggest them as an optional module to add to the glue.asm).

Additionally, if there is a feature I didn't list here that someone would like to see, please let me know and I'd be happy to work on it. If my interest is better directed towards other aspects of this project, point me in the correct direction and I once again would be more than happy to help. (I have experience with ARM assembly and have been reading up on z80 assembly so I'm fairly confident in my abilities, don't be shy).

QuercusFelis avatar Oct 14 '19 17:10 QuercusFelis

I'd be cool with it, but not my call.

I think supporting bigger file sizes would be great, even if it's just an optional module. It does seem to be breaking on files smaller than the documented maximum file size.

keithstellyes avatar Oct 14 '19 18:10 keithstellyes

I agree, after playing around with the emulator I think basic file functionality would take this from a neat idea to something actually useful.

Nobody is going to write 10 lines of assembly to write a single word to a file. I haven't read anything about memory management either; there is a lot that is required to make this kernel actually useful for anything.

NoahGWood avatar Oct 15 '19 10:10 NoahGWood

Well, the OS will never be competitive so long as live arches and OSes exist; a major purpose is for when others aren't an option.

keithstellyes avatar Oct 15 '19 13:10 keithstellyes

I'll soon address #25, which will make my answer here more complete, but here's a preliminary response.

Collapse OS' only goal is to program microcontrollers. Its technical needs are modest and, to fit the "self-contained" constraint, simplicity is a very important aspect of the code, often valued above efficiency.

A good model for what we could call a "good FS" is Alan Cox's (@EtchedPixels) inode implementation in FUZIX: https://github.com/EtchedPixels/FUZIX/blob/master/Kernel/inode.c . With this, we have pretty much all we need, well designed, compact, the whole shebang. All we need to do is to rewrite this to z80 asm.

However, this code is vastly more complex than CFS. Before we add that complexity, I'd like to think up actual use cases (within the bounds of the OS' main and only goal) where that would be needed.

hsoft avatar Oct 16 '19 02:10 hsoft

If you want a simple flat filesystem then just use the CP/M one - or just use CP/M. CP/M is entirely self hosting. For a hierarchical one, the Fuzix one is based on UZI, which is based upon the Unix v6 fs, and it's about the simplest, cleanest design on the planet, except that it suffers somewhat from fragmentation. However, in today's world of SD cards and CF, fragmentation has become basically irrelevant.

You don't need to rewrite the Fuzix one to Z80 btw - you could just extract the C stuff you need and compile it to asm with SDCC and then work on the asm. I'm still pondering doing a Fuzix 'extreme edition' that way to get Fuzix to run on a flat 64K memory Z80 system.

Or for that matter CP/M is self hosting, and can run the Hitech C compiler, or BDS C (complete asm source published by author I believe now), or QC (runs on CP/M and can self host) and so on 8)

If you want a really clean minimal implementation of a basic Unix like filesystem in C without the full stuff take a look at the late Dr Steve Hosgood's 6809 OMU codebase

https://github.com/EtchedPixels/OMU/tree/master/omu09/omu09/src

EtchedPixels avatar Oct 16 '19 12:10 EtchedPixels

Collapse OS' only goal is to program microcontrollers.

This is a big soundbite that seems like it should be in some of the official documentation or made more visible; I think there's a lot of confusion around this point.

keithstellyes avatar Oct 16 '19 18:10 keithstellyes

@keithstellyes I agree and I was taken by surprise by viral sharing. I think that the website was always clear, but this point needs emphasis, which is the point behind #53

hsoft avatar Oct 16 '19 18:10 hsoft

I was too. I found out about the project from that Vice article that got spread around, and I doubt the author had much understanding of the project or the tech in general, but that's typical tech journalism.

keithstellyes avatar Oct 16 '19 19:10 keithstellyes

Just my 2 ¢: I would not call these proposals robust, but rather convenient. (They do not help make data loss less likely.)

And before seeing file renames, I'd like to see copying files (so that you can make a backup copy of a file), both within a filesystem and across 2 filesystems. It has more use cases, and you can always copy and then delete if you really want to rename.

And I do not think that auto-growing is required, if there is a way to shrink a file to its actual size once you are finished editing (or compiling to) it. Then just guess generously, and shrink when done. If you later want to grow, copy to a larger file.

Maybe also add some option to compactify a filesystem (remove all the holes in between), then you do not even need to support fragmented files.

Having an enhanced directory lister that prints the actual and reserved sizes of files might also be useful.

All of this can be implemented in userspace, no need to modify the kernel - except there might be modifications needed to support 2 filesystems (2 SD cards) at the same time so you can actually copy files (that are larger than available RAM) between them.
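For illustration, a minimal chunked-copy sketch in C. It only shows the idea that a file larger than available RAM can be streamed between two media through a small fixed buffer; Collapse OS' own block device and CFS routines would take the place of the stdio calls here, and the buffer size is an arbitrary assumption.

```c
#include <stdio.h>

#define COPY_BUF_SIZE 128  /* arbitrary; anything that fits in RAM works */

/* Copies src to dst one small chunk at a time, so the file never has to
 * fit in memory. Returns 0 on success, -1 on error. */
int copy_file(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");
    FILE *out = fopen(dst, "wb");
    char buf[COPY_BUF_SIZE];
    size_t n;

    if (in == NULL || out == NULL) {
        if (in) fclose(in);
        if (out) fclose(out);
        return -1;
    }
    /* stream the file through the small buffer, one chunk at a time */
    while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            fclose(in);
            fclose(out);
            return -1;
        }
    }
    fclose(in);
    fclose(out);
    return 0;
}
```

The copy-then-delete rename and the "copy into a larger file to grow" idea above both reduce to this same loop.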

schierlm avatar Oct 18 '19 20:10 schierlm

I'll take a look at all of these different file systems. I do think we need a strategy to prevent overwriting the start of one file with the end of another, and that should be the kernel's job. While this adds more size to the kernel, the consequences of overwriting part of an important file could be fatal to the instance of Collapse OS you are running. Having the user allocate space themselves is not the best idea, because they will make mistakes, and it will almost never result in the most efficient possible allocation of space.

I will be the first to admit, I've never made or implemented a filesystem before; however, having had at least a quick skim over the links posted here, I don't see any reason why this shouldn't be fairly straightforward.

As for copy vs. rename, I don't see any reason why we shouldn't have both. A rename function would be very small, as it only has to replace a handful of bytes in the file metadata. Replacing that functionality with a full copy would go well beyond merely inefficient: using copy to rename a file would require a significant amount of free space to copy into, and it would leave behind a gaping hole in storage that is just plain harder to make use of, or that needs to be closed up later by another expensive function. Meanwhile, it would be copying what may be thousands of bytes, putting completely unnecessary strain on the system. And obviously rename would be even more useless as a replacement for the copy function.

I think I'm going to lay out what I think should be handled by the kernel, and what features should be available to a user from a file management perspective. This is not an exhaustive list and some of these features already exist. User:

  • Create
  • Delete
  • Copy
  • Rename
  • Flag as read-only

Kernel:

  • Initial Allocation of Storage
  • Managing Metadata (metadata should store file size, etc.)
  • Preventing Overwrites
  • Maintaining minimal wasted space between files

To this end, having files marked as 'read-only' (i.e. in metadata) would help the kernel know not to allocate extra space at the end of a file to accommodate growth.
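To make the metadata point concrete, here is a hypothetical layout sketched in C. It is not CFS' actual on-disk format; the field names, sizes, and flag values are assumptions, chosen only to show how a read-only flag and separate used/reserved sizes could be stored.

```c
#include <stdint.h>

/* Hypothetical file metadata layout -- NOT CFS' actual on-disk format.
 * Field names, sizes and flag values are illustrative assumptions. */
#define FNAME_LEN       16    /* assumed maximum filename length */
#define FFLAG_READONLY  0x01  /* kernel skips over-allocation for these files */

struct file_meta {
    char     name[FNAME_LEN]; /* a rename rewrites only these bytes */
    uint16_t used_size;       /* bytes actually written to the file */
    uint16_t alloc_size;      /* bytes reserved for it on the medium */
    uint8_t  flags;           /* FFLAG_READONLY, etc. */
};
```

This is also why a rename stays cheap compared to a copy: only the name field changes, while a copy has to move the whole allocation.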

All of this said, as this conversation continues and as I look more into what we have right now and what is already out there, we will come to a more final idea of what it is we want out of the file-system and tools given to the user for its management. Then we can write up a more complete specification and it can be implemented, scrutinized, revised, documented, and used.

QuercusFelis avatar Oct 18 '19 21:10 QuercusFelis

@schierlm I like your way to think. That's in line with Collapse OS' idea of minimalism at the cost of convenience.

@QuercusFelis

I'll take a look at all of these different file systems. I do think we need a strategy to prevent overwriting the start of one file with the end of another

Yes, that is beyond convenience, I agree :) A PR along those lines would be most welcome.

As for the rest of your suggestions, they might very well be good, but the devil's always in the details: implementation might turn out more complicated than the feature's worth. I'd say that this discussion should be had over a PR so that we know what kind of tradeoffs we're talking about.

hsoft avatar Oct 31 '19 02:10 hsoft

The biggest issue with the file system is that one broken link has a significant impact on the whole file system, as in fscked; robustness is about how to survive that. It seems the emulator borks on an empty file system: is there a tool in the system to init a blank medium? On CP/M it would be era *.*, as that cleans the directory, sets the status of each file to deleted, and marks the space it might have owned as available. Northstar DOS has IN, for initialize disk, as in lay down tracks and put in the needed marks and an empty directory. VMS (big 32-bit monster) has a utility to initialize the disk with a base file system. PCs have format. That gets you a virgin FS, but then you have to populate it. That is the more interesting thing to do on new bare metal.

Allison

Allisontheolder avatar Nov 12 '19 04:11 Allisontheolder

One of the things I learned with Fuzix - one challenge isn't the file system itself but being of a design that can be repaired in the memory you have available. That is not an insignificant challenge.

(era *.* btw isn't sufficient in CP/M in some cases because the era can fail: there was a tool everyone tended to have (INITDIR) that wrote 0xE5 over the first chunk of the disk. There were also disk repair tools.)

EtchedPixels avatar Nov 12 '19 13:11 EtchedPixels

On CP/M, if you bust the directory it might be repairable with a disk editor... it's more of a salvage operation. The most common cause was floppies writing trash wherever the head was at power off.

Linked-list file systems are harder to fix. The best you can do is guess and try. It's worse if there is no catalog to get the initial pointers from.

FAT/BAL file systems are as notorious as CP/M for failures due to a busted FAT or linking list.

One case is Northstar DOS; that's a tag-and-bag file system, and the directory can be reconstructed if not totally lost. That requires direct user effort, but often only one or two entries are lost, so the ownership of the blocks they describe can be inferred. Of all the file systems, that one can be made bulletproof by copying the directory to a second spot on the media.

What's tag and bag: filename, starting block, length on media, some detail bytes. NS* used a 16-byte format where 8 bytes are the filename and addresses are two bytes. Its major issue is that if you delete a file you end up with a hole, and compaction is the way to fix that. It also does not do scatter-gather files. To grow a file you write a new temp file, copy the old data plus the new data to it, delete the old one, then rename the temp to the old name. And pray the disk has enough space for that. Typical NS* use with a few files maybe had 40K of free space out of the 82K possible. The upside was it was efficient and took little code to use it.
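As a rough sketch of that 16-byte entry in C (field names and the split of the remaining "detail bytes" are assumptions, not the exact NS* DOS on-disk layout):

```c
#include <stdint.h>

/* Sketch of a 16-byte "tag and bag" directory entry as described above.
 * Field names and the detail-byte split are assumptions. */
struct ns_dirent {
    char     name[8];      /* filename, 8 bytes, space-padded */
    uint16_t start_block;  /* two-byte starting block address */
    uint16_t length;       /* length on the media */
    uint8_t  detail[4];    /* "some detail bytes" */
};                         /* 8 + 2 + 2 + 4 = 16 bytes */
```

The appeal is that one table scan tells you everything: where each file starts, how long it is, and therefore where the holes are.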

Generally, simple file systems are more susceptible, as the information available for repair is thin.

If a crunch or other collapse should happen, media and disks are likely not going to last long, as they have a finite life, so storage is the first long-term worry. I'm old enough to have disks, tapes, and CDs that have developed bit rot and even media failure from just time. Happens when you're an early adopter.

EPROMs are also questionable; I've had 20-year-old 2716s and 2764s lose bits but not fail, as reprogramming them did work.

Allison

Allisontheolder avatar Nov 12 '19 15:11 Allisontheolder

I think this is definitely more than a one-man job, and I think some of the more interested parties should set up an actual meeting to clearly define what we want to accomplish here and strategies to do so. The filesystem is a major cornerstone, and we really need to be careful about what functionality we have just as much as about what functionality we choose not to add. I will leave it up to @hsoft to have the real final say on how to move forward with this.

QuercusFelis avatar Nov 12 '19 15:11 QuercusFelis

It's hard to place ourselves in a post-collapse future, but the way I see it, the most precious data will be kept on paper. Collapse OS being only a means to a single end that is programming microcontrollers, there isn't going to be that much data around: Collapse OS's own source code and a few code listings for microcontrollers that do real-world work.

Of course, reliability is something we want, but maybe not as badly as one could think at first, in this "computers-are-everywhere-and-do-everything-and-data-is-super-important" context.

hsoft avatar Nov 12 '19 16:11 hsoft

I have some small experience of trying to be bleeding edge in the mid-70s, when most of this didn't even exist and the 8008 and then the 8080 were the new kids.

Even then, a rented system was often used to develop on, and the result was then transported to the target. We do that all the time with Arduino and STM32F4. The real difference is the tools are better and run on our personally owned systems.

So I see the ability of a system to compile on the target as nice, but it can be a handicap. Especially so in controller space, where less than a full system may be adequate for the task. Even less likely is a file system of any size. The development platform needs to be larger and often more capable. So there is a need for the big box as well as the small box. The small box is likely more important, as it's the one stamping out bottle caps or pills.

At least that's the case from experience and pre-PC-era computing. It used to be that having a PC meant you owned a computer, and that meant literally any computer. I was one of those that had something when it was literally rare. It also meant that another computer was not available to help.

Allisontheolder avatar Nov 13 '19 04:11 Allisontheolder

The central point of Collapse OS is self-reproduction on minimal machines. This doesn't fit a "big-box-small-box" model because we assume that we will lose the ability to build a big box again.

hsoft avatar Nov 13 '19 12:11 hsoft

Yes, that's the goal, but you missed that the big box and small box can be the same and likely would be, as it's what you can build. The difference is that the small box has only what it needs for, say, a control task, i.e. minimal RAM and ROM. The big box is logically the same but with as much RAM as can be obtained, maybe more ROM, and some form of mass storage.

Z80s with 1-2K of RAM, a 2-4K EPROM, whatever I/O, and maybe a timer were for years, and in a lot of cases still are, widely used in embedded systems. You're not likely to run much of an OS on that, and if you do, it needs to be a real-time package. The host however can still be a Z80, but with 32-64K of RAM, enough ROM to wake it up, an OS that can support a mass storage device, and the needed programs to build new code for controllers.

If you say a collapse is possible, then Alexa, Ring, iPhone/Android, and likely the Internet as we know it may all go away. Supporting that stuff is not an OS project.

One might look at what CPUs are likely widespread and easiest to find as salvage or new old stock. That is assuming they can be had for a price.

Allison

Allisontheolder avatar Nov 13 '19 13:11 Allisontheolder

That's a broader discussion and I think I already address those points on the Collapse OS website (as to why I think z80 is a better choice than ARM or i386). I think I'll need to find a way to channel these broader discussions to a place other than GitHub issues, because these discussions, even if interesting, detract from the goal of a given issue.

Mailing list I guess... (relevant: #71)

hsoft avatar Nov 13 '19 13:11 hsoft

I'd argue the higher-selling CPUs of any system-level usefulness other than the Z80 were the 6502, 8049/8051, and likely the 80186. An oddity was Chrysler, which used CDP1802s by the pound for their emissions computers back when; there were maybe ten million of those, and some likely still exist.

There are a larger number of others, but most of those are OTP-ROM based or so small in capability that they are only found in controller devices. Also, most devices that contain ARM and the more modern stuff will be SMT and likely BGA, making prototyping hard and building multiple systems harder. Just being practical.

Then again, I have a dozen complete and functional MicroVAX systems running Ultrix, NetBSD, and VMS that I've had few problems keeping going for the last 30 years, so 10 or 15 more is not an issue. My PDP-8/f is a 1973 build and still running; likely I can keep that going for 20 more, as it's a simple SSI/MSI TTL machine. During the last hiccup (2007) I made extra money supporting machine shops that had PDP-8 and PDP-11 controllers for their aging and still useful CNC machines. I have the timeshare/multiuser OSes for all three as well. There is a potential plan there: I could sell processing time and storage via modem, as I could easily support 30 users or more if the phones still work.

PCs, especially laptops, on the other hand: few make it to 10 years old. Though I have a few 486 boxes, as they made the cut and they have an ISA bus that I have a pile of instrumentation boards for, along with a few different OSes. It doesn't hurt to keep spares in layers.

Post-collapse: the hardware has to be something that, if salvaged, managed to survive to that point, not something from a drawer. If that were the criterion, right now I have on hand a tube or so of all the popular CPUs, including some that people here don't know of. That also means you need EPROMs or EEPROMs, RAMs, and peripherals, all of which after a collapse will be hard to buy from Arrow or Farnell, assuming they still exist in that speculative case. It also assumes all the scrap they may be on has not been ground up to recover the gold and valuable metals.

Allisontheolder avatar Nov 13 '19 14:11 Allisontheolder

That's why we want to build our post-collapse machines with simple chips we have spares for, in metallic boxes... and not from SMD components.

We need multiple different machines, each one with its specific abilities, to be programmed and repaired, and a plan to bootstrap our friends' machines on our own ones. My HC11 board will have a Z80 assembler, for example. Why don't you start writing some PDP-8 software to compile collapseos? I'm not even kidding. That would be a valuable use of your own resources towards the greater goal of computer survival.

f4grx avatar Nov 13 '19 14:11 f4grx

It's an idea...

I'd not bother with the OS, as I have several. But a shim to match up interfaces, and the assembler, would be a good project for all three (PDP-8, PDP-11 and VAX).

Allisontheolder avatar Nov 13 '19 15:11 Allisontheolder

The biggest thing is that what you may find is likely FAT-based, and the ability to read and recover information from that is not enhanced by a unique, one-off file system that is inefficient. FAT (incidentally not my favorite) has a lot of bulk, but having used a very trimmed FAT on an Arduino (ATmega328P, a non-von Neumann micro with 32K of flash ROM, 2K of RAM and 1K of EEPROM) for embedded projects, FAT is doable with minimal hardware, even using USB as the interface.

All those hundreds of millions of thumb drives: most will still have stuff, hold stuff, and function, and they are all FAT-formatted. Also, there is no rule that the content must be FAT, except that recovery requires FAT.

Allisontheolder avatar Nov 16 '19 18:11 Allisontheolder