view design.txt @ 159:f5aed108754e wrap-data-access
Approach with DataAccess instead of plain byte[] was merged into default branch
author | Artem Tikhomirov <tikhomirov.artem@gmail.com> |
---|---|
date | Wed, 09 Mar 2011 13:28:02 +0100 |
parents | 26e3eeaa3962 |
children | 05829a70b30b |
FileStructureWalker (pass HgFile, HgFolder to callable; which can ask for VCS data from any file)
External uses: user browses files, selects one and asks for its history
Params: tip/revision;
Implementation: manifest

Log --rev
Log <file>
HgDataFile.history() or Changelog.history(file)?

Changelog.all() to return list with placeholder, not-parsed elements (i.e. read only compressedLen field and skip to next record), so that
total number of elements in the list is correct

hg cat
Implementation: logic to find file by name in the repository is the same as for Log and other commands

Revlog
What happens when a big entry is added to a file - when it detects it can no longer fit into .i and needs .d? Inline flag and .i format changes?

What's hg natural way to see nodeids of specific files (i.e. when I do 'hg --debug manifest -r 11' and see nodeid of some file, and
then would like to see what changeset this file came from)?

----------
+ support patch from baseRev + few deltas (although done in a way patches are applied one by one instead of accumulated)
+ command-line samples (-R, filenames) (Log & Cat) to show on any repo
+ buildfile + run samples
* input stream impl + lifecycle. Step forward with FileChannel and ByteBuffer, although questionable accomplishment (looks a bit complicated, cumbersome)
+ dirstate.mtime
+ calculate sha1 digest for file to see I can deal with nodeid. + Do this correctly (smaller nodeid - first)
* .hgignore processing
+ Nodeid to keep 20 bytes always, Revlog.Inspector to get nodeid array of meaningful data exact size (neither leading 00 bytes, nor 12 extra bytes from the spec)
+ DataAccess - implement memory mapped files,
+ Changeset to get index (local revision number)

DataAccess - collect debug info (buffer misses, file size/total read operations) to find out a better strategy for buffer size detection. Compare performance.
delta merge
RevisionWalker (on manifest) and WorkingCopyWalker (io.File) talking to ? and/or dirstate
RevlogStream - Inflater. Perhaps, InflaterStream instead?
Implement use of fncache (use names from it - perhaps, would help for Mac issues Alex mentioned) along with 'digest'-ing long file names

Status operation from GUI - guess, usually on a file/subfolder, hence API should allow for starting path (unlike cmdline, seems useless to implement include/exclude patterns - GUI users hardly enter them, ever)

??? encodings of fncache, .hgignore, dirstate
??? http://mercurial.selenic.com/wiki/Manifest says "Multiple changesets may refer to the same manifest revision". To me, each changeset changes repository, hence manifest should update nodeids of the files it lists, effectively creating new manifest revision.

>>>> Effective file read/data access
ReadOperation, Revlog does: repo.getFileSystem().run(this.file, new ReadOperation(), long start=0, long end = -1)
ReadOperation gets buffer (of whatever size, as decided by FS impl), parses it and then reports if it needs more data.
This helps to ensure streams are closed after reading, allows caching (if the same file (or LRU) is read a few times in sequence)
and allows buffer management (i.e. reuse. Single buffer for all reads).
Scheduling multiple operations (in future, to deal with writes - single queue for FS operations - no locks?)

File access:
* NIO and mapped files - should be fast. Although seems to give less control over mem usage.
* Regular InputStreams and chunked stream on top - allocate List<byte[]>, each (but last) chunk of fixed size (depending on initial file size)

<<<<<

Tests:
DataAccess - readBytes(length > memBufferSize, length*2 > memBufferSize) - to check impl is capable of reading huge chunks of data, regardless of own buffer size
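
Sketch for the readBytes test above. BufferedAccessSketch is a hypothetical stand-in, not the project's DataAccess; it only assumes some reader with a fixed internal buffer of memBufferSize bytes that has to serve requests of any length. The helper names (samplePayload, readThrough) and the sizes used are made up for illustration. Both readings of the length conditions (request larger than the buffer, request larger than twice the buffer, and request larger than half of it) are exercised.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical stand-in for DataAccess: every read goes through a small fixed-size buffer.
class BufferedAccessSketch {
    private final InputStream in;
    private final byte[] buffer;
    private int pos;   // index of the next unread byte in buffer
    private int limit; // number of valid bytes currently in buffer

    BufferedAccessSketch(InputStream in, int memBufferSize) {
        this.in = in;
        this.buffer = new byte[memBufferSize];
    }

    // Fills dest[offset .. offset+length), refilling the internal buffer
    // as many times as needed when length exceeds the buffer size.
    void readBytes(byte[] dest, int offset, int length) throws IOException {
        while (length > 0) {
            if (pos == limit) {
                int read = in.read(buffer, 0, buffer.length);
                pos = 0;
                limit = Math.max(read, 0);
                if (limit == 0) {
                    throw new EOFException("Requested more bytes than the source holds");
                }
            }
            int n = Math.min(length, limit - pos);
            System.arraycopy(buffer, pos, dest, offset, n);
            pos += n;
            offset += n;
            length -= n;
        }
    }

    // Predictable payload, so that results can be compared byte by byte.
    static byte[] samplePayload(int size) {
        byte[] data = new byte[size];
        for (int i = 0; i < size; i++) {
            data[i] = (byte) (i * 31);
        }
        return data;
    }

    // Reads the first `length` bytes of source through a buffer of memBufferSize bytes.
    static byte[] readThrough(byte[] source, int memBufferSize, int length) {
        try {
            BufferedAccessSketch da =
                    new BufferedAccessSketch(new ByteArrayInputStream(source), memBufferSize);
            byte[] result = new byte[length];
            da.readBytes(result, 0, length);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int memBufferSize = 16;
        byte[] source = samplePayload(100);
        // 10: length*2 > memBufferSize; 20: length > memBufferSize; 40: length > 2*memBufferSize
        for (int length : new int[] {10, 20, 40}) {
            byte[] actual = readThrough(source, memBufferSize, length);
            boolean ok = Arrays.equals(actual, Arrays.copyOf(source, length));
            System.out.println("length=" + length + " ok=" + ok);
        }
    }
}
```

A real test would run the same checks against each DataAccess implementation (mapped file, chunked stream), since the point is that the result must not depend on the buffer size.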
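
Sketch for the sha1/nodeid note above ("smaller nodeid - first"). Mercurial computes a revision's nodeid as SHA-1 over the two parent nodeids, smaller one first, followed by the revision content. NodeidSketch and its method names are hypothetical, not taken from the project; parents are assumed to be full 20-byte arrays and are compared as unsigned bytes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: nodeid = sha1(min(p1, p2) + max(p1, p2) + content).
class NodeidSketch {
    static final int NODEID_LENGTH = 20;

    // Lexicographic comparison of two 20-byte nodeids, bytes treated as unsigned.
    static int compare(byte[] a, byte[] b) {
        for (int i = 0; i < NODEID_LENGTH; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) {
                return d;
            }
        }
        return 0;
    }

    // Parent order on input does not matter: the smaller nodeid is always digested first.
    static byte[] nodeid(byte[] p1, byte[] p2, byte[] content) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            boolean p1First = compare(p1, p2) <= 0;
            sha1.update(p1First ? p1 : p2);
            sha1.update(p1First ? p2 : p1);
            sha1.update(content);
            return sha1.digest();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required to be present on every Java platform
            throw new IllegalStateException(e);
        }
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] nullParent = new byte[NODEID_LENGTH]; // 20 zero bytes, i.e. "no parent"
        byte[] content = "hello\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(hex(nodeid(nullParent, nullParent, content)));
    }
}
```

The unsigned comparison matters in Java because byte is signed: a parent starting with 0x80 has to sort after one starting with 0x01.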