tikhomirov@1: FileStructureWalker (pass HgFile, HgFolder to callable; which can ask for VCS data from any file)
tikhomirov@1: External uses: user browses files, selects one and asks for its history
tikhomirov@1: Params: tip/revision;
tikhomirov@1: Implementation: manifest
tikhomirov@1:
tikhomirov@1: Log --rev
tikhomirov@1: Log
tikhomirov@2: HgDataFile.history() or Changelog.history(file)?
tikhomirov@2:
tikhomirov@2:
tikhomirov@2: Changelog.all() to return list with placeholder, not-parsed elements (i.e. read only compressedLen field and skip to next record), so that
tikhomirov@2: total number of elements in the list is correct
tikhomirov@1:
tikhomirov@1: hg cat
tikhomirov@2: Implementation: logic to find file by name in the repository is the same as for Log and other commands
tikhomirov@2:
tikhomirov@2:
tikhomirov@2: Revlog
tikhomirov@4: What happens when big entry is added to a file - when it detects it can no longer fit into .i and needs .d? Inline flag and .i format changes?
tikhomirov@4:
tikhomirov@22: What's hg's natural way to see nodeids of specific files (i.e. when I do 'hg --debug manifest -r 11' and see nodeid of some file, and
tikhomirov@22: then would like to see what changeset this file came from)?
tikhomirov@4:
tikhomirov@4: ----------
tikhomirov@6: + support patch from baseRev + few deltas (although done in a way patches are applied one by one instead of accumulated)
tikhomirov@4: + command-line samples (-R, filenames) (Log & Cat) to show on any repo
tikhomirov@6: +buildfile + run samples
tikhomirov@9: *input stream impl + lifecycle. Step forward with FileChannel and ByteBuffer, although questionable accomplishment (looks a bit complicated, cumbersome)
tikhomirov@14: + dirstate.mtime
tikhomirov@43: +calculate sha1 digest for file to see I can deal with nodeid.
+Do this correctly (smaller nodeid - first)
tikhomirov@18: *.hgignored processing
tikhomirov@25: +Nodeid to keep 20 bytes always, Revlog.Inspector to get nodeid array of meaningful data exact size (neither leading 00 bytes nor the 12 extra bytes from the spec)
tikhomirov@26: +DataAccess - implement memory mapped files,
tikhomirov@49: +Changeset to get index (local revision number)
tikhomirov@60: +RevisionWalker (on manifest) and WorkingCopyWalker (io.File) talking to ? and/or dirstate (StatusCollector and WCSC)
tikhomirov@60: +RevlogStream - Inflater. Perhaps, InflaterStream instead? branch:wrap-data-access
tikhomirov@60: +repo.status - use same collector class twice, difference as external code. add external walker that keeps collected maps and use it in Log operation to give files+,files-
tikhomirov@78: + strip \1\n metadata out from RevlogStream
tikhomirov@84: + hash/digest long names for fncache
tikhomirov@169: +Strip off metadata from the beginning of the stream - DataAccess (with rebase/moveBaseOffset(int)) would be handy
tikhomirov@169: + hg status, compare revision and local file with kw expansion and eol extension
tikhomirov@169:
tikhomirov@169: write code to convert inlined revlog to .i and .d
tikhomirov@60:
tikhomirov@60: delta merge
tikhomirov@60: DataAccess - collect debug info (buffer misses, file size/total read operations) to find a better strategy for buffer size detection. Compare performance.
tikhomirov@396: RevlogStream - inflater buffer (and other buffers) size may be too small for repositories out there (i.e. inflater buffer of 512 bytes for 200k revision)
tikhomirov@41:
tikhomirov@169:
tikhomirov@128: Parameterize StatusCollector to produce copy only when needed. And HgDataFile.metadata perhaps should be moved to cacheable place?
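
Sketch for the "calculate sha1 digest ... smaller nodeid - first" item in the list above: as far as I understand the Mercurial revlog scheme, a nodeid is the SHA-1 of the two parent nodeids (the byte-wise smaller one first) followed by the revision content. The class and method names below (NodeidSketch, nodeid, compareUnsigned, toHex) are made up for illustration and are not hg4j API; whether "content" should include the \1\n metadata block is left to the caller.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of hg4j: illustrates parent ordering in the digest.
final class NodeidSketch {
    static final int NODEID_LENGTH = 20; // SHA-1 digest size in bytes

    /** Unsigned lexicographic comparison of two equal-length byte arrays. */
    static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < a.length; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return 0;
    }

    /** SHA-1 over min(p1, p2), then max(p1, p2), then the content bytes. */
    static byte[] nodeid(byte[] p1, byte[] p2, byte[] content) {
        if (p1.length != NODEID_LENGTH || p2.length != NODEID_LENGTH) {
            throw new IllegalArgumentException("parent nodeids must be 20 bytes");
        }
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            boolean p1First = compareUnsigned(p1, p2) <= 0;
            sha1.update(p1First ? p1 : p2); // smaller nodeid goes first
            sha1.update(p1First ? p2 : p1);
            return sha1.digest(content);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] nullId = new byte[NODEID_LENGTH]; // all zeroes, i.e. "no parent"
        byte[] other = new byte[NODEID_LENGTH];
        other[0] = (byte) 0x80; // negative as a signed byte, larger when unsigned
        byte[] data = "hello\n".getBytes(StandardCharsets.UTF_8);
        // Both lines print the same digest: parent order must not matter.
        System.out.println(toHex(nodeid(nullId, other, data)));
        System.out.println(toHex(nodeid(other, nullId, data)));
    }
}
```

The comparison is done on unsigned byte values on purpose: a plain signed comparison of Java bytes would sort 0x80..0xff before 0x00 and pick the wrong parent.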
tikhomirov@128:
tikhomirov@18: Status operation from GUI - guess, usually on a file/subfolder, hence API should allow for starting path (unlike cmdline, seems useless to implement include/exclude patterns - GUI users hardly enter them, ever)
tikhomirov@60: -> recently introduced FileWalker may perhaps help solve this (if starts walking from selected folder) for status op against WorkingDir?
tikhomirov@9:
tikhomirov@84: ? Can I use fncache (names from it - perhaps, would help for Mac issues Alex mentioned)
tikhomirov@84: ? Does fncache list both .i and .d (iow, is it true hashed .d is different from hashed .i)
tikhomirov@84:
tikhomirov@15: ??? encodings of fncache, .hgignore, dirstate
tikhomirov@16: ??? http://mercurial.selenic.com/wiki/Manifest says "Multiple changesets may refer to the same manifest revision". To me, each changeset
tikhomirov@16: changes repository, hence manifest should update nodeids of the files it lists, effectively creating new manifest revision.
tikhomirov@15:
tikhomirov@64: ? subrepos in log, status (-S) and manifest commands
tikhomirov@64:
tikhomirov@197: ? when p1 == -1, and p2 != -1, does HgStatusCollector.change() give correct result?
tikhomirov@93:
tikhomirov@64: Commands to get CommandContext where they may share various caches (e.g. StatusCollector)
tikhomirov@93: Perhaps, abstract classes for all Inspectors (i.e. StatusCollector.Inspector) for users to use as base classes to protect from change?
tikhomirov@64:
tikhomirov@205: -cancellation and progress support
tikhomirov@205: -timestamp check for revlog to recognize external changes
tikhomirov@209: -HgDate or any other better access to time info
tikhomirov@209: -(low) RepositoryComparator#calculateMissingBranches may query branches for the same head more than once
tikhomirov@209: (when there are a few heads that end up with common nodes).
e.g. hg4j revision 7 against remote hg4j revision 206
tikhomirov@205:
tikhomirov@9: >>>> Effective file read/data access
tikhomirov@9: ReadOperation, Revlog does: repo.getFileSystem().run(this.file, new ReadOperation(), long start=0, long end = -1)
tikhomirov@9: ReadOperation gets buffer (of whatever size, as decided by FS impl), parses it and then reports whether it needs more data.
tikhomirov@9: This helps to ensure streams are closed after reading, allows caching (if the same file (or LRU) is read a few times in sequence)
tikhomirov@9: and allows buffer management (i.e. reuse. Single buffer for all reads).
tikhomirov@9: Scheduling multiple operations (in future, to deal with writes - single queue for FS operations - no locks?)
tikhomirov@9:
tikhomirov@60: WRITE: Need to register instances that cache files (e.g. dirstate or .hgignore) to FS notifier, so that cache may get cleared if the file changes (i.e. WriteOperation touches it).
tikhomirov@60:
tikhomirov@9: File access:
tikhomirov@9: * NIO and mapped files - should be fast. Although seems to give less control over mem usage.
tikhomirov@21: * Regular InputStreams and chunked stream on top - allocate List, each (but last) chunk of fixed size (depending on initial file size)
tikhomirov@9:
tikhomirov@129:
tikhomirov@129: * API
tikhomirov@129: + rename in .core Cset -> HgChangeset,
tikhomirov@129: + rename in .repo Changeset to HgChangelog.Changeset, Changeset.Inspector -> HgChangelog.Inspector
tikhomirov@129: - CommandContext
tikhomirov@129: - Data access - not bytes, but ByteChannel
tikhomirov@129: - HgRepository constants (TIP, BAD, WC) to HgRevisions enum
tikhomirov@131: - RevisionMap to replace TreeMap
tikhomirov@131: + .core.* rename to Hg*
tikhomirov@131: + RepositoryTreeWalker to ManifestCommand to match other command classes
tikhomirov@131:
tikhomirov@131: * defects
tikhomirov@136: + ConfigFile to strip comments from values (#)
tikhomirov@129:
tikhomirov@26: <<<<<
tikhomirov@197: Performance.
tikhomirov@197: after pooling/caching in HgStatusCollector and HgChangeset
tikhomirov@197: hg log --debug -r 0:5000 and same via Log/HgLogCommand: approx. 220 seconds vs 279 seconds. Mem. cons. 20 vs 80 mb.
tikhomirov@197: after further changes in HgStatusCollector (to read ahead 5 elements, 50 max cache, fixed bug with -1) - hg4j dumps 5000 in
tikhomirov@197: 93 seconds, memory consumption about 50-56 Mb
tikhomirov@197:
tikhomirov@198: IndexEntry(int offset, int baseRevision) got replaced with int[] arrays (offsets - optional)
tikhomirov@198: for 69338 revisions from cpython repo 1109408 bytes reduced to 277368 bytes with the new int[] version.
tikhomirov@198: I.e. total for changelog+manifest is 1,5 Mb+ gain
tikhomirov@198:
tikhomirov@200: ParentWalker got arrays (Nodeid[] and int[]) instead of HashMap/LinkedHashSet. This change saves, per revision:
tikhomirov@200: was: LinkedHashSet$Entry:32 + HashMap$Entry:24 + HashMap.entries[]:4 (in fact, up to 8, given entries size is power of 2, and 69000+
tikhomirov@200: elements in cpython test repo resulted in entries[131072]).
tikhomirov@200: total: (2 HashMaps) 32+(24+4)*2 = 88 bytes
tikhomirov@200: now: Nodeid[]:4 , int[]:4 bytes per entry. arrays of exact revlog size
tikhomirov@200: total: (4 Nodeid[], 1 int[]) 4*4 + 4 = 20 bytes
tikhomirov@200: for cpython test repo with 69338 revisions, 1 387 224 instead of 4 931 512 bytes. Mem usage (TaskManager) ~50 Mb when 10000 revs read
tikhomirov@200:
tikhomirov@197: <<<<<
tikhomirov@26:
tikhomirov@26: Tests:
tikhomirov@61: DataAccess - readBytes(length > memBufferSize, length*2 > memBufferSize) - to check impl is capable to read huge chunks of data, regardless of own buffer size
tikhomirov@61:
tikhomirov@129: ExecHelper('cmd', OutputParser()).run(). StatusOutputParser, LogOutputParser extends OutputParser.
construct java result similar to that of cmd, compare results
tikhomirov@129:
tikhomirov@202: Need better MethodRule than ErrorCollector for tests run as java app (to print not only MultipleFailureException, but distinct errors)
tikhomirov@202: Also consider using ExternalResource and TemporaryFolder rules.
tikhomirov@367:
tikhomirov@367:
tikhomirov@367: =================
tikhomirov@367: Naming:
tikhomirov@367: nodeid: revision
tikhomirov@367: int: revisionIndex (alternatives: revisionNumber, localRevisionNumber)
tikhomirov@367: BUT, if class name bears Revision, may use 'index' and 'nodeid'
tikhomirov@367: NOT nodeid because although fileNodeid and changesetNodeid are ok (less to my liking than fileRevision, however), it's not clear how
tikhomirov@367: to name integer counterpart, just 'index' is unclear, need to denote nodeid and index are related. 'nodeidIndex' would be odd.
tikhomirov@367: Unfortunately, Revision would be a nice name for a class. As long as I don't want to keep methods to access int/nodeid separately
tikhomirov@367: and not to stick to Revision struct only (to avoid massive instances of Revision when only one is sufficient), I'll need to name
tikhomirov@367: these separate methods anyway. Present opinion is that I don't need the object right now (will have to live with RevisionObject or RevisionDescriptor
tikhomirov@427: once I change my mind)
tikhomirov@427:
tikhomirov@427: Handlers (HgStatusHandler, HgManifestHandler, HgChangesetHandler, HgChangesetTreeHandler)
tikhomirov@427: methods DO NOT throw CancelledException. cancellation is separate from processing logic.
handlers can implement CancelSupport to become a source of cancellation, if necessary
tikhomirov@427: methods DO throw HgCallbackTargetException to propagate own errors/exceptions
tikhomirov@427: methods are supposed to silently pass HgRuntimeExceptions (although callback implementers may decide to wrap them into HgCallbackTargetException)
tikhomirov@427: descriptive names for the methods, whenever possible (not bare #next)
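
A minimal sketch of the handler conventions listed above, only to make the split between processing and cancellation concrete. Everything here is a local stand-in declared inside HandlerSketch: the nested CancelledException, HgCallbackTargetException, CancelSupport and ChangesetHandler types merely borrow names from these notes and are not the real hg4j classes, and the checkCancelled() signature and the walk() driver are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: local stand-in types, not the hg4j API.
final class HandlerSketch {
    static class CancelledException extends Exception {
    }

    static class HgCallbackTargetException extends Exception {
        HgCallbackTargetException(Throwable cause) {
            super(cause);
        }
    }

    /** Optional side interface: a handler implements it to become a cancellation source. */
    interface CancelSupport {
        void checkCancelled() throws CancelledException;
    }

    /** Descriptive method name (not a bare next()); never throws CancelledException. */
    interface ChangesetHandler {
        void changeset(String nodeid) throws HgCallbackTargetException;
    }

    /** Hypothetical driver: cancellation is checked outside of the processing call. */
    static int walk(List<String> revisions, ChangesetHandler handler)
            throws HgCallbackTargetException, CancelledException {
        CancelSupport cancel = handler instanceof CancelSupport ? (CancelSupport) handler : null;
        int processed = 0;
        for (String nodeid : revisions) {
            if (cancel != null) {
                cancel.checkCancelled();
            }
            handler.changeset(nodeid);
            processed++;
        }
        return processed;
    }

    /** Collects revisions and asks for cancellation once it has seen 'limit' of them. */
    static final class FirstN implements ChangesetHandler, CancelSupport {
        final List<String> seen = new ArrayList<>();
        private final int limit;

        FirstN(int limit) {
            this.limit = limit;
        }

        @Override
        public void changeset(String nodeid) {
            seen.add(nodeid);
        }

        @Override
        public void checkCancelled() throws CancelledException {
            if (seen.size() >= limit) {
                throw new CancelledException();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        FirstN handler = new FirstN(2);
        try {
            walk(Arrays.asList("a", "b", "c"), handler);
        } catch (CancelledException e) {
            System.out.println("cancelled after " + handler.seen.size() + " revisions");
        }
    }
}
```

The point of the split is that changeset() stays free of cancellation concerns, while the driver decides when to poll CancelSupport.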