NST INDEX PROJECT



Thank you for your interest in participating.

I will not assume you know Norwegian, but of course most of you do. Knowing Norwegian is not necessary to help out. I can understand Norwegian, but my written Norwegian is still "not so good". You can correspond with me in either English or Norwegian.

To give you a little background...

NSF has issued 2 "hefter" (booklets) per year for the last 70 years. 4 hefter then make up a "bind" (volume). So there have been about 35 binds produced.

Individual indexes have been published for the 35 volumes (but in some cases an index was created for 2 binds).

I have already OCR'ed and corrected volumes 1 through 10, and 34 and 35. I am looking for help with the remaining volumes.

I wrote a few programs to merge 2 volumes which went pretty well, and if it worked for 2, well.... with a bit more work, I can get the 35 volumes merged.

This is what I would ask you to help with:

To take my OCR'ed output and compare it to the "original". You will be provided with a plain "text" file of the unedited OCR output. You can edit it in a plain text editor like "notepad". The "original" can be a graphical image of the page, or I can send you a hardcopy.

The graphical image will be in TIFF format, which can be read by most image viewers. The image viewer/editor Paint Shop Pro (shareware) is what I use. You can also try IrfanView, an excellent graphics viewer which is free. I put the text editor and the image viewer side-by-side on my screen and do the proofreading that way. I must admit that if you prefer a hardcopy, the text is clear but a little small. You can find the download for Paint Shop Pro (32 bit and 16 bit versions are available) if you go back to my homepage and click on "Computers, Internet". You may want to try some of the text editors noted there.

There is a sample TIFF file, unedited text file and edited text file at
http://home.nc.rr.com/risholmr/downloads

The most important things to note are:
- If a person's information "wraps" to a second line, I edit it to be on one line. This way each line has a complete set of information for that one person.
- Punctuation (comma and period) is important.
- There should always be a "space" after a comma or a period, except for the period at the end of the line.
- If in doubt about a character, or you just can't read it .. just put a question mark (?) there and I will try to resolve it.
- Do not be concerned that some of the original characters are in BOLD or ITALICS - just type the character.
- Some of the indexes were in a format where individual names were not started on a new line. I have pre-done some work for you to put them on separate lines.
- Watch out for OCR errors which are harder to spot, like the letter "l" used instead of the numeral 1 and vice-versa.
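
For helpers who are comfortable with a little programming, some of the rules above can be checked automatically before the edited text is sent back. The following is only a minimal sketch in Python, not one of the merge programs mentioned earlier. The function names and the sample entries are made up for illustration, and the real index lines may be laid out differently. It only flags lines for a second look; it does not change anything.

```python
import re

# A comma or period followed directly by a non-space character.
# A period at the very end of the line is allowed by the rules above.
MISSING_SPACE = re.compile(r"[,.](?=\S)")

# A token made up only of digits and the letter "l" that contains both,
# e.g. "l4" or "2l", which usually means "l" was read instead of "1".
L_ONE_MIX = re.compile(r"\b(?=\w*\d)(?=\w*l)[\dl]+\b")


def check_line(line):
    """Return a list of possible problems found in one index line."""
    problems = []
    text = line.rstrip()
    if MISSING_SPACE.search(text):
        problems.append("missing space after comma or period")
    if L_ONE_MIX.search(text):
        problems.append('possible "l"/"1" mix-up')
    if "?" in text:
        problems.append("unresolved character (?)")
    return problems


def check_text(text):
    """Return (line number, problem) pairs for a whole edited file."""
    report = []
    for number, line in enumerate(text.splitlines(), start=1):
        for problem in check_line(line):
            report.append((number, problem))
    return report


if __name__ == "__main__":
    # Hypothetical sample entries, not taken from the real index.
    sample = "Hansen, Ole, 12.\nBerg,Anna, l4.\nDahl, P?r, 7."
    for number, problem in check_text(sample):
        print(f"line {number}: {problem}")
```

A check like this cannot replace comparing the text against the page image, since it knows nothing about names or page numbers; it only catches the mechanical slips.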


The Norwegian characters ÆØÅæøå can be difficult if you are using a standard English keyboard. These characters can be found by running a program under Windows called "charmap". You will also see some other characters like ü which you may need to find using charmap.

After reading this please get back to me with whether you can use the graphical type output, in which case I will put TIFF files in the same directory as the sample which you can then "download" as you did the sample. The unedited text will be sent to you via email. You then just need to send me back the edited text.

Your efforts will be much appreciated.

Please get back to me with your decision on how you would like the "original": graphics file or hardcopy. If you can take the image files in a large "zipped" email (about 1 meg), I can send them to you that way, or I will put the individual "pages" (about 80k each) in the same directory as the samples.

Please send me your mailing address so that I can send you your copy of the book when it is published.

Thank You

Robert Anders Risholm

risholmr@yahoo.com